Summary
I'm running a slim homelab (a Kubernetes cluster with a few pods, Gitea, Drone, a Docker Registry, and an NFS share) on a 2-Pi4 cluster, and system performance is poor. I've noted that the controller node's filesystem looks pretty slow - I'm not sure if that is a cause of, or caused by, the other symptoms. I'm going to reimage and reinstall the controller node on a new SD card in the hope that that fixes it - but, in the meantime, I'm looking for other approaches to debugging this issue.
Situation
I've set up a minimal Kubernetes cluster on my own hardware, mostly following this guide with a few changes:
- I only have two Pi4s (one with 8GB RAM, one with 4GB), so my cluster is slightly smaller (the 8GB Pi is the control plane, the 4GB one is the worker).
- After finding Ubuntu Server to be a bit slow and unresponsive (and validating that impression with other Pi-thusiasts to make sure it wasn't just my perception/hardware), I used the 64-bit Raspbian OS instead.
- Which, in turn, meant that my `cmdline.txt` change was slightly different - when I used the Ubuntu version from that article, the Pi did not come back up from a reboot (a sketch of the typical change is below this list).
- The cluster isn't (yet!) on its own private network - they're just communicating via my main home network.
- The controller node has a hard drive connected via USB3, and shared via NFS for use by k8s pods.
- I also installed fail2ban, Gitea, Drone, and a rudimentary Docker Container Registry (as well as the aforementioned NFS share) on the controller node - I thought it was best to host the CI/CD components independently of the k8s cluster because they are dependencies of it (happy to get feedback on that, but I think it's tangential to this question).
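For context, that `cmdline.txt` change is the usual cgroup enablement needed for kubelet; a hedged sketch of what it typically looks like on Raspbian (the PARTUUID is a placeholder, and the exact flags depend on which guide you follow - everything has to stay on a single line):
$ cat /boot/cmdline.txt
console=serial0,115200 console=tty1 root=PARTUUID=xxxxxxxx-02 rootfstype=ext4 fsck.repair=yes rootwait cgroup_enable=cpuset cgroup_enable=memory cgroup_memory=1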
Problem
The cluster is up and running, and I've been able to run some deployments on it (Kubernetes Dashboard, jellyfin, grafana, and a small nginx-based deployment of my Hugo-built blog). This (along with the aforementioned CI/CD components and NFS share) seems like it should be a pretty insignificant load for the cluster (and I confirmed this expectation with the author of that article) - I've previously run all of those (minus the Kubernetes overhead) and more on the single 4Gb Pi4 alone, with no issues. However, the system is very slow and unresponsive:
- Simple shell commands (e.g. `man ln`, `df`, `uptime`) take ~10 seconds to complete; `apt-get install` or `pip3 install` commands are much slower than usual (double-digit minutes)
- Loading simple pages in Gitea's UI can take anywhere between 10 seconds and a minute.
- Simple builds of the blog (Gitea link, or GitHub mirror if that's unavailable) take over 20 minutes.
- Creation of a simple pod can take double-digit minutes
- The Kubernetes Dashboard will often display a spinner icon for a pane/page for ~20 seconds before populating information.
- When using `kubectl proxy` to view the dashboard, sometimes instead of a page the browser shows a JSON payload including the message `error trying to reach service: dial tcp <IP> connect: connection refused`. If I instead use `kubectl port-forward -n kubernetes-dashboard service/kubernetes-dashboard 8443:443`, I get the following error in the terminal:
Forwarding from 127.0.0.1:8443 -> 8443
Forwarding from [::1]:8443 -> 8443
Handling connection for 8443
E0520 22:03:24.086226 47798 portforward.go:400] an error occurred forwarding 8443 -> 8443: error forwarding port 8443 to pod a8ef295e1e42c5c739f761ab517618dd1951ad0c19fb517849979edb80745763, uid : failed to execute portforward in network namespace "/var/run/netns/cni-cfc573de-3714-1f3a-59a9-96285ce328ca": read tcp4 127.0.0.1:45274->127.0.0.1:8443: read: connection reset by peer
Handling connection for 8443
Handling connection for 8443
E0520 22:03:29.884407 47798 portforward.go:385] error copying from local connection to remote stream: read tcp4 127.0.0.1:8443->127.0.0.1:54550: read: connection reset by peer
Handling connection for 8443
E0520 22:05:58.069799 47798 portforward.go:233] lost connection to pod
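(Side note for anyone hitting similar symptoms: the usual way to check whether the dashboard pod itself is unhealthy, rather than the proxy/port-forward path, is something like the sketch below - the namespace assumes the default dashboard install, and `<pod-name>` is a placeholder.)
$ kubectl -n kubernetes-dashboard get pods -o wide          # is the pod Running, and on which node?
$ kubectl -n kubernetes-dashboard describe pod <pod-name>   # look for restarts and failed probes in Events
$ kubectl -n kubernetes-dashboard logs <pod-name>           # dashboard container logs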
What I've tried so far
System Resources
First I checked the system resources on all k8s machines. `htop` showed:
- `controller` - CPU load <10% across all 4 cores, memory usage at ~2G/7.6G, Swap 47/100M, `Load average 11.62 10.17 7.32`
- `worker` - CPU load <3% across all 4 cores, memory usage at ~300M/1.81G, Swap 20/100M, `Load average 0.00 0.00 0.00`
Which is odd in two respects:
- If the load average is so high (this suggests that 100% utilization corresponds to "load average = number of cores", so a load average of 11 indicates that this 4-core Pi is at nearly 300% capacity), why is CPU usage so low? (See also the sketch just after this list.)
- Why is `worker` showing such a low load average? In particular, I've confirmed that there is a ~50/50 split of k8s pods between `controller` and `worker`, and confirmed that I've set `AGENTS_ENABLED=true` (ref) on the Drone server.
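As hinted in the list above, a minimal sketch for decomposing that load number - on Linux the load average counts tasks in uninterruptible (D) sleep as well as runnable ones, so heavy I/O wait can inflate it without much corresponding CPU usage:
$ cat /proc/loadavg                 # 1/5/15-minute load, then runnable/total tasks, then last PID
$ ps -eo state= | sort | uniq -c    # count processes by state; lots of 'D' entries point at I/O rather than CPU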
I followed the instructions here to investigate High System Load and Low CPU Utilization:
- `w` confirmed high system load
- `sar` output:
$ sar -u 5
Linux 5.15.32-v8+ (rassigma) 05/21/2022 _aarch64_ (4 CPU)
02:41:57 PM CPU %user %nice %system %iowait %steal %idle
02:42:02 PM all 2.47 0.00 1.16 96.37 0.00 0.00
02:42:07 PM all 2.77 0.00 2.21 95.02 0.00 0.00
02:42:12 PM all 3.97 0.00 1.01 95.02 0.00 0.00
02:42:17 PM all 2.42 0.00 1.11 96.47 0.00 0.00
^C
Average: all 2.91 0.00 1.37 95.72 0.00 0.00
So, a lot of %iowait!
- `ps -eo s,user | grep "^[RD]" | sort | uniq -c | sort -nbr` showed `6 D root` and `1 R pi`, so that doesn't seem like the cause here (the article lists an example with thousands of threads in D/R states)
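To go one step further and see which processes are actually responsible for the I/O (rather than just counting states), a couple of hedged options - `pidstat` ships with the same sysstat package as `sar`, while `iotop` needs a separate install:
$ pidstat -d 5                                  # per-process disk read/write kB/s, refreshed every 5s
$ sudo iotop -oPa                               # only processes doing I/O, accumulated totals (sudo apt install iotop)
$ ps -eo state,pid,comm,wchan | awk '$1=="D"'   # what the D-state processes are currently blocked on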
Based on these two questions, I'll include here the output of various commands run on `controller`, though I don't know how to interpret them:
$ netstat -i 15
Kernel Interface table
Iface MTU RX-OK RX-ERR RX-DRP RX-OVR TX-OK TX-ERR TX-DRP TX-OVR Flg
br-5bde1 1500 15188 0 0 0 15765 0 0 0 BMRU
br-68f83 1500 121 0 0 0 241 0 0 0 BMU
cni0 1450 1546275 0 0 0 1687849 0 0 0 BMRU
docker0 1500 146703 0 0 0 160569 0 0 0 BMRU
eth0 1500 5002006 0 0 0 2325706 0 0 0 BMRU
flannel. 1450 161594 0 0 0 168478 0 4162 0 BMRU
lo 65536 6018581 0 0 0 6018581 0 0 0 LRU
veth1729 1450 41521 0 0 0 59590 0 0 0 BMRU
veth1a77 1450 410622 0 0 0 453044 0 0 0 BMRU
veth35a3 1450 82 0 0 0 20237 0 0 0 BMRU
veth3dce 1500 59212 0 0 0 61170 0 0 0 BMRU
veth401b 1500 28 0 0 0 4182 0 0 0 BMRU
veth4257 1450 108391 0 0 0 173055 0 0 0 BMRU
veth4642 1500 12629 0 0 0 16556 0 0 0 BMRU
veth6a62 1450 83 0 0 0 20285 0 0 0 BMRU
veth7c18 1450 47952 0 0 0 59756 0 0 0 BMRU
veth8a14 1450 82 0 0 0 20279 0 0 0 BMRU
vethcc5c 1450 655457 0 0 0 716329 0 0 0 BMRU
vethe535 1450 17 0 0 0 769 0 0 0 BMRU
vethf324 1450 180986 0 0 0 198679 0 0 0 BMRU
wlan0 1500 0 0 0 0 0 0 0 0 BMU
$ iostat -d -x
Linux 5.15.32-v8+ (rassigma) 05/21/2022 _aarch64_ (4 CPU)
Device r/s rkB/s rrqm/s %rrqm r_await rareq-sz w/s wkB/s wrqm/s %wrqm w_await wareq-sz d/s dkB/s drqm/s %drqm d_await dareq-sz f/s f_await aqu-sz %util
mmcblk0 0.20 14.65 0.07 26.90 1031.31 74.40 3.33 56.68 1.64 33.04 4562.85 17.02 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 15.40 51.07
sda 0.27 28.31 0.05 15.37 25.75 104.42 0.36 26.56 0.24 39.99 64.19 72.81 0.00 0.00 0.00 0.00 0.00 0.00 0.04 90.24 0.03 0.56
$ vmstat 15
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
8 3 48640 827280 129600 4607164 0 0 11 21 15 42 4 1 71 24 0
0 5 48640 827128 129600 4607216 0 0 1 44 2213 4267 4 1 31 64 0
0 10 48640 827660 129600 4607216 0 0 0 47 1960 3734 4 1 36 59 0
0 5 48640 824912 129600 4607624 0 0 1 121 2615 4912 6 2 15 77 0
2 12 48640 824416 129600 4607692 0 0 0 102 2145 4129 4 2 30 64 0
1 7 48640 822428 129600 4607972 0 0 3 81 1948 3564 6 2 10 83 0
0 5 48640 823312 129600 4608164 0 0 4 62 2328 4273 5 2 12 81 0
0 7 48640 824320 129600 4608220 0 0 1 143 2433 4695 5 2 9 84 0
...
51% utilization on the SD card (from the `iostat` output) is probably reasonably high, but not problematically so, I would have thought?
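For what it's worth, in that same `iostat` output the average I/O waits on `mmcblk0` (`r_await` ~1031 ms, `w_await` ~4562 ms) are enormous for a card doing barely 3-4 writes per second, which also points at the card rather than sheer load. A sketch for watching it live:
$ iostat -d -x mmcblk0 sda 5    # refresh every 5s; compare r_await/w_await and %util on the SD card vs the USB drive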
Filesystem
Referencing this article on how to test (SD card) filesystem performance, on `controller` and `worker` (both are using SD cards from the same batch, which advertised 10 MB/s write speed):
controller - $ dd if=/dev/zero of=speedtest bs=1M count=100 conv=fdatasync
100+0 records in
100+0 records out
104857600 bytes (105 MB, 100 MiB) copied, 43.2033 s, 2.4 MB/s
worker - $ dd if=/dev/zero of=speedtest bs=1M count=100 conv=fdatasync
100+0 records in
100+0 records out
104857600 bytes (105 MB, 100 MiB) copied, 5.97128 s, 17.6 MB/s
`controller`'s FS write appears to be ~7 times slower than `worker`'s. I'm not sure how to causally interpret that, though - it could be that `controller`'s filesystem is slow, which is causing the other symptoms, or it could be that there is some other process-throughput bottleneck which is causing both the slow filesystem and the other symptoms.
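If I wanted a second data point beyond the sequential `dd` test, small random writes are usually where marginal SD cards fall apart; a hedged sketch using `fio` (not installed by default - `sudo apt install fio` - and the filename is just a scratch file):
$ fio --name=sd-randwrite --filename=./fio-test --size=64M --bs=4k --rw=randwrite --direct=1 --runtime=30 --time_based --group_reporting
$ rm ./fio-test    # remove the scratch file afterwards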
Network
My home network is behind a fairly standard OPNSense router.
Checking external network connectivity with Speedtest CLI:
controller - $ speedtest
Server: Tekify Fiber & Wireless - Fremont, CA (id = 6468)
ISP: Sonic.net, LLC
Latency: 3.53 ms (0.15 ms jitter)
Download: 859.90 Mbps (data used: 523.3 MB )
Upload: 932.58 Mbps (data used: 955.5 MB )
Packet Loss: 0.0%
---
worker - $ speedtest
Server: Tekify Fiber & Wireless - Fremont, CA (id = 6468)
ISP: Sonic.net, LLC
Latency: 3.29 ms (1.84 ms jitter)
Download: 871.33 Mbps (data used: 776.6 MB )
Upload: 917.25 Mbps (data used: 630.5 MB )
Packet Loss: 0.0%
I did plan to test intra-network speed, but given how long it took to get to this point of debugging and the strong signals that there's an issue with `controller`'s SD card (high `%iowait`, slow `dd` write performance), I elected to move on to replacing that first before checking the network.
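For completeness, a minimal sketch of that intra-network test (assuming `iperf3` is installed on both Pis; `<worker-ip>` is a placeholder):
worker - $ iperf3 -s                          # run a listener on one node
controller - $ iperf3 -c <worker-ip> -t 10    # push traffic to it from the other node for 10 seconds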
Updates
- After re-imaging onto a fresh SD card, with absolutely nothing else installed on it other than Raspbian, the `dd if=/dev/zero of=speedtest bs=1M count=100 conv=fdatasync` filesystem-speed test gives 17.2 MB/s for the "reborn" controller node. I'll install the k8s cluster and other tools, and test again.
- After installing all the tools (k8s, Docker container registry, Drone, Gitea, NFS), the filesystem write speed was 17 MB/s; after installing the containers on the k8s cluster, the write speed was 16.5 MB/s, and `%iowait` from `sar -u 5` was nearly 0. System performance is great! Looks like it was just a dud SD card :D
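(If I want to confirm the old card is genuinely bad rather than just badly flashed, a destructive surface test along these lines should settle it - note this wipes the card, and `/dev/sdX` is a placeholder for the old card in a USB reader:)
$ sudo badblocks -wsv /dev/sdX    # -w write-mode test (destructive), -s show progress, -v verbose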