Summary
I'm running a slim homelab (a Kubernetes cluster with a few pods, Gitea, Drone, a Docker Registry, and an NFS share) on a 2-Pi4 cluster, and system performance is poor. I've noted that the controller node's filesystem looks pretty slow - I'm not sure if that is a cause of, or caused by, the other symptoms. I'm going to reimage and reinstall the controller node on a new SD card in the hope that that fixes it - but, in the meantime, I'm looking for other approaches to debugging this issue.
Situation
I've set up a minimal Kubernetes cluster on my own hardware, mostly following this guide with a few changes:
- I only have two Pi4s (one with 8GB RAM, one with 4GB), so my cluster is slightly smaller (the 8GB Pi is the control plane, the 4GB one is the worker).
- After finding Ubuntu Server to be a bit slow and unresponsive (and validating that impression with other Pi-thusiasts to make sure it wasn't just my perception/hardware), I used the 64-bit Raspbian OS instead.
- Which, in turn, meant that my `cmdline.txt` change was slightly different - when I used the Ubuntu version from that article, the Pi did not come back up from a reboot (a sketch of the typical change is below this list).
- The cluster isn't (yet!) on its own private network - they're just communicating via my main home network.
- The controller node has a hard drive connected via USB3, and shared via NFS for use by k8s pods.
- I also installed fail2ban, Gitea, Drone, and a rudimentary Docker Container Registry (as well as the aforementioned NFS share) on the controller node - I thought it was best to host the CI/CD components independently of the k8s cluster because they are dependencies of it (happy to get feedback on that, but I think it's tangential to this question).
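For context, that `cmdline.txt` change is the usual cgroup enablement needed for kubelet; a hedged sketch of what it typically looks like on Raspbian (the PARTUUID is a placeholder, and the exact flags depend on which guide you follow - everything has to stay on a single line):
$ cat /boot/cmdline.txt
console=serial0,115200 console=tty1 root=PARTUUID=xxxxxxxx-02 rootfstype=ext4 fsck.repair=yes rootwait cgroup_enable=cpuset cgroup_enable=memory cgroup_memory=1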
Problem
The cluster is up and running, and I've been able to run some deployments on it (Kubernetes Dashboard, jellyfin, grafana, and a small nginx-based deployment of my Hugo-built blog). This (along with the aforementioned CI/CD components and NFS share) seems like it should be a pretty insignificant load for the cluster (and I confirmed this expectation with the author of that article) - I've previously run all of those (minus the Kubernetes overhead) and more on the single 4Gb Pi4 alone, with no issues. However, the system is very slow and unresponsive:
- Simple shell commands (e.g. `man ln`, `df`, `uptime`) take ~10 seconds to complete; `apt-get install` or `pip3 install` commands are much slower than usual (double-digit minutes)
- Loading simple pages in Gitea's UI can take anywhere between 10 seconds and a minute.
- Simple builds of the blog (Gitea link, or GitHub mirror if that's unavailable) take over 20 minutes.
- Creation of a simple pod can take double-digit minutes
- The Kubernetes Dashboard will often display a spinner icon for a pane/page for ~20 seconds before populating information.
- When using `kubectl proxy` to view the dashboard, sometimes instead of a page the browser shows a JSON payload including the message `error trying to reach service: dial tcp <IP> connect: connection refused`. If I instead use `kubectl port-forward -n kubernetes-dashboard service/kubernetes-dashboard 8443:443`, I get the following error in the terminal:
Forwarding from 127.0.0.1:8443 -> 8443
Forwarding from [::1]:8443 -> 8443
Handling connection for 8443
E0520 22:03:24.086226 47798 portforward.go:400] an error occurred forwarding 8443 -> 8443: error forwarding port 8443 to pod a8ef295e1e42c5c739f761ab517618dd1951ad0c19fb517849979edb80745763, uid : failed to execute portforward in network namespace "/var/run/netns/cni-cfc573de-3714-1f3a-59a9-96285ce328ca": read tcp4 127.0.0.1:45274->127.0.0.1:8443: read: connection reset by peer
Handling connection for 8443
Handling connection for 8443
E0520 22:03:29.884407 47798 portforward.go:385] error copying from local connection to remote stream: read tcp4 127.0.0.1:8443->127.0.0.1:54550: read: connection reset by peer
Handling connection for 8443
E0520 22:05:58.069799 47798 portforward.go:233] lost connection to pod
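(Side note for anyone hitting similar symptoms: the usual way to check whether the dashboard pod itself is unhealthy, rather than the proxy/port-forward path, is something like the sketch below - the namespace assumes the default dashboard install, and `<pod-name>` is a placeholder.)
$ kubectl -n kubernetes-dashboard get pods -o wide          # is the pod Running, and on which node?
$ kubectl -n kubernetes-dashboard describe pod <pod-name>   # look for restarts and failed probes in Events
$ kubectl -n kubernetes-dashboard logs <pod-name>           # dashboard container logs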
What I've tried so far
System Resources
First I checked the system resources on all k8s machines. `htop` showed:
- `controller` - CPU load <10% across all 4 cores, memory usage at ~2G/7.6G, Swap 47/100M, `Load average 11.62 10.17 7.32`
- `worker` - CPU load <3% across all 4 cores, memory usage at ~300M/1.81G, Swap 20/100M, `Load average 0.00 0.00 0.00`
Which is odd in two respects:
- If the load average is so high (this suggests that 100% utilization corresponds to "load average = number of cores", so a load average of 11 indicates that this 4-core Pi is at nearly 300% capacity), why is CPU usage so low? (See also the sketch just after this list.)
- Why is `worker` showing such a low load average? In particular, I've confirmed that there is a ~50/50 split of k8s pods between `controller` and `worker`, and confirmed that I've set `AGENTS_ENABLED=true` (ref) on the Drone server.
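As hinted in the list above, a minimal sketch for decomposing that load number - on Linux the load average counts tasks in uninterruptible (D) sleep as well as runnable ones, so heavy I/O wait can inflate it without much corresponding CPU usage:
$ cat /proc/loadavg                 # 1/5/15-minute load, then runnable/total tasks, then last PID
$ ps -eo state= | sort | uniq -c    # count processes by state; lots of 'D' entries point at I/O rather than CPU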
I followed the instructions here to investigate High System Load and Low CPU Utilization:
- `w` confirmed high system load
- `sar` output:
$ sar -u 5
Linux 5.15.32-v8+ (rassigma) 05/21/2022 _aarch64_ (4 CPU)
02:41:57 PM CPU %user %nice %system %iowait %steal %idle
02:42:02 PM all 2.47 0.00 1.16 96.37 0.00 0.00
02:42:07 PM all 2.77 0.00 2.21 95.02 0.00 0.00
02:42:12 PM all 3.97 0.00 1.01 95.02 0.00 0.00
02:42:17 PM all 2.42 0.00 1.11 96.47 0.00 0.00
^C
Average: all 2.91 0.00 1.37 95.72 0.00 0.00
So, a lot of %iowait!
- `ps -eo s,user | grep "^[RD]" | sort | uniq -c | sort -nbr` showed `6 D root` and `1 R pi`, so that doesn't seem like the cause here (the article lists an example with thousands of threads in D/R states)
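To go one step further and see which processes are actually responsible for the I/O (rather than just counting states), a couple of hedged options - `pidstat` ships with the same sysstat package as `sar`, while `iotop` needs a separate install:
$ pidstat -d 5                                  # per-process disk read/write kB/s, refreshed every 5s
$ sudo iotop -oPa                               # only processes doing I/O, accumulated totals (sudo apt install iotop)
$ ps -eo state,pid,comm,wchan | awk '$1=="D"'   # what the D-state processes are currently blocked on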
Based on these two questions, I'll include here the output of various commands run on `controller`, though I don't know how to interpret them:
$ netstat -i 15
Kernel Interface table
Iface MTU RX-OK RX-ERR RX-DRP RX-OVR TX-OK TX-ERR TX-DRP TX-OVR Flg
br-5bde1 1500 15188 0 0 0 15765 0 0 0 BMRU
br-68f83 1500 121 0 0 0 241 0 0 0 BMU
cni0 1450 1546275 0 0 0 1687849 0 0 0 BMRU
docker0 1500 146703 0 0 0 160569 0 0 0 BMRU
eth0 1500 5002006 0 0 0 2325706 0 0 0 BMRU
flannel. 1450 161594 0 0 0 168478 0 4162 0 BMRU
lo 65536 6018581 0 0 0 6018581 0 0 0 LRU
veth1729 1450 41521 0 0 0 59590 0 0 0 BMRU
veth1a77 1450 410622 0 0 0 453044 0 0 0 BMRU
veth35a3 1450 82 0 0 0 20237 0 0 0 BMRU
veth3dce 1500 59212 0 0 0 61170 0 0 0 BMRU
veth401b 1500 28 0 0 0 4182 0 0 0 BMRU
veth4257 1450 108391 0 0 0 173055 0 0 0 BMRU
veth4642 1500 12629 0 0 0 16556 0 0 0 BMRU
veth6a62 1450 83 0 0 0 20285 0 0 0 BMRU
veth7c18 1450 47952 0 0 0 59756 0 0 0 BMRU
veth8a14 1450 82 0 0 0 20279 0 0 0 BMRU
vethcc5c 1450 655457 0 0 0 716329 0 0 0 BMRU
vethe535 1450 17 0 0 0 769 0 0 0 BMRU
vethf324 1450 180986 0 0 0 198679 0 0 0 BMRU
wlan0 1500 0 0 0 0 0 0 0 0 BMU
$ iostat -d -x
Linux 5.15.32-v8+ (rassigma) 05/21/2022 _aarch64_ (4 CPU)
Device r/s rkB/s rrqm/s %rrqm r_await rareq-sz w/s wkB/s wrqm/s %wrqm w_await wareq-sz d/s dkB/s drqm/s %drqm d_await dareq-sz f/s f_await aqu-sz %util
mmcblk0 0.20 14.65 0.07 26.90 1031.31 74.40 3.33 56.68 1.64 33.04 4562.85 17.02 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 15.40 51.07
sda 0.27 28.31 0.05 15.37 25.75 104.42 0.36 26.56 0.24 39.99 64.19 72.81 0.00 0.00 0.00 0.00 0.00 0.00 0.04 90.24 0.03 0.56
$ vmstat 15
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
8 3 48640 827280 129600 4607164 0 0 11 21 15 42 4 1 71 24 0
0 5 48640 827128 129600 4607216 0 0 1 44 2213 4267 4 1 31 64 0
0 10 48640 827660 129600 4607216 0 0 0 47 1960 3734 4 1 36 59 0
0 5 48640 824912 129600 4607624 0 0 1 121 2615 4912 6 2 15 77 0
2 12 48640 824416 129600 4607692 0 0 0 102 2145 4129 4 2 30 64 0
1 7 48640 822428 129600 4607972 0 0 3 81 1948 3564 6 2 10 83 0
0 5 48640 823312 129600 4608164 0 0 4 62 2328 4273 5 2 12 81 0
0 7 48640 824320 129600 4608220 0 0 1 143 2433 4695 5 2 9 84 0
...
51% utilization on the SD card (from the `iostat` output) is probably reasonably high, but not problematically so, I would have thought?
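For what it's worth, in that same `iostat` output the average I/O waits on `mmcblk0` (`r_await` ~1031 ms, `w_await` ~4562 ms) are enormous for a card doing barely 3-4 writes per second, which also points at the card rather than sheer load. A sketch for watching it live:
$ iostat -d -x mmcblk0 sda 5    # refresh every 5s; compare r_await/w_await and %util on the SD card vs the USB drive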
Filesystem
Referencing this article on how to test (SD card) filesystem performance, on `controller` and `worker` (both are using SD cards from the same batch, which advertised 10 MB/s write speed):
controller - $ dd if=/dev/zero of=speedtest bs=1M count=100 conv=fdatasync
100+0 records in
100+0 records out
104857600 bytes (105 MB, 100 MiB) copied, 43.2033 s, 2.4 MB/s
worker - $ dd if=/dev/zero of=speedtest bs=1M count=100 conv=fdatasync
100+0 records in
100+0 records out
104857600 bytes (105 MB, 100 MiB) copied, 5.97128 s, 17.6 MB/s
`controller`'s FS write appears to be ~7 times slower than `worker`'s. I'm not sure how to causally interpret that, though - it could be that `controller`'s filesystem is slow, which is causing the other symptoms, or it could be that there is some other process-throughput bottleneck which is causing both the slow filesystem and the other symptoms.
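If I wanted a second data point beyond the sequential `dd` test, small random writes are usually where marginal SD cards fall apart; a hedged sketch using `fio` (not installed by default - `sudo apt install fio` - and the filename is just a scratch file):
$ fio --name=sd-randwrite --filename=./fio-test --size=64M --bs=4k --rw=randwrite --direct=1 --runtime=30 --time_based --group_reporting
$ rm ./fio-test    # remove the scratch file afterwards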
Network
My home network is behind a fairly standard OPNSense router.
Checking external network connectivity with Speedtest CLI:
controller - $ speedtest
Server: Tekify Fiber & Wireless - Fremont, CA (id = 6468)
ISP: Sonic.net, LLC
Latency: 3.53 ms (0.15 ms jitter)
Download: 859.90 Mbps (data used: 523.3 MB )
Upload: 932.58 Mbps (data used: 955.5 MB )
Packet Loss: 0.0%
---
worker - $ speedtest
Server: Tekify Fiber & Wireless - Fremont, CA (id = 6468)
ISP: Sonic.net, LLC
Latency: 3.29 ms (1.84 ms jitter)
Download: 871.33 Mbps (data used: 776.6 MB )
Upload: 917.25 Mbps (data used: 630.5 MB )
Packet Loss: 0.0%
I did plan to test intra-network speed, but given how long it took to get to this point of debugging and the strong signals that there's an issue with `controller`'s SD card (high `%iowait`, slow `dd` write performance), I elected to move on to replacing that first before checking the network.
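For completeness, a minimal sketch of that intra-network test (assuming `iperf3` is installed on both Pis; `<worker-ip>` is a placeholder):
worker - $ iperf3 -s                          # run a listener on one node
controller - $ iperf3 -c <worker-ip> -t 10    # push traffic to it from the other node for 10 seconds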
Updates
- After re-imaging onto a fresh SD card, with absolutely nothing else installed on it other than Raspbian, the `dd if=/dev/zero of=speedtest bs=1M count=100 conv=fdatasync` filesystem-speed test gives 17.2 MB/s for the "reborn" controller node. I'll install the k8s cluster and other tools, and test again.
- After installing all the tools (k8s, Docker container registry, Drone, Gitea, NFS), the filesystem write speed was 17 MB/s; after installing the containers on the k8s cluster, the write speed was 16.5 MB/s, and `%iowait` from `sar -u 5` was nearly 0. System performance is great! Looks like it was just a dud SD card :D
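(If I want to confirm the old card is genuinely bad rather than just badly flashed, a destructive surface test along these lines should settle it - note this wipes the card, and `/dev/sdX` is a placeholder for the old card in a USB reader:)
$ sudo badblocks -wsv /dev/sdX    # -w write-mode test (destructive), -s show progress, -v verbose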