I'm trying to do a clean install of Kubernetes 1.23.x on a cluster of four Raspberry Pis, each running the 64-bit version of Raspberry Pi OS, but I'm hitting a major snag as soon as I run kubeadm init on the master node (before even attempting to get the other nodes to join). Namely: about five minutes after calling kubeadm init, the cluster stops working. In fact, it never really works to begin with: at first the API server responds and reports the node as NotReady, but after 5 minutes it stops responding altogether.
So here's what I did, and what I saw: I installed containerd and kubeadm. Then I ran the following command on the master node to try to start Kubernetes:
sudo kubeadm init --pod-network-cidr=10.244.0.0/16 \
--token-ttl=0 --apiserver-advertise-address=192.168.1.194
After running that command, and subsequently copying the /etc/kubernetes/admin.conf file to ~/.kube/config, I am able to run the following command:
$ kubectl get nodes
NAME           STATUS     ROLES                  AGE     VERSION
k8s-master-1   NotReady   control-plane,master   3m36s   v1.23.4
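(For reference, the admin.conf copy mentioned above was just the standard post-init steps that kubeadm prints; roughly these commands:)
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config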
And it will continue to show a NotReady status for about 5 minutes, after which point the same command yields a very different result:
$ kubectl get nodes
The connection to the server 192.168.1.194:6443 was refused - did you specify the right host or port?
I'm not sure why this is happening, but it is very consistent. I have tried a few times now to run kubeadm reset and then kubeadm init again, and the connection failure always happens at the 5-minute mark. So on the last attempt I decided to tail all the log files under /var/log/containers/. After the 5-minute mark, they repeatedly log some variation of a connection error to 127.0.0.1:2379. For example:
2022-03-09T19:30:29.307156643-06:00 stderr F W0310 01:30:29.306871 1 clientconn.go:1331] [core] grpc: addrConn.createTransport failed to connect to {127.0.0.1:2379 127.0.0.1 0 }. Err: connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused". Reconnecting...
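(The tailing itself was nothing fancy; roughly this, with the glob being my shorthand rather than the exact command I typed:)
sudo tail -f /var/log/containers/*.log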
From Googling, it appears that etcd listens on that port, but at the 5-minute mark a bunch of services (including etcd) start shutting down. I've uploaded the full logs, from the time kubeadm init runs up until just before the dreaded 5-minute mark, as a Gist.
I have already checked that all the ports are open, too. (They are.) During those first five minutes, I can telnet to local port 2379 (an example of the kind of check I was doing is sketched below). Why won't Kubernetes start on my Pi? What am I missing?
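To be concrete, the port checks looked roughly like this (not an exact transcript, and the port list is just the ones relevant here):
sudo ss -tlnp | grep -E '6443|2379|10250'
telnet 127.0.0.1 2379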
UPDATE: As requested, here are a few more details. I saw a post recommending setting --apiserver-advertise-address to 0.0.0.0 instead of the node's IP, so I tried that (see the sketch after this paragraph), but it seemed to make no difference. I also tried running systemctl status kubelet, which shows that the kubelet service is "active" during that initial 5-minute period.
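The 0.0.0.0 variant was just the same init command with the advertise address swapped (paraphrased from memory, not copied from my shell history):
sudo kubeadm init --pod-network-cidr=10.244.0.0/16 \
--token-ttl=0 --apiserver-advertise-address=0.0.0.0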
I also ran kubectl describe node k8s-master-1, which shows four events in this sequence:
- KubeletHasSufficientMemory
- KubeletHasNoDiskPressure
- KubeletHasSufficientPID
- KubeletNotReady
That last event is accompanied by this message: "container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized." That got me thinking. I had been waiting for the node to come up as Ready before installing Flannel (i.e., the CNI plugin), but this time I decided to try installing Flannel during that initial 5-minute window, using this command:
kubectl apply -f https://raw.githubusercontent.com/flannel-io/flannel/master/Documentation/kube-flannel.yml
And to my great surprise, that worked! Well, sort of. The master node did eventually start reporting a "Ready" status, and all my pods came up, with the notable exception of the coredns pods. However, after a short while the kube-proxy pod (in the kube-system namespace) died and got stuck in a CrashLoopBackOff, and later still the kube-controller-manager and kube-scheduler pods similarly entered a CrashLoopBackOff. Then, this time after about 15 minutes, the whole cluster died again as before (meaning I got the same 'connection to the server was refused' message). So I feel like I'm a little bit closer, but still a long way from getting this working.
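For context, I was watching the pods come up (and fall over) with nothing more sophisticated than:
kubectl -n kube-system get pods --watch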
SECOND UPDATE: A couple of things. It seems that when I install the Flannel CNI plugin, coredns is either not included or just doesn't work. But when I install the Weave Net CNI instead (the command I used is sketched at the end of this post), it at least tries to spin up coredns, although unfortunately those pods get stuck in ContainerCreating and never become ready. So, as requested, I am providing a number of additional logs. They're long enough to warrant uploading them separately, so here's a link to a Gist containing four logs:
- Running kubectl -n kube-system logs pod/coredns-...
- Running kubectl -n kube-system logs pod/kube-controller-manager-k8s-master-1
- Running kubectl -n kube-system logs pod/kube-proxy-...
- Running kubectl describe node k8s-master-1
Note that before everything dies, the kube-controller-manager-... pod starts up but soon finds itself in a CrashLoopBackOff, while the coredns pods never start up successfully at all.
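For completeness, the Weave Net install I mentioned in the second update was roughly the standard one-liner from the Weave Net docs at the time (I may have pinned a different version, so treat this as an approximation):
kubectl apply -f "https://cloud.weave.works/k8s/net?k8s-version=$(kubectl version | base64 | tr -d '\n')"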