
microk8s kubelet reboot loop


I have a Kubernetes cluster built with MicroK8s, and the cluster has started to misbehave: it's unresponsive / returns stale state when scheduling pods, managing nodes, etc.

I have gone through and rebooted all manager nodes, and removed all worker nodes to reduce the noise, in the hope of evicting a problem node (as I was seeing long response times / timeouts in /var/log/syslog).

On restart, if I look at kubectl get events I can see the kubelet service being constantly restarted:

14m         Normal    Starting                  node/node1   Starting kubelet.
14m         Warning   InvalidDiskCapacity       node/node1   invalid capacity 0 on image filesystem
14m         Normal    Starting                  node/node3   Starting kubelet.
14m         Normal    NodeHasSufficientMemory   node/node1   Node node1 status is now: NodeHasSufficientMemory
14m         Normal    NodeHasNoDiskPressure     node/node1   Node node1 status is now: NodeHasNoDiskPressure
14m         Normal    Starting                  node/node3
14m         Normal    NodeHasSufficientPID      node/node1   Node node1 status is now: NodeHasSufficientPID
14m         Warning   InvalidDiskCapacity       node/node3   invalid capacity 0 on image filesystem
14m         Normal    NodeHasSufficientMemory   node/node3   Node node3 status is now: NodeHasSufficientMemory
14m         Normal    NodeAllocatableEnforced   node/node1   Updated Node Allocatable limit across pods
14m         Normal    NodeHasNoDiskPressure     node/node3   Node node3 status is now: NodeHasNoDiskPressure
14m         Normal    NodeHasSufficientPID      node/node3   Node node3 status is now: NodeHasSufficientPID
14m         Normal    NodeAllocatableEnforced   node/node3   Updated Node Allocatable limit across pods
11m         Normal    Starting                  node/node1
11m         Normal    Starting                  node/node1   Starting kubelet.
11m         Warning   InvalidDiskCapacity       node/node1   invalid capacity 0 on image filesystem
11m         Normal    NodeHasSufficientMemory   node/node1   Node node1 status is now: NodeHasSufficientMemory
11m         Normal    NodeHasNoDiskPressure     node/node1   Node node1 status is now: NodeHasNoDiskPressure
11m         Normal    NodeHasSufficientPID      node/node1   Node node1 status is now: NodeHasSufficientPID
11m         Normal    NodeAllocatableEnforced   node/node1   Updated Node Allocatable limit across pods
10m         Normal    Starting                  node/node3   Starting kubelet.
10m         Warning   InvalidDiskCapacity       node/node3   invalid capacity 0 on image filesystem
10m         Normal    NodeAllocatableEnforced   node/node3   Updated Node Allocatable limit across pods
10m         Normal    NodeHasSufficientMemory   node/node3   Node node3 status is now: NodeHasSufficientMemory
10m         Normal    NodeHasNoDiskPressure     node/node3   Node node3 status is now: NodeHasNoDiskPressure
10m         Normal    NodeHasSufficientPID      node/node3   Node node3 status is now: NodeHasSufficientPID
8m1s        Normal    Starting                  node/node1
7m57s       Normal    Starting                  node/node1   Starting kubelet.
7m57s       Warning   InvalidDiskCapacity       node/node1   invalid capacity 0 on image filesystem
7m57s       Normal    NodeHasSufficientMemory   node/node1   Node node1 status is now: NodeHasSufficientMemory
7m9s        Normal    Starting                  node/node3   Starting kubelet.
7m57s       Normal    NodeHasNoDiskPressure     node/node1   Node node1 status is now: NodeHasNoDiskPressure
7m8s        Normal    Starting                  node/node3
7m57s       Normal    NodeHasSufficientPID      node/node1   Node node1 status is now: NodeHasSufficientPID
7m9s        Warning   InvalidDiskCapacity       node/node3   invalid capacity 0 on image filesystem
7m8s        Normal    NodeHasSufficientMemory   node/node3   Node node3 status is now: NodeHasSufficientMemory
7m57s       Normal    NodeAllocatableEnforced   node/node1   Updated Node Allocatable limit across pods
7m8s        Normal    NodeHasNoDiskPressure     node/node3   Node node3 status is now: NodeHasNoDiskPressure
7m8s        Normal    NodeHasSufficientPID      node/node3   Node node3 status is now: NodeHasSufficientPID
7m8s        Normal    NodeAllocatableEnforced   node/node3   Updated Node Allocatable limit across pods
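
For reference, the events above were pulled with plain kubectl get events; the sorted/filtered variants below are just what I've been adding to narrow it down to node events, not necessarily what produced the exact ordering shown:

kubectl get events -A --sort-by=.lastTimestamp
kubectl get events --field-selector involvedObject.kind=Node --sort-by=.lastTimestamp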

I'm not sure where to find the logs that will give me the reason for these restarts. Googling the invalid capacity 0 warning, people seem to say it can be ignored; it's also a warning rather than an error, and it isn't the final log line before a restart, so I assume it isn't what's preventing startup. Though I'm not sure.
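
In case that warning does matter, this is the kind of check I can run on each node to confirm the containerd image filesystem actually reports capacity (the mountpoint is the one named in the kubelite log further down):

df -h /var/snap/microk8s/common/var/lib/containerd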

I'm looking for logs that will give me more detail on why this service is failing on the nodes. I've looked at journalctl -f -u snap.microk8s.daemon-kubelite, as the docs say the kubelet logs have been consolidated into the kubelite daemon.
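
Specifically, I've been tailing it with the first command below; the second is just the variant I use to dump everything since the last boot:

journalctl -f -u snap.microk8s.daemon-kubelite
journalctl -u snap.microk8s.daemon-kubelite -b --no-pager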

I can't include the full logs as they're past the character limit on Server Fault, and you can't seem to upload files. I can provide snippets, or search for specific things, if that would help.
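
For example, if someone can suggest what to grep for, I can pull snippets with something along these lines (the error/fatal pattern is just a guess at what's useful):

journalctl -u snap.microk8s.daemon-kubelite --since "1 hour ago" --no-pager | grep -iE "error|fatal" | tail -n 100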

Here are some things that stand out, though I don't know if any of them would lead to the kubelet not starting:

...
Aug 18 12:25:46 node3 microk8s.daemon-kubelite[459922]: I0818 12:25:46.025847  459922 server.go:1251] "Started kubelet"
Aug 18 12:25:46 node3 microk8s.daemon-kubelite[459922]: I0818 12:25:46.025894  459922 server.go:177] "Starting to listen read-only" address="0.0.0.0" port=10255
Aug 18 12:25:46 node3 microk8s.daemon-kubelite[459922]: I0818 12:25:46.025926  459922 server.go:150] "Starting to listen" address="0.0.0.0" port=10250
Aug 18 12:25:46 node3 microk8s.daemon-kubelite[459922]: I0818 12:25:46.026588  459922 server.go:410] "Adding debug handlers to kubelet server"
Aug 18 12:25:46 node3 microk8s.daemon-kubelite[459922]: I0818 12:25:46.027415  459922 fs_resource_analyzer.go:67] "Starting FS ResourceAnalyzer"
Aug 18 12:25:46 node3 microk8s.daemon-kubelite[459922]: I0818 12:25:46.027521  459922 volume_manager.go:294] "Starting Kubelet Volume Manager"
Aug 18 12:25:46 node3 microk8s.daemon-kubelite[459922]: I0818 12:25:46.027975  459922 desired_state_of_world_populator.go:151] "Desired state populator starts to run"
Aug 18 12:25:46 node3 microk8s.daemon-kubelite[459922]: E0818 12:25:46.028881  459922 cri_stats_provider.go:455] "Failed to get the info of the filesystem with mountpoint" err="unable to find data in memory cache" mountpoint="/var/snap/microk8s/common/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs"
Aug 18 12:25:46 node3 microk8s.daemon-kubelite[459922]: E0818 12:25:46.029007  459922 kubelet.go:1351] "Image garbage collection failed once. Stats initialization may not have completed yet" err="invalid capacity 0 on image filesystem"
Aug 18 12:25:46 node3 microk8s.daemon-kubelite[459922]: I0818 12:25:46.061667  459922 kubelet_network_linux.go:57] "Initialized protocol iptables rules." protocol=IPv4
Aug 18 12:25:46 node3 microk8s.daemon-kubelite[459922]: I0818 12:25:46.072440  459922 kubelet_network_linux.go:57] "Initialized protocol iptables rules." protocol=IPv6
Aug 18 12:25:46 node3 microk8s.daemon-kubelite[459922]: I0818 12:25:46.072452  459922 status_manager.go:161] "Starting to sync pod status with apiserver"
Aug 18 12:25:46 node3 microk8s.daemon-kubelite[459922]: I0818 12:25:46.072461  459922 kubelet.go:2031] "Starting kubelet main sync loop"
Aug 18 12:25:46 node3 microk8s.daemon-kubelite[459922]: E0818 12:25:46.072492  459922 kubelet.go:2055] "Skipping pod synchronization" err="[container runtime status check may not have completed yet, PLEG is not healthy: pleg has yet to be successful]"
Aug 18 12:25:46 node3 microk8s.daemon-kubelite[459922]: I0818 12:25:46.122991  459922 cpu_manager.go:213] "Starting CPU manager" policy="none"
Aug 18 12:25:46 node3 microk8s.daemon-kubelite[459922]: I0818 12:25:46.123003  459922 cpu_manager.go:214] "Reconciling" reconcilePeriod="10s"
Aug 18 12:25:46 node3 microk8s.daemon-kubelite[459922]: I0818 12:25:46.123016  459922 state_mem.go:36] "Initialized new in-memory state store"
Aug 18 12:25:46 node3 microk8s.daemon-kubelite[459922]: I0818 12:25:46.123144  459922 state_mem.go:88] "Updated default CPUSet" cpuSet=""
Aug 18 12:25:46 node3 microk8s.daemon-kubelite[459922]: I0818 12:25:46.123156  459922 state_mem.go:96] "Updated CPUSet assignments" assignments=map[]
Aug 18 12:25:46 node3 microk8s.daemon-kubelite[459922]: I0818 12:25:46.123162  459922 policy_none.go:49] "None policy: Start"
Aug 18 12:25:46 node3 microk8s.daemon-kubelite[459922]: I0818 12:25:46.124388  459922 memory_manager.go:168] "Starting memorymanager" policy="None"
Aug 18 12:25:46 node3 microk8s.daemon-kubelite[459922]: I0818 12:25:46.124403  459922 state_mem.go:35] "Initializing new in-memory state store"
Aug 18 12:25:46 node3 microk8s.daemon-kubelite[459922]: I0818 12:25:46.124496  459922 state_mem.go:75] "Updated machine memory state"
Aug 18 12:25:46 node3 microk8s.daemon-kubelite[459922]: I0818 12:25:46.125518  459922 manager.go:611] "Failed to read data from checkpoint" checkpoint="kubelet_internal_checkpoint" err="checkpoint is not found"
Aug 18 12:25:46 node3 microk8s.daemon-kubelite[459922]: I0818 12:25:46.125689  459922 plugin_manager.go:114] "Starting Kubelet Plugin Manager"
Aug 18 12:25:46 node3 microk8s.daemon-kubelite[459922]: I0818 12:25:46.126270  459922 csi_plugin.go:99] kubernetes.io/csi: Trying to validate a new CSI Driver with name: cstor.csi.openebs.io endpoint: /var/snap/microk8s/common/var/lib/kubelet/plugins/cstor.csi.openebs.io/csi.sock versions: 1.0.0
Aug 18 12:25:46 node3 microk8s.daemon-kubelite[459922]: I0818 12:25:46.126286  459922 csi_plugin.go:112] kubernetes.io/csi: Register new plugin with name: cstor.csi.openebs.io at endpoint: /var/snap/microk8s/common/var/lib/kubelet/plugins/cstor.csi.openebs.io/csi.sock
Aug 18 12:25:46 node3 microk8s.daemon-kubelite[459922]: I0818 12:25:46.128371  459922 kubelet_node_status.go:70] "Attempting to register node" node="node3"
Aug 18 12:25:46 node3 microk8s.daemon-kubelite[459922]: I0818 12:25:46.159655  459922 serving.go:348] Generated self-signed cert in-memory
Aug 18 12:25:46 node3 microk8s.daemon-kubelite[459922]: I0818 12:25:46.173125  459922 pod_container_deletor.go:79] "Container not found in pod's containers" containerID="7d5dfde734a4022cf57816fb8fd2bfd9b30dc4c44fefb262d866854e9905fedd"
...

Would appreciate any help in finding the error that matters here.
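
If it helps narrow things down, these are the sort of commands I can run on each node and post output from (just guesses at what might be useful):

snap services microk8s
systemctl status snap.microk8s.daemon-kubelite --no-pager
microk8s inspect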

Thanks.
