Score:1

Pods on a k8s node are inaccessible, kube-proxy or CNI failed


I have added a new node to my k8s cluster, but I found that some pods allocated to this node cannot show logs, like this:

$ kubectl logs -n xxxx xxxxx-6d5bdd7d6f-5ps6k

Unable to connect to the server: EOF

Using Lens gives an error like this:

Failed to load logs: request to http://127.0.0.1:49271/api-kube/api/v1/namespaces/xxxxxxx/pods/xxxx34-27736483--1-hxjpv/log?tailLines=500&timestamps=true&container=xxxxxx&previous=false failed, reason: socket hang up
Reason: undefined (ECONNRESET)

I believe there is some problem with this node. When I use port-forwarding:

$ kubectl port-forward -n argocd svc/argocd-notifications-controller-metrics 9001:9001
error: error upgrading connection: error dialing backend: dial tcp 10.0.6.20:10250: i/o timeout

I think the internal IP 10.0.6.20 is wrong.
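(As a sanity check, the address the API server has registered for the node can be compared with what the node actually carries; this is just standard kubectl usage, not something from the original setup:)

$ kubectl get nodes -o wide                                  # INTERNAL-IP column: the address the API server has for each node
$ kubectl get node worker4 -o jsonpath='{.status.addresses}' # the raw address list the kubelet registered for worker4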

All kube-proxy pods show as Running from kubectl:

-> % kgp -o wide -n kube-system | grep kube-proxy
kube-proxy-7pg9d                                  1/1     Running     1 (2d20h ago)   29d     10.0.6.20        worker4     
kube-proxy-cqh2c                                  1/1     Running     1 (15d ago)     29d     10.0.6.3         worker3           
kube-proxy-lp4cd                                  1/1     Running     0               29d     10.0.6.1         worker1           
kube-proxy-r6bgw                                  1/1     Running     0               29d     10.0.6.2         worker2

But running `crictl pods` on each node and looking for these pods gives a different picture:

# crictl pods | grep kube-proxy
ceef94b060e56       2 days ago          Ready               kube-proxy-7pg9d                                   kube-system         1                   (default)
418bd5b46c2b9       4 weeks ago         NotReady            kube-proxy-7pg9d                                   kube-system         0                   (default)

It shows both a Ready and a NotReady sandbox. I am using Calico for CNI, in IPVS mode. How can I fix this?

SYN
There's nothing wrong with those NotReady pods, as long as you have another one that is Ready, started afterwards. What makes you think `10.0.6.20` is wrong, what's your node IP? Any chance your DNS would resolve "worker4" to the wrong IP? Regardless, a connection timing out to port 10250 (kubelet) indeed suggests there's no one listening on that IP. I would ssh to that node, check IP configuration, routes, and compare with a working node.
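A rough sketch of that comparison, using plain Linux tooling on the node (run the same on a healthy worker and diff the output):

ip -4 addr show          # is 10.0.6.20 actually configured on an interface?
ip route                 # compare the routing table with a working node
getent hosts worker4     # how does the node name resolve locally?
ss -tlnp | grep 10250    # is kubelet listening, and on which address?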
SYN
From crictl's perspective, a pod that is NotReady may just be some leftover from a previous Kubernetes pod. If you run `crictl ps`, you see running containers, which belong to Ready pods, while `crictl ps -a` also shows exited containers, usually belonging to NotReady pods. NotReady crictl pods can usually be removed, manually or using scripts, when kubelet leaves them behind.
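A minimal cleanup sketch for such leftovers (assumes a reasonably recent crictl; double-check the `--state` filter spelling on your version before removing anything on a production node):

crictl pods --state NotReady -q                        # only the IDs of NotReady (leftover) pod sandboxes
crictl pods --state NotReady -q | xargs -r crictl rmp  # remove them; -r skips the call if the list is empty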
Andy Huang
@SYN, because I do port-forwarding from my computer, which should not be in the subnet where `10.0.6.20` is reachable, the node IP should be an actual IP of worker4.
SYN
"I do port-forwarding" : please explain what do you mean here. "which should not be in the subnet where 10.0.6.20 is reachable" ; makes no sense, you realize your other workers are in the same /26? "should be an actual IP of worker4" : again, what's that node IP, then? Have you checked ip & routing configuration on that node? What about dns resolution on the other node (specifically: the api)
SYN
Keep in mind that the "IP" address shown on your pods is usually set by the kubelet instance running on your node, based on actual values at the time the pod started. Should we guess that you're using DHCP to assign addresses to your nodes, and the lease for worker4 was somehow renewed? Feel free to add details to your original post.
Andy Huang
@SYN I use port-forwarding with kubectl from my local computer: `kubectl port-forward -n argocd svc/argocd-notifications-controller-metrics 9001:9001` shows this error: `error: error upgrading connection: error dialing backend: dial tcp 10.0.6.20:10250: i/o timeout`. 10.0.6.20 is impossible to reach from my local computer. I am not sure how to check routing, configuration, or DNS among nodes.
SYN
We use kubectl port-forward to connect to SDN addresses/services. Here, we're talking about the IP of your node (unrelated to your SDN). You should have some kind of access over there.
Andy Huang
@SYN, I can ssh to all nodes
SYN
... so what is your worker4 node IP? If 10.0.6.20: check the kubelet logs, as we can see there's no answer from there. Since it's a timeout (and not a connection refused), I suspect you have some DHCP, the node changed addresses, and somehow the API doesn't yet know about it.
Andy Huang
@SYN worker4's IP is `213.108.105.12`. The kubelet service is run by systemd; I used `journalctl -xeu kubelet` to find logs, but most logs are about specific container statuses. What specific keywords should I look for?
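One generic way to narrow that down with journalctl and grep (the keywords are just a starting point, not specific to this failure):

journalctl -u kubelet --since "-1h" --no-pager | grep -iE 'error|fail|timeout|refus|certificate'
journalctl -u kubelet -p warning --since "-1h" --no-pager   # warning-and-above entries only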
Andy Huang
I found that iptables on worker4 is dropping all packets from the kube-apiserver.
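For anyone tracing the same symptom, standard iptables inspection on the node makes such a rule easy to spot via its packet counters:

iptables -L INPUT -n -v --line-numbers   # INPUT rules with packet/byte counters
iptables -S                              # full ruleset in iptables-save syntax, handy to diff against a healthy worker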
SYN
Any clue why?! Lacking anything better to suggest: would a reboot help, maybe?
Score:1

I solved this problem with the following procedure:

On worker4:

Make sure kubelet is listening on the default port:

# lsof -i:10250
COMMAND PID USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
kubelet 819 root   26u  IPv4  13966      0t0  TCP worker4.cluster.local:10250 (LISTEN)
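(A local curl is another quick check, not part of the original steps: any HTTP status code in the response, typically 401 or 404, proves kubelet itself answers on the port and the problem is on the network path:)

# curl -sk -o /dev/null -w '%{http_code}\n' https://127.0.0.1:10250/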

From worker1, `curl https://10.0.6.20:10250` times out, but `curl https://10.0.6.1:10250` (worker1) from worker4 responded quickly.

So packets might be dropped inside worker4.

To log dropped packets on worker4, I followed https://www.thegeekstuff.com/2012/08/iptables-log-packets/:

iptables -N LOGGING
iptables -A INPUT -j LOGGING
iptables -A LOGGING -m limit --limit 2/min -j LOG --log-prefix "IPTables-Dropped: " --log-level 4
iptables -A LOGGING -j DROP

This will save the log messages to /var/log/syslog.
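(On nodes without a syslog daemon writing that file, the same messages can be read from the kernel ring buffer:)

journalctl -k | grep IPTables-Dropped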

Using this command to filter the logs:

tail -200 /var/log/syslog | grep IPTables-Dropped | grep 10.0.6.1
Oct 10 13:49:37 compute kernel: [637626.880648] IPTables-Dropped: IN=eth1 OUT= MAC=00:16:ce:d4:b7:01:00:16:b2:77:89:01:08:00 SRC=10.0.6.1 DST=10.0.6.20 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=29087 DF PROTO=TCP SPT=58838 DPT=10250 WINDOW=64240 RES=0x00 SYN URGP=0

So I am convinced the packets are being dropped.
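(Side note, not part of the original procedure: since the LOGGING chain ends in a DROP, these temporary rules should be removed once debugging is done; this assumes nothing else jumps to LOGGING.)

iptables -D INPUT -j LOGGING   # detach the chain from INPUT
iptables -F LOGGING            # flush the LOG and DROP rules
iptables -X LOGGING            # delete the now-empty chain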

Adding this rule:

iptables -I INPUT -s 10.0.0.0/8 -p tcp --dport 10250 -j ACCEPT

Then I can attach a shell or get logs from pods on the node. I appreciate the discussion with @SYN.
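One caveat worth adding: a rule inserted with iptables -I is not persistent across reboots. How to persist it depends on the distribution; on Debian/Ubuntu (an assumption, the post doesn't say which OS the nodes run) something like this works:

apt-get install iptables-persistent   # offers to save the current rules during installation
netfilter-persistent save             # re-save after later rule changes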

SYN
It's weird that this port would not be reachable; it is indeed critical for Kubernetes operations. kube-proxy and SDN components may set up rules on your nodes, but they should not interfere with this. If rebooting did not help, it could be worth investigating where that rule came from and why you don't have it on your other nodes... Still, nice catch. Pretty weird.