Score:0

Cannot reach pod from pod in some machines but tunnel in node is reached

ma flag

I have a pod with a cluster IP of 10.233.70.35 in a bare metal Kubernetes 1.19 cluster with Calico 3.16.9 as CNI. Let's call this Pod A. In most nodes (which is different from the node of Pod A), a pod (Pod B) in the same Kubernetes namespace can reach Pod A as shown in the pcap on the node where Pod A is below:

# tcpdump -vv -i calib33bd7211a6|grep 10.233.109.62
tcpdump: listening on calib33bd7211a6, link-type EN10MB (Ethernet), capture size 262144 bytes
    10.233.109.62.60372 > 10.233.70.35.tproxy: Flags [S], cksum 0x16af (correct), seq 2138999970, win 64240, options [mss 1460,sackOK,TS val 2089146656 ecr 0,nop,wscale 7], length 0
    10.233.70.35.tproxy > 10.233.109.62.60372: Flags [S.], cksum 0xc961 (incorrect -> 0x579e), seq 3985188010, ack 2138999971, win 65160, options [mss 1460,sackOK,TS val 4061902615 ecr 2089146656,nop,wscale 7], length 0
    10.233.109.62.60372 > 10.233.70.35.tproxy: Flags [.], cksum 0x82fd (correct), seq 1, ack 1, win 502, options [nop,nop,TS val 2089146656 ecr 4061902615], length 0
# tcpdump -vv -i tunl0|grep 10.233.109.62
tcpdump: listening on tunl0, link-type RAW (Raw IP), capture size 262144 bytes
    10.233.109.62.34294 > 10.233.70.35.tproxy: Flags [S], cksum 0xbd5b (correct), seq 1964000002, win 64240, options [mss 1460,sackOK,TS val 1018637359 ecr 0,nop,wscale 7], length 0
    10.233.70.35.tproxy > 10.233.109.62.34294: Flags [S.], cksum 0xc961 (incorrect -> 0x7b0b), seq 1667300057, ack 1964000003, win 65160, options [mss 1460,sackOK,TS val 4061982287 ecr 1018637359,nop,wscale 7], length 0
    10.233.109.62.34294 > 10.233.70.35.tproxy: Flags [.], cksum 0xa66a (correct), seq 1, ack 1, win 502, options [nop,nop,TS val 1018637359 ecr 4061982287], length 0
    10.233.109.62.34294 > 10.233.70.35.tproxy: Flags [F.], cksum 0x592f (correct), seq 1, ack 1, win 502, options [nop,nop,TS val 1018657129 ecr 4061982287], length 0
    10.233.70.35.tproxy > 10.233.109.62.34294: Flags [F.], cksum 0xc959 (incorrect -> 0x0bec), seq 1, ack 2, win 510, options [nop,nop,TS val 4062002057 ecr 1018657129], length 0
    10.233.109.62.34294 > 10.233.70.35.tproxy: Flags [.], cksum 0x0bf3 (correct), seq 2, ack 2, win 502, options [nop,nop,TS val 1018657130 ecr 4062002057], length 0

However, in some machines (which is again different from the node of Pod A), a pod (Pod C) in the same k8s namespace cannot reach Pod A although it is able to reach the the tunnel of Pod A's node as shown below:

# tcpdump -vv -i calib33bd7211a6|grep 10.233.82.51
tcpdump: listening on calib33bd7211a6, link-type EN10MB (Ethernet), capture size 262144 bytes
# tcpdump -vv -i tunl0|grep 10.233.82.51
tcpdump: listening on tunl0, link-type RAW (Raw IP), capture size 262144 bytes
    10.233.82.51.35038 > 10.233.70.35.tproxy: Flags [S], cksum 0xc924 (correct), seq 2532090843, win 64240, options [mss 1460,sackOK,TS val 3899329055 ecr 0,nop,wscale 7], length 0
    10.233.82.51.35038 > 10.233.70.35.tproxy: Flags [S], cksum 0xc529 (correct), seq 2532090843, win 64240, options [mss 1460,sackOK,TS val 3899330074 ecr 0,nop,wscale 7], length 0
    10.233.82.51.35038 > 10.233.70.35.tproxy: Flags [S], cksum 0xbd49 (correct), seq 2532090843, win 64240, options [mss 1460,sackOK,TS val 3899332090 ecr 0,nop,wscale 7], length 0
    10.233.82.51.35038 > 10.233.70.35.tproxy: Flags [S], cksum 0xacc9 (correct), seq 2532090843, win 64240, options [mss 1460,sackOK,TS val 3899336314 ecr 0,nop,wscale 7], length 0
    10.233.82.51.35038 > 10.233.70.35.tproxy: Flags [S], cksum 0x8cc9 (correct), seq 2532090843, win 64240, options [mss 1460,sackOK,TS val 3899344506 ecr 0,nop,wscale 7], length 0
    10.233.82.51.35038 > 10.233.70.35.tproxy: Flags [S], cksum 0x4dc9 (correct), seq 2532090843, win 64240, options [mss 1460,sackOK,TS val 3899360634 ecr 0,nop,wscale 7], length 0
    10.233.82.51.35038 > 10.233.70.35.tproxy: Flags [S], cksum 0xc9c8 (correct), seq 2532090843, win 64240, options [mss 1460,sackOK,TS val 3899394426 ecr 0,nop,wscale 7], length 0

What could I do to fix this such that Pod A is reachable by any pod in any of the nodes?

All of the nodes are in the same subnet but spread across two L2 switches. This issue seems to occur for some of the nodes in one of the switches however since Pod A's machine was reached by the tunnel, this observation is irrelevant.

Wytrzymały Wiktor avatar
it flag
Hello @ChristianAlis. Could you please provide additional information about your Calico setup? Which mode of Networking you've configured? Way of installation and operator? What's the unique (difference) in node(s) setup, since workload communication is failing?
ma flag
@WytrzymałyWiktor I've been using kubespray and its defaults exclusively. Currently, everything is on kubespray 2.15.1. The nodes in the first switch, all of them doesn't have this issue, were set up back in 2018 and already went through major changes in kubespray e.g., switching to kubeadm and calico from flannel. The nodes in the other switch, above half has this issue, were set up last November 2020. I'm using ipip and there's no unique difference in node setup that I've found.
Mikołaj Głodziak avatar
id flag
Is your problem still unresolved? How exactly did you set up your cluster? Did you see [this page](https://github.com/projectcalico/calico/issues/2426)?
ma flag
Partially. The issue in the particular setup which I based this question on is due to an automatically created netpol requiring a specific tag to connect. However, the issue of some of the pods from some nodes not able to connect is still open. I created a new post to better describe it: https://serverfault.com/questions/1078384/getaddrinfo-does-not-resolve-in-some-kubernetes-pod-on-some-hosts
mangohost

Post an answer

Most people don’t grasp that asking a lot of questions unlocks learning and improves interpersonal bonding. In Alison’s studies, for example, though people could accurately recall how many questions had been asked in their conversations, they didn’t intuit the link between questions and liking. Across four studies, in which participants were engaged in conversations themselves or read transcripts of others’ conversations, people tended not to realize that question asking would influence—or had influenced—the level of amity between the conversationalists.