
AWS EKS Kubernetes pod receives an empty response when calling an Ingress URL whose target pod is on the same node


Context:

I recently encountered an issue where a Kubernetes pod (blackbox-exporter) receives an empty response whenever it calls an Ingress URL of a pod that resides on the same node as itself. This showed up as an intermittently failing probe on the dashboard.

The ingress controller used is ingress-nginx and sits behind an AWS NLB.
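For reference, ingress-nginx is exposed through a Service of type LoadBalancer roughly along the lines of the sketch below. The annotation, selector, and port values are illustrative rather than copied from the cluster:

apiVersion: v1
kind: Service
metadata:
  name: ingress-nginx-controller
  namespace: ingress-nginx
  annotations:
    # In-tree AWS cloud provider: provision an NLB instead of the default Classic ELB
    service.beta.kubernetes.io/aws-load-balancer-type: "nlb"
spec:
  type: LoadBalancer
  # With "Cluster" (the default) kube-proxy may SNAT, so nginx tends to log node IPs as the client;
  # "Local" keeps the original client IP but only routes to controller pods on the receiving node
  externalTrafficPolicy: Cluster
  selector:
    app.kubernetes.io/name: ingress-nginx
  ports:
    - name: http
      port: 80
      targetPort: http
    - name: https
      port: 443
      targetPort: https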

Example:

node1: 192.168.20.2

node2: 192.168.20.3

node3: 192.168.20.4

blackbox-exporter (deployed on node1, with pod IP 10.244.2.21)

foo-pod (deployed on node1, with pod IP 10.244.2.22)

foo-pod (deployed on node2, with pod IP 10.244.2.23)

foo-pod (deployed on node3, with pod IP 10.244.2.24)

Ingress-controller logs:

192.168.20.3 - - [21/Jun/2021:15:15:07 +0000] "GET /metrics HTTP/1.1" 200 29973 "-" "curl/7.47.0" 90 0.005 [foo-pod] [] 10.32.0.2:3000 30015 0.004 200 e39022b47e857cc48eb6a127a7b8ce24

192.168.20.4 - - [21/Jun/2021:15:16:00 +0000] "GET /metrics HTTP/1.1" 200 29973 "-" "curl/7.47.0" 90 0.005 [foo-pod] [] 10.32.0.2:3000 30015 0.004 200 e39022b47e857cc48eb6a127a7b8ce24

192.168.20.3 - - [21/Jun/2021:15:16:30 +0000] "GET /metrics HTTP/1.1" 200 29973 "-" "curl/7.47.0" 90 0.005 [foo-pod] [] 10.32.0.2:3000 30015 0.004 200 e39022b47e857cc48eb6a127a7b8ce24

Tracing the ingress controller logs showed that the "empty response" (a timeout after 5s) only occurs when the pod making the Ingress URL call is deployed on the same node as the target pod that is supposed to respond to that call.

This conclusion is based on the fact that whenever the "empty response" was received, there was never a log entry whose origin IP matched the IP of the node the blackbox-exporter runs on, which in this case should be node1 (192.168.20.2).
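For completeness, the behaviour can be reproduced manually with something like the commands below. The namespace, Service name, and hostname are placeholders for the real ones, and this assumes curl is available in the blackbox-exporter image:

# Call the Ingress URL from the blackbox-exporter pod on node1; this is the call that intermittently times out after ~5s
kubectl exec -n monitoring deploy/blackbox-exporter -- curl -sv --max-time 5 http://foo.example.com/metrics

# Check whether the ingress-nginx Service is set to preserve the client source IP
kubectl -n ingress-nginx get svc ingress-nginx-controller -o jsonpath='{.spec.externalTrafficPolicy}'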

Suspecting it was related to an "incorrect" source IP, leaving the target pod unable to route a response back, I switched to an AWS Classic L7 LB and the issue was resolved.

Now the logs show the source IP replaced with the actual pod IP, and all probing calls from the blackbox-exporter succeed.

10.244.2.21 - - [21/Jun/2021:15:15:07 +0000] "GET /metrics HTTP/1.1" 200 29973 "-" "curl/7.47.0" 90 0.005 [foo-pod] [] 10.32.0.2:3000 30015 0.004 200 e39022b47e857cc48eb6a127a7b8ce24

10.244.2.21 - - [21/Jun/2021:15:16:00 +0000] "GET /metrics HTTP/1.1" 200 29973 "-" "curl/7.47.0" 90 0.005 [foo-pod] [] 10.32.0.2:3000 30015 0.004 200 e39022b47e857cc48eb6a127a7b8ce24

10.244.2.21 - - [21/Jun/2021:15:16:30 +0000] "GET /metrics HTTP/1.1" 200 29973 "-" "curl/7.47.0" 90 0.005 [foo-pod] [] 10.32.0.2:3000 30015 0.004 200 e39022b47e857cc48eb6a127a7b8ce24
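If it matters, switching load balancer types for ingress-nginx normally comes down to changing the annotations on the Service that fronts the controller; a minimal sketch of a Classic L7 variant (illustrative values, assuming the in-tree AWS cloud provider) looks like:

metadata:
  annotations:
    # Without aws-load-balancer-type: "nlb" the provider falls back to a Classic ELB;
    # backend-protocol "http" makes it operate at L7 instead of plain TCP
    service.beta.kubernetes.io/aws-load-balancer-backend-protocol: "http"
spec:
  type: LoadBalancer
  externalTrafficPolicy: Cluster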

More information: Cluster version: AWS EKS v1.19

Question:

Linux/Kubernetes networking isn't my strength, so what I would like to ask is: what exactly is going on here?

Why does switching to an AWS Classic L7 load balancer solve the issue?

Could any other components (Kubernetes or Linux) also be affecting this?
