Score:3

Kubernetes coredns not receiving requests


I've set up a single-node Kubernetes cluster on Debian 11. However, CoreDNS doesn't seem to resolve anything; I noticed this because Portainer is unable to load resources:

http: proxy error: dial tcp: lookup kubernetes.default.svc on 10.96.0.10:53: read udp 10.244.0.4:57589->10.96.0.10:53: i/o timeout

Since this is a timeout reaching my DNS, I checked the service:

root@dmvandenberg:~/kubernetes# kubectl get svc -n kube-system -o wide
NAME       TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)                  AGE   SELECTOR
kube-dns   ClusterIP   10.96.0.10   <none>        53/UDP,53/TCP,9153/TCP   78m   k8s-app=kube-dns
root@dmvandenberg:~/kubernetes# kubectl get pods --selector=k8s-app=kube-dns -o wide -n kube-system
NAME                       READY   STATUS    RESTARTS   AGE   IP           NODE              NOMINATED NODE   READINESS GATES
coredns-78fcd69978-2b6cq   1/1     Running   0          79m   10.244.0.2   dmvandenberg.nl   <none>           <none>
coredns-78fcd69978-swprh   1/1     Running   0          79m   10.244.0.3   dmvandenberg.nl   <none>           <none>
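
Both CoreDNS pods are Running. To take Portainer out of the equation, the service VIP and the CoreDNS pod IPs can also be queried directly from the node (a sketch; dig comes from Debian's dnsutils/bind9-dnsutils package, and 10.244.0.2 is one of the pod IPs listed above):

# query through the service VIP (traverses kube-proxy)
dig +short +time=2 @10.96.0.10 kubernetes.default.svc.cluster.local
# query a CoreDNS pod directly, bypassing the service VIP
dig +short +time=2 @10.244.0.2 kubernetes.default.svc.cluster.local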

I've set up my cluster with these files:

cat init.sh init2.sh
kubeadm init --pod-network-cidr=10.244.0.0/16 --ignore-preflight-errors=all
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

kubectl create -f https://docs.projectcalico.org/manifests/tigera-operator.yaml
kubectl create -f https://docs.projectcalico.org/manifests/custom-resources.yaml
kubectl taint nodes --all node-role.kubernetes.io/master-
kubectl create -f localstorage.yml --save-config
kubectl create -f pvportainer.yml --save-config
kubectl patch storageclass local-storage -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'
kubectl apply -n portainer -f https://raw.githubusercontent.com/portainer/k8s/master/deploy/manifests/portainer/portainer.yaml
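
For reference, the basic post-install health checks after running this script would look something like this (a generic sketch, nothing specific to this setup):

kubectl get nodes -o wide
kubectl get pods -A -o wide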

I have also attempted kubectl apply -f https://github.com/coreos/flannel/raw/master/Documentation/kube-flannel.yml instead of the two Calico manifests (tigera-operator.yaml and custom-resources.yaml).
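
Flannel's stock manifest assumes the pod CIDR 10.244.0.0/16, which matches the --pod-network-cidr passed to kubeadm init above. A quick check of which CNI pods are actually active (label and namespace vary by manifest version):

kubectl -n kube-system get daemonsets
kubectl -n kube-system get pods -l app=flannel    # flannel manifest
kubectl -n calico-system get pods                 # calico operator install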

root@dmvandenberg:~/kubernetes# cat localstorage.yml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local-storage
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer
root@dmvandenberg:~/kubernetes# cat pvportainer.yml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: portainer
spec:
  capacity:
    storage: 11Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: local-storage
  local:
    path: /dockerdirs/pvportainer
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - dmvandenberg.nl
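
The storage side is probably tangential (the Portainer pod is clearly running, since it is the one emitting the DNS queries), but for completeness it can be verified with:

kubectl get storageclass
kubectl get pv portainer
kubectl -n portainer get pvc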

I've narrowed down the issue to DNS resolution using the following command and output:

root@dmvandenberg:~/kubernetes# kubectl logs --namespace=kube-system -l k8s-app=kube-dns -f & tcpdump -ani cni0 udp port 53
[5] 9505
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on cni0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
.:53
[INFO] plugin/reload: Running configuration MD5 = db32ca3650231d74073ff4cf814959a7
CoreDNS-1.8.4
linux/amd64, go1.16.4, 053c4d5
.:53
[INFO] plugin/reload: Running configuration MD5 = db32ca3650231d74073ff4cf814959a7
CoreDNS-1.8.4
linux/amd64, go1.16.4, 053c4d5
21:21:07.629395 IP 10.244.0.4.44224 > 10.244.0.2.53: 3488+ AAAA? kubernetes.default.svc.portainer.svc.cluster.local. (68)
21:21:07.629667 IP 10.244.0.4.43161 > 10.244.0.2.53: 433+ A? kubernetes.default.svc.portainer.svc.cluster.local. (68)
21:21:12.630395 IP 10.244.0.4.54508 > 10.244.0.3.53: 61466+ AAAA? kubernetes.default.svc.portainer.svc.cluster.local. (68)
21:21:12.630453 IP 10.244.0.4.46088 > 10.244.0.2.53: 55999+ A? kubernetes.default.svc.portainer.svc.cluster.local. (68)
^C
4 packets captured
4 packets received by filter
0 packets dropped by kernel

I would expect to see replies to the DNS queries, but there are none. On the internet I found a suggestion to add the "log" plugin to the CoreDNS Corefile, so I tried that, but no log lines appear. This convinces me that the UDP messages shown by tcpdump are not being read/received by CoreDNS.
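
For reference, enabling the log plugin means editing the coredns ConfigMap and restarting the deployment; an abbreviated sketch of what I changed (the rest of the kubeadm-generated Corefile stays as-is):

kubectl -n kube-system edit configmap coredns
# Corefile now starts with:
#   .:53 {
#       log
#       errors
#       ...
kubectl -n kube-system rollout restart deployment coredns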

I went through all the steps in https://kubernetes.io/docs/tasks/administer-cluster/dns-debugging-resolution/, but that didn't get me any further.
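
In particular, I used the test pod from that page (commands as in the guide):

kubectl apply -f https://k8s.io/examples/admin/dns/dnsutils.yaml
kubectl exec -i -t dnsutils -- cat /etc/resolv.conf
kubectl exec -i -t dnsutils -- nslookup kubernetes.default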

I'm stuck at this point, though. How can I continue debugging? What could be wrong?
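
One further check I could think of: the CoreDNS image has no shell, so whether the process is really bound to :53 inside its network namespace has to be inspected from the node (a single-node sketch, so the coredns processes run locally):

# enter the network namespace of one coredns process and list UDP listeners
PID=$(pgrep coredns | head -n1)
nsenter -t "$PID" -n ss -ulpn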

Edit: I've tried following this guide: https://www.oueta.com/linux/create-a-debian-11-kubernetes-cluster-with-kubeadm/. I'm seeing exactly the same result, on a different interface:

16:56:06.482769 cali6bd455d068f In  IP 172.20.122.129.60650 > 10.96.0.10.53: 31215+ AAAA? kubernetes.default.svc.portainer.svc.cluster.local. (68)
16:56:06.482980 cali6bd455d068f In  IP 172.20.122.129.35119 > 10.96.0.10.53: 8608+ A? kubernetes.default.svc.portainer.svc.cluster.local. (68)
16:56:11.483200 cali6bd455d068f In  IP 172.20.122.129.57079 > 10.96.0.10.53: 61639+ AAAA? kubernetes.default.svc.portainer.svc.cluster.local. (68)
16:56:11.483309 cali6bd455d068f In  IP 172.20.122.129.38249 > 10.96.0.10.53: 14976+ A? kubernetes.default.svc.portainer.svc.cluster.local. (68)
16:56:16.484367 cali6bd455d068f In  IP 172.20.122.129.57768 > 10.96.0.10.53: 55396+ AAAA? kubernetes.default.svc.svc.cluster.local. (58)
16:56:16.484488 cali6bd455d068f In  IP 172.20.122.129.53058 > 10.96.0.10.53: 50700+ A? kubernetes.default.svc.svc.cluster.local. (58)
16:56:21.484644 cali6bd455d068f In  IP 172.20.122.129.58857 > 10.96.0.10.53: 18986+ AAAA? kubernetes.default.svc.svc.cluster.local. (58)
16:56:21.484702 cali6bd455d068f In  IP 172.20.122.129.36861 > 10.96.0.10.53: 44020+ A? kubernetes.default.svc.svc.cluster.local. (58)
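
The Calico side can be checked like this (assuming the operator-based install from that guide; with the plain manifest the pods live in kube-system instead):

kubectl -n calico-system get pods
kubectl get tigerastatus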

Running tcpdump on the entire interface shows that TCP does seem to work, given the ACKs coming back. What I did notice is that there is no traffic from 10.96.0.10 (the service IP) back to the pod, but I don't know whether that's required (see the kube-proxy checks after the dump below).

17:03:29.224602 IP 169.254.167.173.36088 > 172.20.122.129.9000: Flags [.], ack 1, win 169, options [nop,nop,TS val 4014670766 ecr 4073454542], length 0
17:03:29.224869 IP 169.254.167.173.36088 > 172.20.122.129.9000: Flags [P.], seq 1:107, ack 1, win 169, options [nop,nop,TS val 4014670766 ecr 4073454542], length 106
17:03:29.224887 IP 172.20.122.129.9000 > 169.254.167.173.36088: Flags [.], ack 107, win 167, options [nop,nop,TS val 4073454542 ecr 4014670766], length 0
17:03:29.225273 IP 172.20.122.129.9000 > 169.254.167.173.36088: Flags [P.], seq 1:818, ack 107, win 167, options [nop,nop,TS val 4073454543 ecr 4014670766], length 817
17:03:29.225341 IP 169.254.167.173.36088 > 172.20.122.129.9000: Flags [.], ack 818, win 166, options [nop,nop,TS val 4014670767 ecr 4073454543], length 0
17:03:29.225399 IP 172.20.122.129.9000 > 169.254.167.173.36088: Flags [.], seq 818:7958, ack 107, win 167, options [nop,nop,TS val 4073454543 ecr 4014670767], length 7140
17:03:29.225422 IP 169.254.167.173.36088 > 172.20.122.129.9000: Flags [.], ack 7958, win 155, options [nop,nop,TS val 4014670767 ecr 4073454543], length 0
17:03:29.225430 IP 172.20.122.129.9000 > 169.254.167.173.36088: Flags [.], seq 7958:15098, ack 107, win 167, options [nop,nop,TS val 4073454543 ecr 4014670767], length 7140
17:03:29.225448 IP 169.254.167.173.36088 > 172.20.122.129.9000: Flags [.], ack 15098, win 138, options [nop,nop,TS val 4014670767 ecr 4073454543], length 0
17:03:29.225457 IP 172.20.122.129.9000 > 169.254.167.173.36088: Flags [P.], seq 15098:23486, ack 107, win 167, options [nop,nop,TS val 4073454543 ecr 4014670767], length 8388
17:03:29.225474 IP 169.254.167.173.36088 > 172.20.122.129.9000: Flags [.], ack 23486, win 119, options [nop,nop,TS val 4014670767 ecr 4073454543], length 0
17:03:29.225564 IP 172.20.122.129.9000 > 169.254.167.173.36088: Flags [F.], seq 23486, ack 107, win 167, options [nop,nop,TS val 4073454543 ecr 4014670767], length 0
17:03:29.225609 IP 169.254.167.173.36088 > 172.20.122.129.9000: Flags [R.], seq 107, ack 23486, win 166, options [nop,nop,TS val 4014670767 ecr 4073454543], length 0
17:03:29.524333 IP 172.20.122.129.9000 > 169.254.167.173.9984: Flags [.], ack 3370092883, win 166, options [nop,nop,TS val 4073454842 ecr 1976747960], length 0
17:03:29.524564 IP 169.254.167.173.9984 > 172.20.122.129.9000: Flags [.], ack 1, win 171, options [nop,nop,TS val 1976763065 ecr 4073424519], length 0
17:03:34.218598 IP 172.20.122.129.45239 > 10.96.0.10.53: 23854+ AAAA? kubernetes.default.svc. (40)
17:03:34.219065 IP 172.20.122.129.38604 > 10.96.0.10.53: 24098+ A? kubernetes.default.svc. (40)
17:03:34.388311 IP 172.20.122.129.9000 > 169.254.167.173.7394: Flags [.], ack 917, win 166, options [nop,nop,TS val 4073459706 ecr 1976752753], length 0
17:03:34.388402 IP 169.254.167.173.7394 > 172.20.122.129.9000: Flags [.], ack 1, win 171, options [nop,nop,TS val 1976767929 ecr 4073444530], length 0
17:03:34.388314 IP 172.20.122.129.9000 > 169.254.167.173.3949: Flags [.], ack 917, win 166, options [nop,nop,TS val 4073459706 ecr 1976752753], length 0
17:03:34.388424 IP 169.254.167.173.3949 > 172.20.122.129.9000: Flags [.], ack 1, win 171, options [nop,nop,TS val 1976767929 ecr 4073444530], length 0
17:03:34.388288 IP 172.20.122.129.9000 > 169.254.167.173.26855: Flags [.], ack 917, win 166, options [nop,nop,TS val 4073459706 ecr 1976752752], length 0
17:03:34.388544 IP 169.254.167.173.26855 > 172.20.122.129.9000: Flags [.], ack 1, win 171, options [nop,nop,TS val 1976767929 ecr 4073444529], length 0
17:03:39.216823 IP 169.254.167.173.36182 > 172.20.122.129.9000: Flags [S], seq 2192346809, win 43200, options [mss 1440,sackOK,TS val 4014680758 ecr 0,nop,wscale 8], length 0
17:03:39.216889 IP 172.20.122.129.9000 > 169.254.167.173.36182: Flags [S.], seq 1678785660, ack 2192346810, win 42840, options [mss 1440,sackOK,TS val 4073464535 ecr 4014680758,nop,wscale 8], length 0
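
Normally kube-proxy DNATs the VIP to a pod IP and conntrack rewrites the reply source back to 10.96.0.10, so the NAT programming and connection tracking seem worth inspecting on the node (a sketch; assumes kube-proxy in iptables mode and the conntrack tool installed):

# is the service VIP programmed into the NAT table?
iptables-save -t nat | grep 10.96.0.10
# do DNS flows show replies, or are they [UNREPLIED]?
conntrack -L -p udp | grep 'dport=53'
# kube-proxy logs, in case rule syncing fails
kubectl -n kube-system logs -l k8s-app=kube-proxy --tail=50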
Comments:

mario: What version of Kubernetes do you use?

Daniël van den Berg: @mario 1.22.2-00

PjoterS: Is this issue still persisting? Is this an on-prem environment (Linux/virtualization software) or a cloud environment? Can you confirm that all deployed pods are working as expected and that you are not lacking resources?

Daniël van den Berg: @PjoterS No, I switched to simply using Docker Swarm. Good enough for now.

Alex G: @DaniëlvandenBerg Kindly post your workaround as an answer to help community members who have the same issue.

Daniël van den Berg: @AlexG Please read my previous comment.