I have a GKE k8s cluster (k8s 1.22) that consists of preemptible nodes only, which means that critical services like kube-dns also run on preemptible nodes. It's a dev cluster that can tolerate a few broken minutes a day. Every time a node hosting a kube-dns pod gets shut down, I run into DNS resolution problems that persist until I delete the failed pod (since 1.21, pods on shut-down nodes stay in "Status: Failed" / "Reason: Shutdown" until manually deleted).
While I do expect some problems on preemptible nodes while they are being recycled, I would expect this to self-heal within a few minutes. The underlying reason for the persistent problem seems to be that the failed pod does not get removed from the k8s Service / Endpoints. This is what I can see in the system:
Status of the pods via kubectl -n kube-system get po -l k8s-app=kube-dns:
NAME                        READY   STATUS       RESTARTS   AGE
kube-dns-697dc8fc8b-47rxd   4/4     Terminated   0          43h
kube-dns-697dc8fc8b-mkfrp   4/4     Running      0          78m
kube-dns-697dc8fc8b-zfvn8   4/4     Running      0          19h
The IP of the failed pod is 192.168.144.2, and it is still listed as one of the endpoints of the service:
kubectl -n kube-system describe ep kube-dns shows this:
Name:         kube-dns
Namespace:    kube-system
Labels:       addonmanager.kubernetes.io/mode=Reconcile
              k8s-app=kube-dns
              kubernetes.io/cluster-service=true
              kubernetes.io/name=KubeDNS
Annotations:  endpoints.kubernetes.io/last-change-trigger-time: 2022-02-21T10:15:54Z
Subsets:
  Addresses:          192.168.144.2,192.168.144.7,192.168.146.29
  NotReadyAddresses:  <none>
  Ports:
    Name     Port  Protocol
    ----     ----  --------
    dns-tcp  53    TCP
    dns      53    UDP
Events:  <none>
I know others have worked around these issues by scheduling kube-dns onto non-preemptible nodes, but I would rather make this self-healing instead, as node failures can still happen on non-preemptible nodes; they are just less likely.
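For reference, that workaround typically keeps kube-dns off preemptible nodes via node affinity. A minimal sketch, assuming GKE's cloud.google.com/gke-preemptible node label and at least one non-preemptible node pool; note that the kube-dns Deployment is managed by the addon manager (addonmanager.kubernetes.io/mode=Reconcile above), so a direct edit may get reverted:

# Hypothetical affinity snippet for the kube-dns pod template (untested):
spec:
  template:
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: cloud.google.com/gke-preemptible   # label GKE puts on preemptible nodes
                operator: DoesNotExist                   # only schedule onto non-preemptible nodes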
My questions:
- Why is the failed pod still listed as one of the endpoints of the service, even hours after the initial node failure?
- What can I do to mitigate the problem (besides adding some non-preemptible nodes)? One stopgap I'm considering is sketched after this list.
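The stopgap I have in mind (untested, and it treats the symptom rather than the cause) is to periodically delete pods that were left behind in the Failed phase, so that their stale addresses drop out of the Endpoints object, e.g. run from a cron on a management host:

# List pods left behind after node shutdowns (phase Failed, e.g. reason Shutdown):
kubectl -n kube-system get pods --field-selector=status.phase=Failed

# Delete them so their addresses are removed from the Service's Endpoints:
kubectl -n kube-system delete pods --field-selector=status.phase=Failed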
It seems that the default kube-dns deployment in GKE does not have a readiness probe attached to dnsmasq (port 53), which is the port targeted by the kube-dns service, and that adding one could solve the issue - but I suspect it is missing for a reason that I don't yet understand.
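To make this concrete, what I have in mind is something along these lines on the dnsmasq container (a hypothetical, untested snippet that is not part of the stock GKE manifest; again, the addon manager may revert it):

# Hypothetical readiness probe for the dnsmasq container:
readinessProbe:
  tcpSocket:
    port: 53          # the DNS port the kube-dns Service targets
  initialDelaySeconds: 5
  periodSeconds: 10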
EDIT: Apparently this does not happen on 1.21.6-gke.1500 (regular channel), but it does on 1.22.6-gke.1500 (rapid channel). I do not have a good explanation, but despite a few pods having failed today, the kube-dns service there only contains the working ones.