Score:-1

Why does a Kubernetes readiness probe fail when manually running its exec command returns a success code?

I used kubeadm to deploy a bare-metal cluster with one control plane node and one worker node on the same LAN. After initializing the cluster (kubeadm init on the control plane node and kubeadm join on the worker node), I installed Calico via Helm. The calico-node and calico-kube-controllers pods never reach the ready state. However, they seem to be functioning correctly, and if I manually run the commands that the liveness and readiness probes execute, I get the expected success response. I may have a Calico-specific problem, but my immediate question is: what could cause this behavior with the readiness probes?

The output of kubectl describe pod -n calico-system calico-node-xxxx:

    Events:
      Type     Reason     Age               From     Message
      ----     ------     ----              ----     -------
      Warning  Unhealthy  5s (x7 over 43s)  kubelet  Readiness probe errored: rpc error: code = Unknown desc = command error: EOF, stdout: , stderr: , exit code -1

The probe configuration in the calico-node-xxxx pods' yaml:

    readinessProbe:
      exec:
        command:
        - /bin/calico-node
        - -felix-ready
      failureThreshold: 3
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 5
    livenessProbe:
      failureThreshold: 3
      httpGet:
        host: localhost
        path: /liveness
        port: 9099
        scheme: HTTP
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 10

When I run kubectl exec -n calico-system calico-node-xxxx -- /bin/calico-node -felix-ready && echo "$?", the exit code is 0, a success. Likewise, curl localhost:9099/liveness returns a 200 status code and the expected response. This is true even if I run the commands within a second of creating the pods, so I doubt it has to do with failureThreshold, timeoutSeconds, or similar settings. My understanding of how the exec command actually gets called for readiness probes is shaky, so maybe an explanation of how it could differ from kubectl exec would point me in the right direction?
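One caveat about the manual check: with `cmd && echo "$?"`, the echo only runs when the command succeeds, so a failure prints nothing at all. A minimal sketch of a wrapper that reports the exit code in both cases is below. The name `report_exit` is made up for illustration, and the pod name in the usage comment is the placeholder from the question.

```shell
#!/bin/sh
# report_exit CMD [ARGS...]
# Runs the given command, prints its exit code whether it succeeded or
# failed, and returns that same code to the caller.
# (Hypothetical helper name; not part of kubectl or Calico.)
report_exit() {
    "$@"
    rc=$?
    echo "exit code: $rc"
    return "$rc"
}

# Example usage (commented out because it needs access to the cluster):
# report_exit kubectl exec -n calico-system calico-node-xxxx -- /bin/calico-node -felix-ready
```

Using `; echo "$?"` instead of `&& echo "$?"` on a single line has the same effect.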

Thanks.

Score:0

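It took a while to track down, but the cause was this bug in CRI-O: https://github.com/cri-o/cri-o/issues/6184. I was running the outdated version of conmon from the Ubuntu repo, and updating conmon fixed it. The kubelet runs exec probes through the CRI ExecSync call, whereas kubectl exec goes through the streaming Exec call, which is presumably why the manual command succeeded while the probe failed.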
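For anyone checking whether they are on an affected conmon, here is a rough sketch of a version comparison. It assumes GNU `sort -V` is available and that the first line of `conmon --version` ends with the version number. `version_ge` and `MIN_CONMON` are names invented for this example; the minimum version is deliberately left unset, so take it from the linked issue rather than from here.

```shell
#!/bin/sh
# version_ge A B: succeeds when version A is greater than or equal to B.
# Relies on GNU sort -V (version sort); hypothetical helper name.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Only run the check when conmon is installed and MIN_CONMON has been set
# by the user (assumption: look up the fixed version in the CRI-O issue).
if command -v conmon >/dev/null 2>&1 && [ -n "${MIN_CONMON:-}" ]; then
    # Assumption: the first output line ends with the version number.
    installed=$(conmon --version | awk 'NR==1 {print $NF}')
    if version_ge "$installed" "$MIN_CONMON"; then
        echo "conmon $installed is at least $MIN_CONMON"
    else
        echo "conmon $installed is older than $MIN_CONMON; consider updating"
    fi
fi
```

Run it on the node itself, for example as `MIN_CONMON=<version from the issue> sh check-conmon.sh`, where the script file name is whatever you saved it as.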
