Score:1

How to debug why a NodePort is refusing connections in K8S?


Introduction

I recently got a simple web app working on a three-node Ubuntu Server with MicroK8S. I decided to try rebuilding my cluster and reinstalling everything using YAML manifests, to ensure the process was replicable. However, the app is now not reachable from outside of the cluster. I am seeking debugging techniques to drill into why the NodePort is apparently not creating a TCP listener on all nodes.

Here are my nodes:

name      IP              colour  role
arran     192.168.50.251  yellow  leader
nikka     192.168.50.74   blue    worker
yamazaki  192.168.50.135  green   worker

The cluster has again elected to run the workload on the third node, Yamazaki. I expect any web traffic hitting Arran or Nikka to be internally re-routed to Yamazaki to be serviced, as was happening previously.

What I did

From the previously working cluster/app, here is what I did to reset everything:

  1. Do microk8s leave on all follower nodes

  2. Do microk8s kubectl delete node <nodename> on the leader for each follower node (they were not removed automatically when they left)

  3. Do microk8s reset on all nodes

  4. Enable addons (dns, ingress); I don't know whether either is necessary

  5. Create a join command on the leader by running microk8s add-node once per follower

  6. Run a fresh join command microk8s join <ip>/<token> on each follower

  7. Run microk8s status on any node to confirm that the cluster is in HA mode

  8. Sideload an app image tarball from the leader, using microk8s images import workload.tar

  9. Launch the app via microk8s kubectl apply -f k8s-manifests/production/pod.yaml -f k8s-manifests/production/nodeport.yaml

    Here is the Pod:

     apiVersion: v1
     kind: Pod
     metadata:
       name: k8s-workload
       annotations:
         kubectl.kubernetes.io/last-applied-configuration: |
           {"apiVersion":"v1","kind":"Pod","metadata":{"annotations":{},"name":"k8s-workload","namespace":"default"},"spec":{"containers":[{"image":"k8s-workload","imagePullPolicy":"Never","name":"k8s-workload","ports":[{"containerPort":9090,"protocol":"TCP"}]}]}}
     spec:
       containers:
       - image: k8s-workload
         imagePullPolicy: Never
         name: k8s-workload
         ports:
         - containerPort: 9090
           protocol: TCP
    

    Here is the NodePort:

     apiVersion: v1
     kind: Service
     metadata:
       name: np-service
     spec:
       type: NodePort
       ports:
         - port: 9090
           targetPort: 9090
           nodePort: 30090
       selector:
         run: k8s-workload
       # This should not be needed, but it didn't help
       # this time anyway
       externalIPs: [192.168.50.251]
    
  10. Check the app is running via an internal container call, microk8s kubectl exec -ti k8s-workload -- curl http://localhost:9090 - this is fine

  11. Check the app is running via a port forwarder on any node, microk8s kubectl port-forward pod/k8s-workload 9090 --address='0.0.0.0' - this is fine

  12. Check whether the nodes are listening externally - they are not: curl http://localhost:30090 gets a refused connection on every node, and the same happens against any node's IP address from a non-cluster machine on the LAN (see the quick node-level checks sketched after this list)
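
To spell out those node-level checks (a sketch; the iptables line assumes kube-proxy is running in its default iptables mode):

    # Is anything answering locally on the node port?
    curl -v http://localhost:30090

    # Has kube-proxy installed any NAT rules for the node port?
    sudo iptables-save | grep 30090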

System state

Here is what is running from microk8s kubectl get all -o wide:

NAME               READY   STATUS    RESTARTS   AGE   IP             NODE       NOMINATED NODE   READINESS GATES
pod/k8s-workload   1/1     Running   0          20h   10.1.134.193   yamazaki   <none>           <none>

NAME                 TYPE        CLUSTER-IP       EXTERNAL-IP      PORT(S)          AGE     SELECTOR
service/kubernetes   ClusterIP   10.152.183.1     <none>           443/TCP          35d     <none>
service/np-service   NodePort    10.152.183.175   192.168.50.251   9090:30090/TCP   3d21h   run=k8s-workload

I don't know what service/kubernetes is; I assume it is just part of the standard K8S infrastructure.

Observations

I think this article is saying that my web app needs to be a service, but I only have a pod. I think that when this was working previously, I only had a pod, but the cluster had gotten into a bit of a mess, so it is possible that a service version of the app was running at the same time as the pod version.

The article also suggests that I ought to be using an ingress system. However, given that a NodePort is my present learning focus, I don't want to give up on it just yet. Ingress can come later.

I think I can be sure that there are no firewall issues, since any connections to port 30090 are rejected even in a console session on a node in the cluster.

I would like to run something like microk8s kubectl logs service np-service, to see what the NodePort is doing, but the logs subcommand only works on pods.
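
Presumably kubectl describe and the Endpoints resource are the closest equivalents, though I am not sure what I should expect to see there. A sketch of what I have in mind (assuming the default namespace):

    # Show the Service definition, its selector and the endpoints it has matched
    microk8s kubectl describe service np-service

    # List the Endpoints object for the Service directly
    microk8s kubectl get endpoints np-service

    # Recent cluster events, oldest first, in case they hold a clue
    microk8s kubectl get events --sort-by=.metadata.creationTimestamp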

What can I try next?

halfer: For readers still here - a fix has been supplied, but I am most keen to learn how I could have solved this myself. How can I view K8S system logs to see the error message that the missing label would have caused?

halfer: I assume the logged error would be the same one as can be obtained on the console: `No resources found in default namespace`.

halfer: The accepted answer below now indicates that a NodePort not having an Endpoint isn't necessarily an error. Notwithstanding, is there anything I can do to check a whole cluster for NodePorts or load balancers that don't go anywhere? This strikes me as something that one would want to detect, more often than not.

Score:3

When you use kubectl run to start a Pod, kubectl automatically labels it with run=<name>, using the name you supplied on the command line.

For example, take a look at the YAML generated by kubectl run nginx --image=nginx -o yaml:

apiVersion: v1
kind: Pod
metadata:
  labels:
    run: nginx  # <- automatically assigned
  name: nginx
  namespace: default
spec:
  containers:
  - image: nginx
    ...

Now, assuming the YAML of the Pod k8s-workload you have provided is complete, this label is currently missing. That matters because of the selector you used in the NodePort Service's spec.

apiVersion: v1
kind: Service
metadata:
  name: np-service
spec:
  ...
  selector:
    run: k8s-workload  # <- this tells Kubernetes who the Service is for

I'm guessing that at the moment, Kubernetes simply cannot find the Pod that the Service is for. You can test this theory by running kubectl get pods -l run=k8s-workload. You should get an error message that looks something like No resources found in default namespace.

Fixing this is as easy as (re-)assigning the Label. This can be done by using the kubectl label command like kubectl label pod k8s-workload run=k8s-workload.
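
If it helps, the whole test-and-fix sequence looks roughly like this (a sketch; the node IP is taken from the question):

    # Should currently print "No resources found in default namespace."
    kubectl get pods -l run=k8s-workload

    # Attach the label the Service's selector is looking for
    kubectl label pod k8s-workload run=k8s-workload

    # The Service should now list the pod as an endpoint...
    kubectl get endpoints np-service

    # ...and the NodePort should answer on any node
    curl http://192.168.50.251:30090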

A detailed guide on how to debug Services, as well as more information on how Labels and Selectors work, can be found in the official documentation.

Update

In relation to whether this situation would be logged: a Service without Endpoints is not an error and (to my knowledge) won't be logged anywhere. Imagine a Deployment that is only needed for a few hours a week. The Deployment not being active, and thus the Service not having any Endpoints for 90% of the week, is expected and does not mean something is misconfigured or not working.
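
That said, if you do want to sweep a cluster for Services whose Endpoints are currently empty, something along these lines should do it (a rough sketch using jsonpath, not an official tool):

    # Print namespace, name and subsets for every Endpoints object,
    # then keep only the rows where the subsets column is empty
    kubectl get endpoints --all-namespaces \
      -o jsonpath='{range .items[*]}{.metadata.namespace}{"\t"}{.metadata.name}{"\t"}{.subsets}{"\n"}{end}' \
      | awk -F'\t' '$3 == "" {print $1 "/" $2}'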

halfer: Perfect, that fixed it - thanks! You were right about the error message, but should that not have been logged somewhere in the K8S system logs? Could I have found that somewhere?

halfer: I am curious that you suggested adding a label to the pod via a manual command. For me that works against the aim of repeatability - should it not be in the manifest, which in turn is committed to version control? (I looked up the format to do this in the manifest, and applied it, and that brought the app to life on the K8S node LAN IP addresses).

halfer: I can imagine adding a label manually would be a useful sticking-plaster to have to hand though - if K8S could not resolve a selector in production, adding this to get it working quickly would perhaps be the best low-risk option.

Eleasar: Maybe we have a misunderstanding. A service not pointing to any pods is not an error and (to my knowledge) won't be logged anywhere. Imagine a deployment that is only needed for a few hours a week. The Deployment not being active, and thus the Service not having any Endpoints for 90% of the week, is expected and does not mean something is misconfigured or not working.

Eleasar: Yes, ideally you would have your YAML files under source control and deploy them into your Kubernetes environment from there (doing something called GitOps). Editing the YAML and deploying it by hand vs running the command and copying the resulting YAML will yield the same result.

Eleasar: I chose the command because of its "simplicity". You can always append "-o yaml --dry-run=client" to see a preview of the object that would be sent to your cluster, without actually submitting it.

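For example, this prints the labelled Pod without submitting anything to the cluster:

    kubectl label pod k8s-workload run=k8s-workload -o yaml --dry-run=client
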
halfer: Righto, thanks. The missing label not being an error makes sense now, even if it means this stuff is harder to debug for beginners. One "just has to know" what is wrong!

halfer: In fact I think I broke it myself - when I had a working system, I took a YAML dump using `microk8s kubectl get pod k8s-workload -o yaml` but trimmed a lot of cruft out (e.g. which node to run on seemed to be something that the engine should determine dynamically, and not be hardwired to a specific node - that defeats the purpose of dynamic scheduling). But in doing so I probably removed the label too.

Score:1

As I suspected, the solution was simple. Eleasar has kindly supplied a label command to fix the problem, but I preferred to fix it in the YAML, as I would regard that as more repeatable. Here is my new pod.yaml:

apiVersion: v1
kind: Pod
metadata:
  name: k8s-workload
  labels:
    run: k8s-workload
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"v1","kind":"Pod","metadata":{"annotations":{},"name":"k8s-workload","namespace":"default"},"spec":{"containers":[{"image":"k8s-workload","imagePullPolicy":"Never","name":"k8s-workload","ports":[{"containerPort":9090,"protocol":"TCP"}]}]}}
spec:
  containers:
  - image: k8s-workload
    imagePullPolicy: Never
    name: k8s-workload
    ports:
    - containerPort: 9090
      protocol: TCP

There are just two new lines, to add a unique label to this object.
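
For completeness, a sketch of how I re-applied the manifest and verified the fix (the IP is one of my node addresses from the table above):

    microk8s kubectl apply -f k8s-manifests/production/pod.yaml

    # The Service now lists the pod's IP as an endpoint
    microk8s kubectl get endpoints np-service

    # And the NodePort responds on each node's LAN address
    curl http://192.168.50.251:30090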
