Score:1

How to debug why a NodePort is refusing connections in K8S?


Introduction

I recently got a simple web app working on a three-node Ubuntu Server with MicroK8S. I decided to try rebuilding my cluster and reinstalling everything using YAML manifests, to ensure the process was replicable. However, the app is now not reachable from outside of the cluster. I am seeking debugging techniques to drill into why the NodePort is apparently not creating a TCP listener on all nodes.

Here are my nodes:

name      IP              colour  role
arran     192.168.50.251  yellow  leader
nikka     192.168.50.74   blue    worker
yamazaki  192.168.50.135  green   worker

The cluster has again elected to run the workload on the third node, Yamazaki. I expect any web traffic hitting Arran or Nikka to be internally re-routed to Yamazaki to be serviced, as was happening previously.

What I did

From the previously working cluster/app, here is what I did to reset everything:

  1. Do microk8s leave on all follower nodes

  2. Do microk8s kubectl delete node <nodename> on the leader for each follower node (they were not removed automatically when they left)

  3. Do microk8s reset on all nodes

  4. Enable addons (dns, ingress); I don't know whether either is necessary

  5. Create a join command on the leader by running microk8s add-node once per follower

  6. Run a fresh join command microk8s join <ip>/<token> on each follower

  7. Run microk8s status on any node to confirm that the cluster is in HA mode

  8. Sideload an app image tarball from the leader, using microk8s images import workload.tar

  9. Launch the app via microk8s kubectl apply -f k8s-manifests/production/pod.yaml -f k8s-manifests/production/nodeport.yaml

    Here is the Pod:

     apiVersion: v1
     kind: Pod
     metadata:
       name: k8s-workload
       annotations:
         kubectl.kubernetes.io/last-applied-configuration: |
           {"apiVersion":"v1","kind":"Pod","metadata":{"annotations":{},"name":"k8s-workload","namespace":"default"},"spec":{"containers":[{"image":"k8s-workload","imagePullPolicy":"Never","name":"k8s-workload","ports":[{"containerPort":9090,"protocol":"TCP"}]}]}}
     spec:
       containers:
       - image: k8s-workload
         imagePullPolicy: Never
         name: k8s-workload
         ports:
         - containerPort: 9090
           protocol: TCP
    

    Here is the NodePort:

     apiVersion: v1
     kind: Service
     metadata:
       name: np-service
     spec:
       type: NodePort
       ports:
         - port: 9090
           targetPort: 9090
           nodePort: 30090
       selector:
         run: k8s-workload
       # This should not be needed, but it didn't help
       # this time anyway
       externalIPs: [192.168.50.251]
    
  10. Check the app is running via an internal container call, microk8s kubectl exec -ti k8s-workload -- curl http://localhost:9090 - this is fine

  11. Check the app is running via a port forwarder on any node, microk8s kubectl port-forward pod/k8s-workload 9090 --address='0.0.0.0' - this is fine

  12. Check whether the nodes are listening externally - they are not: curl http://localhost:30090 gets a refused connection on every node, and the same happens against any node's IP address from a non-cluster machine on the LAN (see the quick node-level checks sketched after this list)
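
To spell out those node-level checks (a sketch; the iptables line assumes kube-proxy is running in its default iptables mode):

    # Is anything answering locally on the node port?
    curl -v http://localhost:30090

    # Has kube-proxy installed any NAT rules for the node port?
    sudo iptables-save | grep 30090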

System state

Here is what is running from microk8s kubectl get all -o wide:

NAME               READY   STATUS    RESTARTS   AGE   IP             NODE       NOMINATED NODE   READINESS GATES
pod/k8s-workload   1/1     Running   0          20h   10.1.134.193   yamazaki   <none>           <none>

NAME                 TYPE        CLUSTER-IP       EXTERNAL-IP      PORT(S)          AGE     SELECTOR
service/kubernetes   ClusterIP   10.152.183.1     <none>           443/TCP          35d     <none>
service/np-service   NodePort    10.152.183.175   192.168.50.251   9090:30090/TCP   3d21h   run=k8s-workload

I don't know what service/kubernetes is; I assume it is just part of the standard K8S infrastructure.

Observations

I think this article is saying that my web app needs to be a service, but I only have a pod. I think that when this was working previously, I only had a pod, but the cluster had gotten into a bit of a mess, so it is possible that a service version of the app was running at the same time as the pod version.

The article also suggests that I ought to be using an ingress system. However, given that a NodePort is my present learning focus, I don't want to give up on it just yet. Ingress can come later.

I think I can be sure that there are no firewall issues, since any connections to port 30090 are rejected even in a console session on a node in the cluster.

I would like to run something like microk8s kubectl logs service np-service, to see what the NodePort is doing, but the logs subcommand only works on pods.
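
Presumably kubectl describe and the Endpoints resource are the closest equivalents, though I am not sure what I should expect to see there. A sketch of what I have in mind (assuming the default namespace):

    # Show the Service definition, its selector and the endpoints it has matched
    microk8s kubectl describe service np-service

    # List the Endpoints object for the Service directly
    microk8s kubectl get endpoints np-service

    # Recent cluster events, oldest first, in case they hold a clue
    microk8s kubectl get events --sort-by=.metadata.creationTimestamp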

What can I try next?

halfer: For readers still here - a fix has been supplied, but I am most keen to learn how I could have solved this myself. How can I view K8S system logs to see the error message that the missing label would have caused?

halfer: I assume the logged error would be the same one as can be obtained on the console: `No resources found in default namespace`.

halfer: The accepted answer below now indicates that a NodePort not having an Endpoint isn't necessarily an error. Notwithstanding, is there anything I can do to check a whole cluster for NodePorts or load balancers that don't go anywhere? This strikes me as something that one would want to detect, more often than not.

Score:3

When you use kubectl run to start a Pod, kubectl automatically labels it with run=<name>, using the name you supplied on the command line.

For example, take a look at the YAML generated by kubectl run nginx --image=nginx -o yaml:

apiVersion: v1
kind: Pod
metadata:
  labels:
    run: nginx  # <- automatically assigned
  name: nginx
  namespace: default
spec:
  containers:
  - image: nginx
    ...

Now, assuming the YAML of the Pod k8s-workload you have provided is complete, this label is currently missing. That matters because of the selector you used in the NodePort Service's spec.

apiVersion: v1
kind: Service
metadata:
  name: np-service
spec:
  ...
  selector:
    run: k8s-workload  # <- this tells Kubernetes who the Service is for

I'm guessing that at the moment, Kubernetes simply cannot find the Pod that the Service is for. You can test this theory by running kubectl get pods -l run=k8s-workload. You should get an error message that looks something like No resources found in default namespace.

Fixing this is as easy as (re-)assigning the Label. This can be done by using the kubectl label command like kubectl label pod k8s-workload run=k8s-workload.
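
If it helps, the whole test-and-fix sequence looks roughly like this (a sketch; the node IP is taken from the question):

    # Should currently print "No resources found in default namespace."
    kubectl get pods -l run=k8s-workload

    # Attach the label the Service's selector is looking for
    kubectl label pod k8s-workload run=k8s-workload

    # The Service should now list the pod as an endpoint...
    kubectl get endpoints np-service

    # ...and the NodePort should answer on any node
    curl http://192.168.50.251:30090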

A detailed guide on how to debug Services, as well as more information on how Labels and Selectors work, can be found in the official documentation.

Update

In relation to whether this situation would be logged: a Service without Endpoints is not an error and (to my knowledge) won't be logged anywhere. Imagine a Deployment that is only needed for a few hours a week. The Deployment not being active, and thus the Service not having any Endpoints for 90% of the week, is expected and does not mean something is misconfigured or not working.
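
That said, if you do want to sweep a cluster for Services whose Endpoints are currently empty, something along these lines should do it (a rough sketch using jsonpath, not an official tool):

    # Print namespace, name and subsets for every Endpoints object,
    # then keep only the rows where the subsets column is empty
    kubectl get endpoints --all-namespaces \
      -o jsonpath='{range .items[*]}{.metadata.namespace}{"\t"}{.metadata.name}{"\t"}{.subsets}{"\n"}{end}' \
      | awk -F'\t' '$3 == "" {print $1 "/" $2}'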

halfer: Perfect, that fixed it - thanks! You were right about the error message, but should that not have been logged somewhere in the K8S system logs? Could I have found that somewhere?

halfer: I am curious that you suggested adding a label to the pod via a manual command. For me that works against the aim of repeatability - should it not be in the manifest, which in turn is committed to version control? (I looked up the format to do this in the manifest, and applied it, and that brought the app to life on the K8S node LAN IP addresses).

halfer: I can imagine adding a label manually would be a useful sticking-plaster to have to hand though - if K8S could not resolve a selector in production, adding this to get it working quickly would perhaps be the best low-risk option.

Eleasar: Maybe we have a misunderstanding. A service not pointing to any pods is not an error and (to my knowledge) won't be logged anywhere. Imagine a deployment that is only needed for a few hours a week. The Deployment not being active, and thus the Service not having any Endpoints for 90% of the week, is expected and does not mean something is misconfigured or not working.

Eleasar: Yes, ideally you would have your YAML files under source control and deploy them into your Kubernetes environment from there (doing something called GitOps). Editing the YAML and deploying it by hand vs running the command and copying the resulting YAML will yield the same result.

Eleasar: I chose the command because of its "simplicity". You can always append "-o yaml --dry-run=client" to see a preview of the object that would be sent to your cluster, without actually submitting it.

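For example, this prints the labelled Pod without submitting anything to the cluster:

    kubectl label pod k8s-workload run=k8s-workload -o yaml --dry-run=client
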
halfer: Righto, thanks. The missing label not being an error makes sense now, even if it means this stuff is harder to debug for beginners. One "just has to know" what is wrong!

halfer: In fact I think I broke it myself - when I had a working system, I took a YAML dump using `microk8s kubectl get pod k8s-workload -o yaml` but trimmed a lot of cruft out (e.g. which node to run on seemed to be something that the engine should determine dynamically, and not be hardwired to a specific node - that defeats the purpose of dynamic scheduling). But in doing so I probably removed the label too.

Score:1

As I suspected, the solution was simple. Eleasar has kindly supplied a label command to fix the problem, but I preferred to fix it in the YAML, as I would regard that as more repeatable. Here is my new pod.yaml:

apiVersion: v1
kind: Pod
metadata:
  name: k8s-workload
  labels:
    run: k8s-workload
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"v1","kind":"Pod","metadata":{"annotations":{},"name":"k8s-workload","namespace":"default"},"spec":{"containers":[{"image":"k8s-workload","imagePullPolicy":"Never","name":"k8s-workload","ports":[{"containerPort":9090,"protocol":"TCP"}]}]}}
spec:
  containers:
  - image: k8s-workload
    imagePullPolicy: Never
    name: k8s-workload
    ports:
    - containerPort: 9090
      protocol: TCP

There are just two new lines, to add a unique label to this object.
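
For completeness, a sketch of how I re-applied the manifest and verified the fix (the IP is one of my node addresses from the table above):

    microk8s kubectl apply -f k8s-manifests/production/pod.yaml

    # The Service now lists the pod's IP as an endpoint
    microk8s kubectl get endpoints np-service

    # And the NodePort responds on each node's LAN address
    curl http://192.168.50.251:30090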
