Multi-master Kubernetes cluster with HAProxy LB: cluster not working after a master node restart (unable to execute kubectl commands)

Asked by user1672382

I have installed a multi-master cluster, following the guide "Setting up k8 multi master cluster".

Setup details are as follows.

Load balancer: HAProxy

frontend kubernetes-frontend
    bind 192.168.1.11:6443
    mode tcp                      # TCP passthrough; TLS is terminated by the apiservers, not by HAProxy
    option tcplog
    default_backend kubernetes-backend

backend kubernetes-backend
    mode tcp
    option tcp-check              # health check is a plain TCP connect, no TLS handshake
    balance roundrobin
    server master21.server 192.168.1.21:6443 check fall 3 rise 2   # down after 3 failed checks, up after 2 good ones
    server master22.server 192.168.1.22:6443 check fall 3 rise 2
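
To rule out typos when editing this file, HAProxy can syntax-check the config before a reload; a minimal sketch, assuming the default config path on CentOS 8:

    # Validate the config file; exits non-zero on errors
    sudo haproxy -c -f /etc/haproxy/haproxy.cfg
    # Apply changes without dropping existing connections
    sudo systemctl reload haproxy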

Kubernetes version: v1.25.0

Number of masters: 2
Number of workers: 2

Docker version: 23.0.1

cri-dockerd: v3.0

Env: VMware virtual servers, CentOS 8

After the installation and cluster setup everything was running fine, and I deployed a sample pod as well. Then I wanted to check the high availability of the cluster by shutting down one of the master servers, and here the problem appeared: once I shut down one of the master servers, kubectl commands stopped working. I tried restarting and switching master nodes, yet kubectl commands still do not work. When a command times out it gives the following error (but not always):

error: Get "https://192.168.1.11:6443/api?timeout=32s": net/http: TLS handshake timeout - error from a previous attempt: EOF
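
To see exactly when the control plane drops off through the load balancer during such a failover test, it can be polled in a loop; a rough sketch (assuming /version is reachable without credentials; even a 401/403 status would prove that TCP and TLS still work):

    # Prints one HTTP status code per probe; 000 means the connection itself failed
    while true; do
        curl -ks -o /dev/null -w '%{http_code}\n' --max-time 5 https://192.168.1.11:6443/version
        sleep 2
    done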

I have tried curl with both https and http; these are the results:

[***@master21 ~]$ curl -v https://192.168.1.11:6443/api?timeout=32s
*   Trying 192.168.1.11...
* TCP_NODELAY set
* Connected to 192.168.1.11 (192.168.1.11) port 6443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
*   CAfile: /etc/pki/tls/certs/ca-bundle.crt
  CApath: none
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* OpenSSL SSL_connect: SSL_ERROR_SYSCALL in connection to 192.168.1.11:6443
curl: (35) OpenSSL SSL_connect: SSL_ERROR_SYSCALL in connection to 192.168.1.11:6443

[***@master21 ~]$ curl -v http://192.168.1.11:6443/api?timeout=32s
*   Trying 192.168.1.11...
* TCP_NODELAY set
* Connected to 192.168.1.11 (192.168.1.11) port 6443 (#0)
> GET /api?timeout=32s HTTP/1.1
> Host: 192.168.1.11:6443
> User-Agent: curl/7.61.1
> Accept: */*
>
* Empty reply from server

Can someone help me solve this issue? I believe that TLS configuration is required on HAProxy, but I don't understand how to configure it to match the existing SSL setup in the Kubernetes cluster.
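
My current understanding is that in mode tcp HAProxy only passes TLS through and does not terminate it, so no certificates should be needed on the load balancer itself; what may be missing is only a TLS-aware health check. A sketch of what I think that would look like (untested; check-ssl with verify none, since HAProxy does not trust the cluster CA by default):

backend kubernetes-backend
    mode tcp
    option tcp-check
    balance roundrobin
    # complete a full TLS handshake during health checks instead of a bare TCP connect;
    # "verify none" skips CA verification of the apiserver certificate
    default-server check check-ssl verify none fall 3 rise 2
    server master21.server 192.168.1.21:6443
    server master22.server 192.168.1.22:6443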

Output of `curl -kv https://192.168.1.21:6443/healthz` after bringing down one master (the whole master22.server VM):

[***@master21 ~]$ curl -kv https://192.168.1.21:6443/healthz
*   Trying 192.168.1.21...
* TCP_NODELAY set
* Connected to 192.168.1.21 (192.168.1.21) port 6443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
*   CAfile: /etc/pki/tls/certs/ca-bundle.crt
  CApath: none
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.3 (IN), TLS handshake, [no content] (0):
* TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
* TLSv1.3 (IN), TLS handshake, [no content] (0):
* TLSv1.3 (IN), TLS handshake, Request CERT (13):
* TLSv1.3 (IN), TLS handshake, [no content] (0):
* TLSv1.3 (IN), TLS handshake, Certificate (11):
* TLSv1.3 (IN), TLS handshake, [no content] (0):
* TLSv1.3 (IN), TLS handshake, CERT verify (15):
* TLSv1.3 (IN), TLS handshake, [no content] (0):
* TLSv1.3 (IN), TLS handshake, Finished (20):
* TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.3 (OUT), TLS handshake, [no content] (0):
* TLSv1.3 (OUT), TLS handshake, Certificate (11):
* TLSv1.3 (OUT), TLS handshake, [no content] (0):
* TLSv1.3 (OUT), TLS handshake, Finished (20):
* SSL connection using TLSv1.3 / TLS_AES_128_GCM_SHA256
* ALPN, server accepted to use h2
* Server certificate:
*  subject: CN=kube-apiserver
*  start date: Mar 23 08:10:26 2023 GMT
*  expire date: Mar 22 08:10:26 2024 GMT
*  issuer: CN=kubernetes
*  SSL certificate verify result: unable to get local issuer certificate (20), continuing anyway.
* Using HTTP2, server supports multi-use
* Connection state changed (HTTP/2 confirmed)
* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
* TLSv1.3 (OUT), TLS app data, [no content] (0):
* TLSv1.3 (OUT), TLS app data, [no content] (0):
* TLSv1.3 (OUT), TLS app data, [no content] (0):
* Using Stream ID: 1 (easy handle 0x5605644bf690)
* TLSv1.3 (OUT), TLS app data, [no content] (0):
> GET /healthz HTTP/2
> Host: 192.168.1.21:6443
> User-Agent: curl/7.61.1
> Accept: */*
>
* TLSv1.3 (IN), TLS handshake, [no content] (0):
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* TLSv1.3 (IN), TLS app data, [no content] (0):
* Connection state changed (MAX_CONCURRENT_STREAMS == 250)!
* TLSv1.3 (OUT), TLS app data, [no content] (0):
* TLSv1.3 (IN), TLS app data, [no content] (0):
* TLSv1.3 (IN), TLS app data, [no content] (0):
< HTTP/2 403
< audit-id: 930660ff-c7ee-4226-9b98-8fdaed13a251
< cache-control: no-cache, private
< content-type: application/json
< x-content-type-options: nosniff
< x-kubernetes-pf-flowschema-uid:
< x-kubernetes-pf-prioritylevel-uid:
< content-length: 224
< date: Fri, 24 Mar 2023 06:46:01 GMT
<
* TLSv1.3 (IN), TLS app data, [no content] (0):
{
  "kind": "Status",
  "apiVersion": "v1",
  "metadata": {},
  "status": "Failure",
  "message": "forbidden: User \"system:anonymous\" cannot get path \"/healthz\"",
  "reason": "Forbidden",
  "details": {},
  "code": 403
* Connection #0 to host 192.168.1.21 left intact

From further checks I have noticed that the issue occurs only when I bring down a master node completely (the whole VM). When I stopped just the kubelet service, kubectl gave the following expected output:

[***@master22 ~]$  kubectl get nodes
NAME              STATUS     ROLES           AGE   VERSION
master21.server   Ready      control-plane   22h   v1.25.0
master22.server   NotReady   control-plane   22h   v1.25.0
worker31.server   Ready      <none>          22h   v1.25.0
worker32.server   Ready      <none>          22h   v1.25.0
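
A possible explanation of the difference: when only kubelet is stopped, the control-plane static pods (apiserver, etcd, etc.) keep running as Docker containers, so that master still serves port 6443. This can be verified with (a sketch; kubeadm names the containers with a k8s_ prefix, and docker ps matches names by substring):

    # On the master where kubelet was stopped, check whether the apiserver container still runs
    sudo docker ps --filter name=kube-apiserver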

Turing85: While Stack Overflow does permit certain questions about Kubernetes, we require that they (like all questions asked here) be specifically related to programming. This question does not appear to be specifically related to programming, but to cluster configuration, which makes it off-topic here. You might be able to ask questions like this one on [sf].
mdaniel: By "shutting down one of the master server" did you also take out one of the etcd members? Separately, what happens if you `curl -kv https://192.168.1.21:6443/healthz` directly?
user1672382 (OP): @mdaniel I haven't installed etcd members separately (just a 4-node Kubernetes installation on 4 VMs; the guide uses 6 but I installed only 4). I have updated the question with the output of your command. Further, I noticed that the issue occurs when I bring down a master node completely (the whole VM); when I stopped only the kubelet service, kubectl gave the expected output (please see the updated question).
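
For reference, a way to check the stacked etcd members while both masters are still up (a sketch assuming kubeadm defaults: the etcd pod is named etcd-<node-name> and the certificates live under /etc/kubernetes/pki/etcd):

    # Query the health of every etcd member from inside the etcd pod on master21
    kubectl -n kube-system exec etcd-master21.server -- etcdctl \
        --endpoints=https://127.0.0.1:2379 \
        --cacert=/etc/kubernetes/pki/etcd/ca.crt \
        --cert=/etc/kubernetes/pki/etcd/server.crt \
        --key=/etc/kubernetes/pki/etcd/server.key \
        endpoint health --cluster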
Yvan G.: Based on this link [1], which reports the same error message, a firewall might be blocking inbound SSL traffic. [1] https://access.redhat.com/discussions/6994991
user1672382 (OP): @YvanG. But port 6443 is open on the load balancer server and on both master servers.
Yvan G.: Have you performed any test to double-check that the port is open?
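
A quick way to double-check from any of the nodes (nc is in the nmap-ncat package on CentOS 8; firewall-cmd queries firewalld, the CentOS 8 default firewall):

    # Test that the apiserver port answers on the LB and on each master
    nc -zv 192.168.1.11 6443
    nc -zv 192.168.1.21 6443
    nc -zv 192.168.1.22 6443
    # On each server, list what firewalld currently allows
    sudo firewall-cmd --list-all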