I have installed a multi-master cluster, following the guide "Setting up k8 multi master cluster".
The setup details are as follows.
Load balancer: HAProxy LB
frontend kubernetes-frontend
    bind 192.168.1.11:6443
    mode tcp
    option tcplog
    default_backend kubernetes-backend

backend kubernetes-backend
    mode tcp
    option tcp-check
    balance roundrobin
    server master21.server 192.168.1.21:6443 check fall 3 rise 2
    server master22.server 192.168.1.22:6443 check fall 3 rise 2
Kubernetes version: v1.25.0
No. of masters: 2
No. of workers: 2
Docker version: 23.0.1
cri-dockerd: v3.0
Environment: VMware virtual servers, CentOS 8
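As a side note on the kubernetes-backend above: it only does a plain TCP check. A TLS-aware health-check variant that I have been considering (not applied yet, just a sketch assuming HAProxy 2.x) would probe the apiserver's /healthz over HTTPS instead:

backend kubernetes-backend
    mode tcp
    balance roundrobin
    # Probe /healthz over HTTPS instead of a bare TCP connect; the data path
    # stays in tcp passthrough mode, only the health check speaks TLS.
    option httpchk GET /healthz
    http-check expect status 200
    # 'verify none' skips CA validation for the probe only; if anonymous access
    # to /healthz is blocked (see the 403 further down), the expected status
    # would need adjusting or the probe would need client certificates.
    server master21.server 192.168.1.21:6443 check check-ssl verify none fall 3 rise 2
    server master22.server 192.168.1.22:6443 check check-ssl verify none fall 3 rise 2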
After the installation and cluster setup, everything was running fine, and I deployed a sample pod as well. Then I wanted to check the high availability of the cluster by shutting down one of the master servers, and that is where the problem started: once I shut down one of the masters, kubectl commands stopped working. I tried restarting and switching master nodes, but kubectl commands still do not work. When a command times out, it gives the following error (but not always):
error: Get "https://192.168.1.11:6443/api?timeout=32s": net/http: TLS handshake timeout - error from a previous attempt: EOF
I have tried curl against the load balancer with both https and http; this is the result:
[***@master21 ~]$ curl -v https://192.168.1.11:6443/api?timeout=32s
* Trying 192.168.1.11...
* TCP_NODELAY set
* Connected to 192.168.1.11 (192.168.1.11) port 6443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
* CAfile: /etc/pki/tls/certs/ca-bundle.crt
CApath: none
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* OpenSSL SSL_connect: SSL_ERROR_SYSCALL in connection to 192.168.1.11:6443
curl: (35) OpenSSL SSL_connect: SSL_ERROR_SYSCALL in connection to 192.168.1.11:6443
[***@master21 ~]$ curl -v http://192.168.1.11:6443/api?timeout=32s
* Trying 192.168.1.11...
* TCP_NODELAY set
* Connected to 192.168.1.11 (192.168.1.11) port 6443 (#0)
> GET /api?timeout=32s HTTP/1.1
> Host: 192.168.1.11:6443
> User-Agent: curl/7.61.1
> Accept: */*
>
* Empty reply from server
Can someone help me solve this issue? I believe TLS configuration is required on HAProxy, but I don't understand how to configure it to match the existing SSL setup in the Kubernetes cluster.
Output of curl -kv https://192.168.1.21:6443/healthz after bringing down one master (the whole master22.server VM):
[***@master21 ~]$ curl -kv https://192.168.1.21:6443/healthz
* Trying 192.168.1.21...
* TCP_NODELAY set
* Connected to 192.168.1.21 (192.168.1.21) port 6443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
* CAfile: /etc/pki/tls/certs/ca-bundle.crt
CApath: none
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.3 (IN), TLS handshake, [no content] (0):
* TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
* TLSv1.3 (IN), TLS handshake, [no content] (0):
* TLSv1.3 (IN), TLS handshake, Request CERT (13):
* TLSv1.3 (IN), TLS handshake, [no content] (0):
* TLSv1.3 (IN), TLS handshake, Certificate (11):
* TLSv1.3 (IN), TLS handshake, [no content] (0):
* TLSv1.3 (IN), TLS handshake, CERT verify (15):
* TLSv1.3 (IN), TLS handshake, [no content] (0):
* TLSv1.3 (IN), TLS handshake, Finished (20):
* TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.3 (OUT), TLS handshake, [no content] (0):
* TLSv1.3 (OUT), TLS handshake, Certificate (11):
* TLSv1.3 (OUT), TLS handshake, [no content] (0):
* TLSv1.3 (OUT), TLS handshake, Finished (20):
* SSL connection using TLSv1.3 / TLS_AES_128_GCM_SHA256
* ALPN, server accepted to use h2
* Server certificate:
* subject: CN=kube-apiserver
* start date: Mar 23 08:10:26 2023 GMT
* expire date: Mar 22 08:10:26 2024 GMT
* issuer: CN=kubernetes
* SSL certificate verify result: unable to get local issuer certificate (20), continuing anyway.
* Using HTTP2, server supports multi-use
* Connection state changed (HTTP/2 confirmed)
* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
* TLSv1.3 (OUT), TLS app data, [no content] (0):
* TLSv1.3 (OUT), TLS app data, [no content] (0):
* TLSv1.3 (OUT), TLS app data, [no content] (0):
* Using Stream ID: 1 (easy handle 0x5605644bf690)
* TLSv1.3 (OUT), TLS app data, [no content] (0):
> GET /healthz HTTP/2
> Host: 192.168.1.21:6443
> User-Agent: curl/7.61.1
> Accept: */*
>
* TLSv1.3 (IN), TLS handshake, [no content] (0):
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* TLSv1.3 (IN), TLS app data, [no content] (0):
* Connection state changed (MAX_CONCURRENT_STREAMS == 250)!
* TLSv1.3 (OUT), TLS app data, [no content] (0):
* TLSv1.3 (IN), TLS app data, [no content] (0):
* TLSv1.3 (IN), TLS app data, [no content] (0):
< HTTP/2 403
< audit-id: 930660ff-c7ee-4226-9b98-8fdaed13a251
< cache-control: no-cache, private
< content-type: application/json
< x-content-type-options: nosniff
< x-kubernetes-pf-flowschema-uid:
< x-kubernetes-pf-prioritylevel-uid:
< content-length: 224
< date: Fri, 24 Mar 2023 06:46:01 GMT
<
* TLSv1.3 (IN), TLS app data, [no content] (0):
{
"kind": "Status",
"apiVersion": "v1",
"metadata": {},
"status": "Failure",
"message": "forbidden: User \"system:anonymous\" cannot get path \"/healthz\"",
"reason": "Forbidden",
"details": {},
"code": 403
* Connection #0 to host 192.168.1.21 left intact
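That 403 is just because the request is anonymous. To rule out RBAC noise, I assume the same health check could be repeated with the kubeadm-generated client certificates (the paths below are the usual kubeadm defaults on a control-plane node, so they may differ in other setups):

curl --cacert /etc/kubernetes/pki/ca.crt \
     --cert /etc/kubernetes/pki/apiserver-kubelet-client.crt \
     --key /etc/kubernetes/pki/apiserver-kubelet-client.key \
     https://192.168.1.21:6443/healthz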
On further checking, I have noticed that the issue only occurs when I bring down a master node completely (the whole VM). When I stop only the kubelet service, kubectl gives the following expected output:
[***@master22 ~]$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
master21.server Ready control-plane 22h v1.25.0
master22.server NotReady control-plane 22h v1.25.0
worker31.server Ready <none> 22h v1.25.0
worker32.server Ready <none> 22h v1.25.0
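For completeness, one more check I am planning to do is to point kubectl directly at a single apiserver, to take HAProxy out of the picture (as I understand it, kubectl's --server flag overrides the endpoint from the kubeconfig), for example:

# Query each apiserver directly, bypassing the 192.168.1.11 frontend
kubectl --server=https://192.168.1.21:6443 get nodes
kubectl --server=https://192.168.1.22:6443 get nodes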