I have set up a Kubernetes cluster with 2 master nodes (cp01 192.168.1.42, cp02 192.168.1.46) and 4 worker nodes. High availability is implemented with haproxy and keepalived running as static pods in the cluster, with a stacked (internal) etcd cluster. For some silly reason, I accidentally ran kubeadm reset -f on cp01. Now I am trying to rejoin it to the cluster with kubeadm join, but I keep getting dial tcp 192.168.1.49:8443: connect: connection refused, where 192.168.1.49 is the load balancer IP. Please help! The current configurations are below.
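For context, the cluster was originally initialised on cp01 roughly like this (reconstructed from the kubeadm-config dumped further down; the exact flags may have differed):

# reconstructed, not the literal command I ran
kubeadm init \
  --control-plane-endpoint "192.168.1.49:8443" \
  --upload-certs \
  --pod-network-cidr "10.244.0.0/16"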
/etc/haproxy/haproxy.cfg on cp02
defaults
    timeout connect 10s
    timeout client 30s
    timeout server 30s

frontend apiserver
    bind *:8443
    mode tcp
    option tcplog
    default_backend apiserver

backend apiserver
    option httpchk GET /healthz
    http-check expect status 200
    mode tcp
    option ssl-hello-chk
    balance roundrobin
    default-server inter 10s downinter 5s rise 2 fall 2 slowstart 60s maxconn 250 maxqueue 256 weight 100
    #server master01 192.168.1.42:6443 check   # <-- the one I accidentally reset
    server master02 192.168.1.46:6443 check
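In case it's useful, this is how I would sanity-check that config (assuming the haproxy binary is installed on the host; the pod name in the second command is a guess based on the usual static pod naming of name plus node name):

haproxy -c -f /etc/haproxy/haproxy.cfg    # syntax check on cp02
kubectl -n kube-system logs haproxy-cp02  # logs of the haproxy static pod (name assumed)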
/etc/keepalived/keepalived.conf on cp02
global_defs {
    router_id LVS_DEVEL
    script_user root
    enable_script_security
    dynamic_interfaces
}

vrrp_script check_apiserver {
    script "/etc/keepalived/check_apiserver.sh"
    interval 3
    weight -2
    fall 10
    rise 2
}

vrrp_instance VI_l {
    state BACKUP
    interface ens192
    virtual_router_id 51
    priority 101
    authentication {
        auth_type PASS
        auth_pass ***
    }
    virtual_ipaddress {
        192.168.1.49/24
    }
    track_script {
        check_apiserver
    }
}
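For reference, /etc/keepalived/check_apiserver.sh is, I believe, essentially the standard script from the kubeadm HA setup guide, along these lines (reproduced from memory, so treat it as a sketch):

#!/bin/sh
# Exits non-zero if the API server stops answering, so keepalived can move the VIP.
errorExit() {
    echo "*** $*" 1>&2
    exit 1
}

curl --silent --max-time 2 --insecure https://localhost:8443/ -o /dev/null || errorExit "Error GET https://localhost:8443/"
if ip addr | grep -q 192.168.1.49; then
    curl --silent --max-time 2 --insecure https://192.168.1.49:8443/ -o /dev/null || errorExit "Error GET https://192.168.1.49:8443/"
fi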
kubeadm-config ConfigMap in kube-system
apiVersion: v1
data:
  ClusterConfiguration: |
    apiServer:
      extraArgs:
        authorization-mode: Node,RBAC
      timeoutForControlPlane: 4m0s
    apiVersion: kubeadm.k8s.io/v1beta2
    certificatesDir: /etc/kubernetes/pki
    clusterName: kubernetes
    controlPlaneEndpoint: 192.168.1.49:8443
    controllerManager: {}
    dns:
      type: CoreDNS
    etcd:
      local:
        dataDir: /var/lib/etcd
    imageRepository: k8s.gcr.io
    kind: ClusterConfiguration
    kubernetesVersion: v1.19.2
    networking:
      dnsDomain: cluster.local
      podSubnet: 10.244.0.0/16
      serviceSubnet: 10.96.0.0/12
    scheduler: {}
  ClusterStatus: |
    apiEndpoints:
      cp02:
        advertiseAddress: 192.168.1.46
        bindPort: 6443
    apiVersion: kubeadm.k8s.io/v1beta2
    kind: ClusterStatus
...
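(The dump above is from the kubeadm-config ConfigMap, i.e. something like:)

kubectl -n kube-system get configmap kubeadm-config -o yaml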
kubectl cluster-info
Kubernetes master is running at https://192.168.1.49:8443
KubeDNS is running at https://192.168.1.49:8443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy
More Info
The cluster was initialised with --upload-certs on cp01.
I drained and deleted cp01 from the cluster.
kubeadm join --token ... --discovery-token-ca-cert-hash ... --control-plane --certificate-key ...
The command returned:
error execution phase preflight: unable to fetch the kubeadm-config ConfigMap: failed to get config map: Get "https://192.168.1.49:8443/api/v1/namespaces/kube-system/configmaps/kubeadm-config?timeout=10s": dial tcp 192.168.1.49:8443: connect: connection refused
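(The token, discovery hash and certificate key are redacted above. As far as I understand, fresh values can be regenerated on a healthy control plane node, e.g. cp02, roughly like this:)

kubeadm init phase upload-certs --upload-certs   # re-uploads the control plane certs and prints a new certificate key
kubeadm token create --print-join-command        # prints a fresh join command; add --control-plane --certificate-key <key> for a control plane join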
kubectl exec -n kube-system -it etcd-cp02 -- etcdctl --endpoints=https://192.168.1.46:2379 --key=/etc/kubernetes/pki/etcd/peer.key --cert=/etc/kubernetes/pki/etcd/peer.crt --cacert=/etc/kubernetes/pki/etcd/ca.crt member list
returned:
..., started, cp02, https://192.168.1.46:2380, https://192.168.1.46:2379, false
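(Only cp02 seems to be listed. In case a stale cp01 member were still present, my understanding is it could be removed with something like the following, using the member ID from the member list output:)

kubectl exec -n kube-system -it etcd-cp02 -- etcdctl --endpoints=https://192.168.1.46:2379 --key=/etc/kubernetes/pki/etcd/peer.key --cert=/etc/kubernetes/pki/etcd/peer.crt --cacert=/etc/kubernetes/pki/etcd/ca.crt member remove <MEMBER_ID>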
kubectl describe pod/etcd-cp02 -n kube-system returned:
...
Container ID: docker://...
Image: k8s.gcr.io/etcd:3.4.13-0
Image ID: docker://...
Port: <none>
Host Port: <none>
Command:
etcd
--advertise-client-urls=https://192.168.1.46:2379
--cert-file=/etc/kubernetes/pki/etcd/server.crt
--client-cert-auth=true
--data-dir=/var/lib/etcd
--initial-advertise-peer-urls=https://192.168.1.46:2380
--initial-cluster=cp01=https://192.168.1.42:2380,cp02=https://192.168.1.46:2380
--initial-cluster-state=existing
--key-file=/etc/kubernetes/pki/etcd/server.key
--listen-client-urls=https://127.0.0.1:2379,https://192.168.1.46:2379
--listen-metrics-urls=http://127.0.0.1:2381
--listen-peer-urls=https://192.168.1.46:2380
--name=cp02
--peer-cert-file=/etc/kubernetes/pki/etcd/peer.crt
--peer-client-cert-auth=true
--peer-key-file=/etc/kubernetes/pki/etcd/peer.key
--peer-trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
--snapshot-count=10000
--trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
...
I tried copying the certs listed below to cp01:/etc/kubernetes/pki (see the scp sketch after the file list) before running
kubeadm join 192.168.1.49:8443 --token ... --discovery-token-ca-cert-hash
again, but it returned the same error.
# files copied over to cp01
ca.crt
ca.key
sa.key
sa.pub
front-proxy-ca.crt
front-proxy-ca.key
etcd/ca.crt
etcd/ca.key
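(Copied roughly like this from cp02, assuming root SSH access to cp01 at 192.168.1.42:)

scp /etc/kubernetes/pki/{ca.crt,ca.key,sa.key,sa.pub,front-proxy-ca.crt,front-proxy-ca.key} root@192.168.1.42:/etc/kubernetes/pki/
scp /etc/kubernetes/pki/etcd/{ca.crt,ca.key} root@192.168.1.42:/etc/kubernetes/pki/etcd/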
Troubleshoot network
I am able to ping 192.168.1.49 from cp01.
nc -v 192.168.1.49 8443
on cp01 returned Ncat: Connection refused.
curl -k https://192.168.1.49:8443/api/v1...
works on cp02 and the worker nodes (returns HTTP 403, which should be normal).
/etc/cni/net.d/ has been removed on cp01.
I manually cleared the iptables rules on cp01 that contain 'KUBE' or 'cali', roughly as sketched below.
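(Sketch only; I may well have missed some chains:)

# dump the current rules, drop anything referencing KUBE or cali, reload the rest
iptables-save | grep -v KUBE | grep -v cali | iptables-restore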
firewalld is disabled on both cp01 and cp02.
I tried joining with a new server, cp03 (192.168.1.48), and encountered the same dial tcp 192.168.1.49:8443: connect: connection refused error.
netstat -tlnp | grep 8443
on cp02 returned:
tcp        0      0 0.0.0.0:8443            0.0.0.0:*               LISTEN      27316/haproxy
nc -v 192.168.1.46 6443
on cp01 and cp03 returns:
Ncat: Connected to 192.168.1.46:6443
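The only other thing I can think of checking is which node actually holds the VIP and what cp01/cp03 have cached for it, e.g. (interface name taken from the keepalived config above):

ip addr show ens192 | grep 192.168.1.49   # on each control plane node: does it currently hold the VIP?
ip neigh | grep 192.168.1.49              # on cp01/cp03: which MAC is cached for the VIP?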
Any advice/guidance would be greatly appreciated, as I am at a loss here. I'm thinking it might be due to the network rules on cp02, but I don't really know how to check this. Thank you!!