15-minute timeout in HA k8s cluster when a node stops

I set up a k8s cluster more or less following this guide, so I have three nodes that all act as master/control-plane nodes. I use haproxy as the load balancer with the following config:

#/etc/haproxy/haproxy.cfg
#---------------------------------------------
# Global settings
#---------------------------------------------
global
    log /dev/log local0 
    log /dev/log local1 info
    daemon

#---------------------------------------------
# common defaults that all the listen and backend sections will
# use if not designated in their block
#---------------------------------------------
defaults
    mode                    http
    log                     global
    option                  httplog
    option                  dontlognull
    option http-server-close
    option forwardfor       except 127.0.0.0/8
    option                  redispatch
    retries                 1
    timeout http-request    10s
    timeout queue           20s
    timeout connect         5s
    timeout client          20s
    timeout server          20s
    timeout http-keep-alive 10s
    timeout check           10s

#---------------------------------------------
# apiserver frontend which proxies to the control plane nodes
#---------------------------------------------
frontend apiserver
    bind *:8443
    mode tcp
    option tcplog
    default_backend apiserver

#---------------------------------------------
# round robin balancing for apiserver
#---------------------------------------------
backend apiserver
    option httpchk GET /healthz
    http-check expect status 200
    mode tcp
    option ssl-hello-chk
    option tcp-check
    balance     roundrobin
    server k8s1 x.x.x.15:6443 check
    server k8s2 x.x.x.16:6443 check
    server k8s3 x.x.x.17:6443 check

as well as keepalived for managing a VIP:

! /etc/keepalived/keepalived.conf
! Configuration File for keepalived
global_defs {
    router_id LVS_DEVEL
}
vrrp_script check_apiserver {
  script "/etc/keepalived/check_apiserver.sh"
  interval 3
  timeout 5
  fall 10
  rise 2
}

vrrp_instance VI_1 {
    state BACKUP
    interface ens18
    virtual_router_id 53
    priority 101
    authentication {
        auth_type PASS
        auth_pass 123456
    }
    virtual_ipaddress {
        x.x.x.18
    }
    track_script {
        check_apiserver
    }
}

and the check_apiserver script:

#!/usr/bin/env bash

# x.x.x.18 is the VIP managed by keepalived
VIP=x.x.x.18

errorExit() {
    echo "*** $*" 1>&2
    exit 1
}

# the local apiserver must answer; if this node currently holds the VIP,
# the apiserver must also be reachable through haproxy on the VIP
curl --silent --max-time 2 --insecure https://localhost:6443/ -o /dev/null || errorExit "Error GET https://localhost:6443/"
if ip addr | grep -q ${VIP}; then
    curl --silent --max-time 2 --insecure https://${VIP}:8443/ -o /dev/null || errorExit "Error GET https://${VIP}:8443/"
fi

kubelet, kubeadm and kubectl are all version 1.22.2

I create the cluster using

sudo kubeadm init --control-plane-endpoint "x.x.x.18:8443" --upload-certs --v=5 --pod-network-cidr=172.31.0.0/16
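
The other two control-plane nodes are then joined with the command that kubeadm init prints, roughly like this (token, discovery hash and certificate key are placeholders):

sudo kubeadm join x.x.x.18:8443 --token <token> \
    --discovery-token-ca-cert-hash sha256:<hash> \
    --control-plane --certificate-key <key>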

and add Weave Net using

kubectl apply -f "https://cloud.weave.works/k8s/net?k8s-version=$(kubectl version | base64 | tr -d '\n')&env.IPALLOC_RANGE=172.31.0.0/16"
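
To make sure the CNI is up before continuing, I check roughly the following (the label selector assumes the stock Weave Net DaemonSet):

kubectl get pods -n kube-system -l name=weave-net -o wide
kubectl get nodes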

With this configuration I am able to create e.g. an EMQX cluster. The problem appears whenever I stop one node: every StatefulSet that had a Pod running on the stopped node becomes unresponsive for almost exactly 15 minutes.
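
Roughly how I trigger the failure (the node address and workload names are only examples):

# stop one of the control-plane nodes
ssh x.x.x.15 sudo shutdown now
# the EMQX Pods that were on that node stay in Terminating and the StatefulSet hangs
kubectl get pods -o wide -w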

Checking keepalived with ip a s ens18, I see the VIP move almost instantly to a running node. On the haproxy stats dashboard the node is shown as "active UP, going DOWN" after 2 seconds and as "active or backup DOWN" after 4 more seconds. So this part seems to work as well.
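
For reference, this is roughly how I verify the failover from another machine (-k only skips certificate verification):

# apiserver health through the VIP / haproxy frontend
curl -k --max-time 2 https://x.x.x.18:8443/healthz
# watch the VIP move between the nodes
watch -n 1 ip -brief addr show dev ens18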

Modifying Kubernetes timeouts (e.g. the pod eviction time) does take effect, so the Pods are marked as Terminating earlier, but the StatefulSet remains unresponsive for 15 minutes regardless of the eviction time.
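
For example, one of the things I tried was giving the Pod template short NoExecute tolerations so the Pods are evicted from an unreachable node faster (a sketch; "emqx" and the 5-second value are only examples):

# patch the StatefulSet pod template with short NoExecute tolerations
kubectl patch statefulset emqx --type merge -p '{
  "spec": {"template": {"spec": {"tolerations": [
    {"key": "node.kubernetes.io/unreachable", "operator": "Exists",
     "effect": "NoExecute", "tolerationSeconds": 5},
    {"key": "node.kubernetes.io/not-ready", "operator": "Exists",
     "effect": "NoExecute", "tolerationSeconds": 5}
  ]}}}
}'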

Setting up a three-node kind cluster in which all nodes are master/control-plane nodes does not show this behaviour, which is why I am guessing it is a k8s configuration problem. But what am I missing?

Edit 1: The cluster remains accessible during that time, so I can watch kubectl get all --all-namespaces -o wide to check the cluster status. All I see is that the Pods from the stopped node remain in Terminating state.

Edit 2: The only suspicious behaviour was Weave detecting a new MAC address after 15 minutes. To speed up the search for the error I started kind without its own CNI and used Weave instead. With this I could reproduce identical logs and the exact same problem as with the "real" Kubernetes cluster. Unfortunately I had no luck with Weave's debug logs, so I switched to the Calico CNI and changed the podSubnet to 192.168.0.0/16. This solved the problem in kind, but applying the exact same change to my Kubernetes cluster leaves me with the same problem again...
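
For reference, the kind cluster for this comparison was created roughly like this (three control-plane nodes, kind's default CNI disabled so Weave or Calico can be installed on top; the podSubnet shown is the one used for the Calico test):

cat <<EOF | kind create cluster --config=-
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
networking:
  disableDefaultCNI: true
  podSubnet: "192.168.0.0/16"
nodes:
- role: control-plane
- role: control-plane
- role: control-plane
EOF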
