
Keepalived won't forward traffic to BACKUP node after Kubernetes cluster setup


System Structure:

  • 10.10.1.86: Kubernetes master node
  • 10.10.1.87: Kubernetes worker 1 node; keepalived MASTER node
  • 10.10.1.88: Kubernetes worker 2 node; keepalived BACKUP node
  • 10.10.1.90: VIP that load balances to .87 & .88; implemented by keepalived.

This Kubernetes cluster is a dev environment for testing NetFlow log collection.

What I want to achieve is:

  1. All router/switch NetFlow logs are first sent to .90.
  2. keepalived then load balances (lb_kind: NAT) to .87 & .88, the two Kubernetes workers.
  3. A NodePort Service catches this traffic into the Kubernetes cluster, where the remaining data-parsing jobs run.
  • Something like:
        |                {OS Network}                   |   {Kubernetes Network}

                                K8s Worker -> filebeat -> logstash (deployments)
                              /
<data> -> [VIP] load balance
                              \ 
                                K8s Worker -> filebeat -> logstash (deployments)
  • filebeat.yml (I have verified that traffic is all fine once it reaches Filebeat, so I use file output to narrow down the root cause; see the sketch after the config.)
# cat filebeat.yml
filebeat.inputs:

- type: tcp
  max_message_size: 10MiB
  host: "0.0.0.0:5100"

- type: udp
  max_message_size: 10KiB
  host: "0.0.0.0:5150"




#output.logstash:
#  hosts: ["10.10.1.87:30044", "10.10.1.88:30044"]
output.file:
  path: "/tmp/"
  filename: tmp-filebeat.out
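
For reference, this is roughly how I verified the Filebeat leg in isolation (a sketch; the test string is arbitrary):

# send a test line directly to worker .87, bypassing the VIP
echo "filebeat-direct-test" | nc 10.10.1.87 5100
# confirm it landed in the file output
tail -n 1 /tmp/tmp-filebeat.out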

Kubernetes

  • The master and workers are 3 VMs in my private environment; no cloud provider (GCP, AWS, etc.) is involved.
  • Version:
# kubectl version
Client Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.0", GitCommit:"cb303e613a121a29364f75cc67d3d580833a7479", GitTreeState:"clean", BuildDate:"2021-04-08T16:31:21Z", GoVersion:"go1.16.1", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.0", GitCommit:"cb303e613a121a29364f75cc67d3d580833a7479", GitTreeState:"clean", BuildDate:"2021-04-08T16:25:06Z", GoVersion:"go1.16.1", Compiler:"gc", Platform:"linux/amd64"}
  • Services
# cat logstash.service.yaml
apiVersion: v1
kind: Service
metadata:
  name: logstash-service
spec:
  type: NodePort
  selector:
    app: logstash
  ports:
    - port: 9514
      name: tcp-port
      targetPort: 9514
      nodePort: 30044
  • Once data gets into Kubernetes, everything works fine.
  • It is the VIP load balancing that is not forwarding. (A sketch of how the NodePort path can be checked on its own follows.)
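
To confirm the NodePort path independently of the VIP (a sketch; assumes logstash-service is in the default namespace):

# the endpoints should list the logstash pod IPs
kubectl get endpoints logstash-service
# Logstash will likely reject a raw string on this port, but a connection
# that opens (instead of timing out) shows the NodePort is listening
echo "conn-test" | nc 10.10.1.87 30044
echo "conn-test" | nc 10.10.1.88 30044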

Keepalived conf

!Configuration File for keepalived
global_defs {
  router_id proxy1   # `proxy2` at the other node
}


vrrp_instance VI_1 {
  state MASTER       # `BACKUP` at the other node
  interface ens160
  virtual_router_id 41
  priority 100       # `50` at the other node
  advert_int 1
  virtual_ipaddress {
    10.10.1.90/23
  }
}

virtual_server 10.10.1.90 5100 {
  delay_loop 30
  lb_algo rr
  lb_kind NAT
  protocol TCP
  persistence_timeout 0

  real_server 10.10.1.87 5100 {
    weight 1
  }
  real_server 10.10.1.88 5100 {
    weight 1
  }
}
virtual_server 10.10.1.90 5150 {
  delay_loop 30
  lb_algo rr
  lb_kind NAT
  protocol UDP
  persistence_timeout 0

  real_server 10.10.1.87 5150 {
    weight 1
  }
  real_server 10.10.1.88 5150 {
    weight 1
  }
}
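
For reference, checking which node currently holds the VIP (run on both .87 and .88; only the VRRP MASTER should show it):

# ens160 is the interface name from the keepalived conf above
ip addr show ens160 | grep 10.10.1.90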

It worked before the Kubernetes cluster setup

  • Both .87 & .88 had keepalived installed, and rr (round-robin) load balancing worked fine (TCP and UDP).
  • I stopped the keepalived service (systemctl stop keepalived) before setting up the Kubernetes cluster, just in case. (A quick failover sanity check is sketched after this list.)
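
If useful, the VRRP failover itself can be sanity-checked like this (a sketch):

# on .87 (MASTER): stop keepalived so the VIP should move away
systemctl stop keepalived
# on .88 (BACKUP): within a few advert_int seconds the VIP should appear
ip addr show ens160 | grep 10.10.1.90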

The problem occurred after the Kubernetes cluster setup

  • Only the MASTER node .87 gets traffic forwarded; the VIP does not forward to the BACKUP node .88.
  • The data forwarded via the MASTER is successfully caught by the Kubernetes NodePort and the deployments.

Testing the problem with nc:

  • nc: only the node holding the VIP (the MASTER) forwards traffic; when rr schedules a connection to the BACKUP, it just times out.
  • I also tested with nc -l 5100 on both servers; only the MASTER node received data.
# echo "test" | nc 10.10.1.90 5100
# echo "test" | nc 10.10.1.90 5100
Ncat: Connection timed out.
# echo "test" | nc 10.10.1.90 5100
# echo "test" | nc 10.10.1.90 5100
Ncat: Connection timed out.
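
The next check I can run is a packet capture on both nodes during the nc test, to see whether IPVS on the MASTER ever emits the NAT-ed packets toward .88 (a sketch; ens160 is the interface from the keepalived conf):

# on .87 (MASTER): does the rewritten SYN leave toward .88?
tcpdump -ni ens160 host 10.10.1.88 and port 5100
# on .88 (BACKUP): does anything arrive at all?
tcpdump -ni ens160 port 5100
# on .87: watch the IPVS connection table while testing
ipvsadm -lnc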

Some Info

  • Package versions
# rpm -qa |grep keepalived
keepalived-1.3.5-19.el7.x86_64
  • Kubernetes CNI: Calico
# kubectl get pod -n kube-system
NAME                                      READY   STATUS    RESTARTS   AGE
calico-kube-controllers-b656ddcfc-wnkcj   1/1     Running   2          78d
calico-node-vnf4d                         1/1     Running   8          78d
calico-node-xgzd5                         1/1     Running   1          78d
calico-node-zt25t                         1/1     Running   8          78d
coredns-558bd4d5db-n6hnn                  1/1     Running   2          78d
coredns-558bd4d5db-zz2rb                  1/1     Running   2          78d
etcd-a86.axv.bz                           1/1     Running   2          78d
kube-apiserver-a86.axv.bz                 1/1     Running   2          78d
kube-controller-manager-a86.axv.bz        1/1     Running   2          78d
kube-proxy-ddwsr                          1/1     Running   2          78d
kube-proxy-hs4dx                          1/1     Running   3          78d
kube-proxy-qg2nq                          1/1     Running   1          78d
kube-scheduler-a86.axv.bz                 1/1     Running   2          78d
  • ipvsadm (same result on .87, .88)
# ipvsadm -ln
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  10.10.1.90:5100 rr
  -> 10.10.1.87:5100              Masq    1      0          0
  -> 10.10.1.88:5100              Masq    1      0          0
UDP  10.10.1.90:5150 rr
  -> 10.10.1.87:5150              Masq    1      0          0
  -> 10.10.1.88:5150              Masq    1      0          0
  • SELinux is always permissive.
  • Stopping firewalld does not help either.
  • sysctl differences (before vs. after the cluster setup; a further check is sketched after this list):
# before:
net.ipv4.conf.all.accept_redirects = 1
net.ipv4.conf.all.forwarding = 0
net.ipv4.conf.all.route_localnet = 0
net.ipv4.conf.default.forwarding = 0
net.ipv4.conf.lo.forwarding = 0
net.ipv4.ip_forward = 0

# after
net.ipv4.conf.all.accept_redirects = 0
net.ipv4.conf.all.forwarding = 1
net.ipv4.conf.all.route_localnet = 1
net.ipv4.conf.default.forwarding = 1
net.ipv4.conf.lo.forwarding = 1
net.ipv4.ip_forward = 1
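
Two more things I can look at, though these are guesses on my side: strict reverse-path filtering can drop IPVS NAT traffic, and Calico/kube-proxy manage the FORWARD chain, so either could be eating the packets destined for .88 (a sketch):

# rp_filter: 1 = strict, 2 = loose, 0 = off
sysctl net.ipv4.conf.all.rp_filter net.ipv4.conf.ens160.rp_filter
# look for DROP rules or a DROP policy affecting forwarded traffic
iptables -L FORWARD -nv | head -20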

Beyond these, I am not sure what further checks I can do. Please advise, thank you!

Mikołaj Głodziak: How did you set up your cluster? Which version of Kubernetes did you use? Did you use a cloud provider, or bare metal? How did you configure networking inside your cluster? Please attach the YAML files. How did you test everything on Kubernetes?

Kenting: Sorry about that. (After the `nc` test I was sure something went wrong in the VIP load balancing to the BACKUP node, so I did not attach the Kubernetes info at first.) The `NodePort` service has been added above. Thank you!

Mikołaj Głodziak: Is your problem resolved now?

Kenting: No, I just added more info about Kubernetes. The problem is still unsolved; the workaround I could try is to use only the VIP, instead of load balancing (virtual server -> real server).