
HAProxy inside Kubernetes proxying to MariaDB Galera Cluster VMs has intermittent connectivity when running more than one pod


At work, my project runs applications on Kubernetes that connect to a MariaDB Galera cluster running on VMs. To manage the connections to MariaDB, I'm running HAProxy as pods inside Kubernetes. HAProxy is set up in an active-backup configuration, as advised by the enterprise DBA, in order to prevent deadlock issues.

HAProxy works fine when there is only one pod, but with two or more pods, every pod from the second onwards has intermittent connectivity to MariaDB, with the following entries in the kubectl logs:

[WARNING] 313/013536 (8) : Server galera/node2 is DOWN, reason: Layer4 timeout, check duration: 5001ms. 0 active and 2 backup servers left. Running on backup. 3 sessions active, 0 requeued, 0 remaining in queue.
[WARNING] 313/013611 (8) : Server galera/node2 is UP, reason: Layer4 check passed, check duration: 2ms. 1 active and 2 backup servers online. 0 sessions requeued, 0 total in queue.
[WARNING] 313/013916 (8) : Server galera/node2 is DOWN, reason: Layer4 timeout, check duration: 5000ms. 0 active and 2 backup servers left. Running on backup. 4 sessions active, 0 requeued, 0 remaining in queue.
[WARNING] 313/013951 (8) : Server galera/node2 is UP, reason: Layer4 check passed, check duration: 1ms. 1 active and 2 backup servers online. 0 sessions requeued, 0 total in queue.
[WARNING] 313/014446 (8) : Server galera/node2 is DOWN, reason: Layer4 timeout, check duration: 5001ms. 0 active and 2 backup servers left. Running on backup. 16 sessions active, 0 requeued, 0 remaining in queue.
[WARNING] 313/014521 (8) : Server galera/node2 is UP, reason: Layer4 check passed, check duration: 28ms. 1 active and 2 backup servers online. 0 sessions requeued, 0 total in queue.
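
To see whether the flapping follows a pattern, the DOWN/UP pairs above can be turned into outage durations. This is only a rough sketch: it assumes the `313/013536` prefix is day-of-year followed by HHMMSS, and the names `parse_transitions` and `outage_durations` are made up for illustration. The sample lines are trimmed copies of the log above.

```python
import re

# Trimmed copies of the kubectl log lines from the question.
SAMPLE_LOG = """\
[WARNING] 313/013536 (8) : Server galera/node2 is DOWN, reason: Layer4 timeout
[WARNING] 313/013611 (8) : Server galera/node2 is UP, reason: Layer4 check passed
[WARNING] 313/013916 (8) : Server galera/node2 is DOWN, reason: Layer4 timeout
[WARNING] 313/013951 (8) : Server galera/node2 is UP, reason: Layer4 check passed
[WARNING] 313/014446 (8) : Server galera/node2 is DOWN, reason: Layer4 timeout
[WARNING] 313/014521 (8) : Server galera/node2 is UP, reason: Layer4 check passed
"""

# Assumption: the timestamp is <day of year>/<HHMMSS>.
LINE_RE = re.compile(
    r"\[WARNING\]\s+(?P<day>\d+)/(?P<h>\d{2})(?P<m>\d{2})(?P<s>\d{2})\s+\(\d+\)\s+:\s+"
    r"Server (?P<server>\S+) is (?P<state>UP|DOWN)"
)


def parse_transitions(lines):
    """Return (seconds, server, state) tuples for every UP/DOWN line."""
    events = []
    for line in lines:
        match = LINE_RE.search(line)
        if not match:
            continue
        seconds = (
            int(match["day"]) * 86400
            + int(match["h"]) * 3600
            + int(match["m"]) * 60
            + int(match["s"])
        )
        events.append((seconds, match["server"], match["state"]))
    return events


def outage_durations(events):
    """Pair each DOWN with the next UP of the same server."""
    down_since = {}
    durations = []
    for seconds, server, state in events:
        if state == "DOWN":
            down_since[server] = seconds
        elif server in down_since:
            durations.append((server, seconds - down_since.pop(server)))
    return durations


if __name__ == "__main__":
    for server, seconds in outage_durations(parse_transitions(SAMPLE_LOG.splitlines())):
        print(f"{server} was down for {seconds} s")
```

Under that timestamp assumption, each of the three outages in the excerpt lasts 35 seconds, which suggests something periodic rather than random packet loss.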

The number of connections on the MariaDB servers is still well below the maximum connections setting, and the DBA did not find any anomalies in the MariaDB logs, so I'm ruling out the MariaDB service itself rejecting connections. I suspect the cause lies somewhere between the HAProxy pods and the VMs that run MariaDB, but I'm not sure where to start or what needs to be checked.
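
One possible starting point is to repeat the Layer4 check by hand from a pod on the same node as an affected HAProxy pod, to see whether plain TCP connects to the VMs also time out. The sketch below assumes a debug pod with a Python interpreter is available (the HAProxy image itself most likely has none). The `probe` and `run` helpers are hypothetical names, and the 5-second timeout simply mirrors the roughly 5000 ms check duration in the log.

```python
import socket
import time

# Backend addresses copied from the haproxy.conf in the question.
BACKENDS = [
    ("172.20.193.120", 3306),
    ("172.20.193.121", 3306),
    ("172.20.193.122", 3306),
]


def probe(host, port, timeout=5.0):
    """Open and close one TCP connection; return (ok, elapsed_seconds, error)."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, time.monotonic() - start, ""
    except OSError as exc:  # covers timeouts, refusals and unreachable hosts
        return False, time.monotonic() - start, f"{type(exc).__name__}: {exc}"


def run(backends, rounds, interval=5.0):
    """Probe every backend once per round and print one line per attempt."""
    for _ in range(rounds):
        for host, port in backends:
            ok, elapsed, error = probe(host, port)
            stamp = time.strftime("%H:%M:%S")
            status = "ok" if ok else f"FAILED ({error})"
            print(f"{stamp} {host}:{port} {status} in {elapsed * 1000:.0f} ms")
        time.sleep(interval)


if __name__ == "__main__":
    # 120 rounds at 5 s intervals is about 10 minutes of probing.
    run(BACKENDS, rounds=120)
```

If the probe times out at the same moments HAProxy marks node2 DOWN, the problem is probably in the network path (pod egress, NAT, or a firewall between the cluster and the VMs) rather than in HAProxy itself.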

Here is the haproxy.conf:

listen galera
  bind *:3306
  balance source
  mode tcp
  option tcpka
  default-server inter 5s downinter 5s fall 3 rise 1
  server node1 172.20.193.120:3306 check weight 1 backup
  server node2 172.20.193.121:3306 check weight 1
  server node3 172.20.193.122:3306 check weight 1 backup

The HAProxy version is 2.6.6, the latest as of this post. The Kubernetes cluster is running v1.22.12+vmware.1 on VMware Tanzu, in case that is relevant.

Has anyone experienced an issue like this before? Where would you suggest I start troubleshooting?

Are those backend addresses pointing directly at Pods, or at Services?
Lukman:
@larsks, if you are referring to the IP addresses inside `haproxy.conf`, those are the IP addresses of the MariaDB VMs. HAProxy is in Kubernetes, but MariaDB is running on VMs (3 VMs, to be exact).