HAProxy inside Kubernetes proxying to MariaDB Galera Cluster VMs having intermittent connectivity if more than one pod

Question

Score:0

Server

HAProxy inside Kubernetes proxying to MariaDB Galera Cluster VMs having intermittent connectivity if more than one pod

Lukman

11/10/23, 2:27 AM

My project at work is having applications running on Kubernetes that connect to MariaDB Galera cluster running on VMs. To manage the connections to the MariaDB, I'm using HAProxy running as pods inside the Kubernetes. However, the HAProxy is set up to use active-backup configuration as advised by the enterprise DBA in order to prevent deadlock issues.

The HAProxy works fine if there is only one pod, but with two and more pods, the 2nd pod onwards will have intermittent connectivity to the MariaDB with the following entries found inside the kubectl logs:

[WARNING] 313/013536 (8) : Server galera/node2 is DOWN, reason: Layer4 timeout, check duration: 5001ms. 0 active and 2 backup servers left. Running on backup. 3 sessions active, 0 requeued, 0 remaining in queue.
[WARNING] 313/013611 (8) : Server galera/node2 is UP, reason: Layer4 check passed, check duration: 2ms. 1 active and 2 backup servers online. 0 sessions requeued, 0 total in queue.
[WARNING] 313/013916 (8) : Server galera/node2 is DOWN, reason: Layer4 timeout, check duration: 5000ms. 0 active and 2 backup servers left. Running on backup. 4 sessions active, 0 requeued, 0 remaining in queue.
[WARNING] 313/013951 (8) : Server galera/node2 is UP, reason: Layer4 check passed, check duration: 1ms. 1 active and 2 backup servers online. 0 sessions requeued, 0 total in queue.
[WARNING] 313/014446 (8) : Server galera/node2 is DOWN, reason: Layer4 timeout, check duration: 5001ms. 0 active and 2 backup servers left. Running on backup. 16 sessions active, 0 requeued, 0 remaining in queue.
[WARNING] 313/014521 (8) : Server galera/node2 is UP, reason: Layer4 check passed, check duration: 28ms. 1 active and 2 backup servers online. 0 sessions requeued, 0 total in queue.

The number of connections at the MariaDB servers is still way below the maximum connections setting and the DBA does not found any anomaly inside the MariaDB logs so I'm ruling out the MariaDB service itself rejecting the connection. The suspected cause of the intermittent issue would be somewhere between the HAProxy pods up until the VMs that run the MariaDB but I'm not sure where to start and what need to be checked.

Here is the haproxy.conf:

listen galera
  bind *:3306
  balance source
  mode tcp
  option tcpka
  default-server inter 5s downinter 5s fall 3 rise 1
  server node1 172.20.193.120:3306 check weight 1 backup
  server node2 172.20.193.121:3306 check weight 1
  server node3 172.20.193.122:3306 check weight 1 backup

The HAProxy version is latest as of this post which is 2.6.6. The Kubernetes cluster is running v1.22.12+vmware.1 VMWare Tanzu if there's any relevance.

Has anyone experienced such an issue before? What do you suggest where to start troubleshooting from?

200

0 + 2

haproxy

mariadb

galera

kubernetes

HAProxy inside Kubernetes proxying to MariaDB Galera Cluster VMs having intermittent connectivity if more than one pod

Post an answer