My project at work is having applications running on Kubernetes that connect to MariaDB Galera cluster running on VMs. To manage the connections to the MariaDB, I'm using HAProxy running as pods inside the Kubernetes. However, the HAProxy is set up to use active-backup configuration as advised by the enterprise DBA in order to prevent deadlock issues.
The HAProxy works fine if there is only one pod, but with two and more pods, the 2nd pod onwards will have intermittent connectivity to the MariaDB with the following entries found inside the kubectl logs
:
[WARNING] 313/013536 (8) : Server galera/node2 is DOWN, reason: Layer4 timeout, check duration: 5001ms. 0 active and 2 backup servers left. Running on backup. 3 sessions active, 0 requeued, 0 remaining in queue.
[WARNING] 313/013611 (8) : Server galera/node2 is UP, reason: Layer4 check passed, check duration: 2ms. 1 active and 2 backup servers online. 0 sessions requeued, 0 total in queue.
[WARNING] 313/013916 (8) : Server galera/node2 is DOWN, reason: Layer4 timeout, check duration: 5000ms. 0 active and 2 backup servers left. Running on backup. 4 sessions active, 0 requeued, 0 remaining in queue.
[WARNING] 313/013951 (8) : Server galera/node2 is UP, reason: Layer4 check passed, check duration: 1ms. 1 active and 2 backup servers online. 0 sessions requeued, 0 total in queue.
[WARNING] 313/014446 (8) : Server galera/node2 is DOWN, reason: Layer4 timeout, check duration: 5001ms. 0 active and 2 backup servers left. Running on backup. 16 sessions active, 0 requeued, 0 remaining in queue.
[WARNING] 313/014521 (8) : Server galera/node2 is UP, reason: Layer4 check passed, check duration: 28ms. 1 active and 2 backup servers online. 0 sessions requeued, 0 total in queue.
The number of connections at the MariaDB servers is still way below the maximum connections setting and the DBA does not found any anomaly inside the MariaDB logs so I'm ruling out the MariaDB service itself rejecting the connection. The suspected cause of the intermittent issue would be somewhere between the HAProxy pods up until the VMs that run the MariaDB but I'm not sure where to start and what need to be checked.
Here is the haproxy.conf
:
listen galera
bind *:3306
balance source
mode tcp
option tcpka
default-server inter 5s downinter 5s fall 3 rise 1
server node1 172.20.193.120:3306 check weight 1 backup
server node2 172.20.193.121:3306 check weight 1
server node3 172.20.193.122:3306 check weight 1 backup
The HAProxy version is latest as of this post which is 2.6.6
. The Kubernetes cluster is running v1.22.12+vmware.1
VMWare Tanzu if there's any relevance.
Has anyone experienced such an issue before? What do you suggest where to start troubleshooting from?