Patroni interconnection failover

EcksRay

12/2/23, 9:53 AM

3 Data Centres:

Patroni version: 2.1.4

PostgreSQL version: 14.4

Etcd version: 3.3.11

DC	Server	Name	Host	Status
1st	Patroni	patroni-s11	172.16.0.2	Leader
1st	Patroni	patroni-s12	172.16.0.3	Sync Standby
1st	ETCD	etcd-s11	172.16.0.4	Leader
2nd	Patroni	patroni-s21	172.16.1.2	Replica
2nd	Patroni	patroni-s22	172.16.1.3	Replica
2nd	ETCD	etcd-s21	172.16.1.4	Follower
3rd	Patroni	patroni-s31	172.16.2.2	Replica
3rd	ETCD	etcd-s31	172.16.2.4	Follower

I simulated an interconnection failure between the 1st and 2nd data centres: both DCs are up, but they cannot "see" each other.

In this case, the Patroni leader remains in the 1st DC, but the servers in the 2nd DC no longer sync with the cluster. According to the cluster health output, everything is fine and there is no replication lag between the servers. In reality, changes made on the master are not replicated to the replicas in the 2nd data centre.

[user@patroni-s11 ~]$ sudo patronictl -c /etc/patroni/patroni.yml list
2022-12-01 16:00:00,015 - ERROR - Request to server 172.16.1.4:2379 failed: MaxRetryError("HTTPConnectionPool(host='172.16.1.4', port=2379): Max retries exceeded with url: /v2/keys/service/patroni_cluster/?recursive=true (Caused by ProtocolError('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer')))",)
+ Cluster: patroni_cluster (7117639577766255236) ---+---------+-----+-----------+
| Member          | Host          | Role         | State   |  TL | Lag in MB |
+-----------------+---------------+--------------+---------+-----+-----------+
| patroni-s11     | 172.16.0.2    | Leader       | running | 103 |           |
| patroni-s12     | 172.16.0.3    | Sync Standby | running | 103 |         0 |
| patroni-s21     | 172.16.1.2    | Replica      | running | 103 |         0 |
| patroni-s22     | 172.16.1.3    | Replica      | running | 103 |         0 |
| patroni-s31     | 172.16.2.2    | Replica      | running | 103 |         0 |
+-----------------+---------------+--------------+---------+-----+-----------+
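One way to cross-check the "Lag in MB" column is to compare WAL positions taken directly from PostgreSQL on each node. The following is a minimal sketch, assuming the LSNs were collected by hand with `SELECT pg_current_wal_lsn()` on the leader and `SELECT pg_last_wal_replay_lsn()` on each replica; the LSN values below are hypothetical and not taken from this cluster.

```python
def lsn_to_int(lsn: str) -> int:
    """Convert a PostgreSQL LSN such as '2A/10000000' to a byte offset.

    An LSN is printed as two hexadecimal numbers separated by a slash:
    the high 32 bits and the low 32 bits of a 64-bit WAL position.
    """
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def lag_bytes(primary_lsn: str, replica_lsn: str) -> int:
    """Return how many bytes of WAL the replica is behind the primary."""
    return max(0, lsn_to_int(primary_lsn) - lsn_to_int(replica_lsn))


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    primary = "2A/10000000"
    replicas = {
        "patroni-s12": "2A/10000000",
        "patroni-s21": "2A/0F000000",
    }
    for name, lsn in replicas.items():
        print(f"{name}: {lag_bytes(primary, lsn) / 1024 / 1024:.1f} MB behind")
```

If the numbers obtained this way differ from what `patronictl list` shows, the table is probably displaying stale member data rather than the current replication state.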

The same happens with the etcd servers: the leader remains in the 1st DC.

[user@etcd-s11 ~]$ sudo etcdctl cluster-health
failed to check the health of member a85c06b926e6c6c8 on 172.16.1.4:2379: Get 172.16.1.4:2379/health: read tcp 10.220.0.3:38836->172.16.1.4:2379: read: connection reset by peer
member 261f8081db14d568 is healthy: got healthy result from 172.16.0.4:2379
member a85c06b926e6c6c8 is unreachable: [172.16.1.4:2379] are all unreachable
member b87bd1df518cc9e4 is healthy: got healthy result from 172.16.2.4:2379
cluster is degraded

[user@etcd-s11 ~]$ sudo etcdctl member list
261f8081db14d568: name=etcd-s11 peerURLs=172.16.0.4:2380 clientURLs=172.16.0.4:2379 isLeader=true
a85c06b926e6c6c8: name=etcd-s21 peerURLs=172.16.1.4:2380 clientURLs=172.16.1.4:2379 isLeader=false
b87bd1df518cc9e4: name=etcd-s31 peerURLs=172.16.2.4:2380 clientURLs=172.16.2.4:2379 isLeader=false

But etcd in the 3rd data centre sees the cluster as healthy:

[user@etcd-s31 ~]$ sudo etcdctl cluster-health
member 261f8081db14d568 is healthy: got healthy result from http://172.16.0.4:2379
member a85c06b926e6c6c8 is healthy: got healthy result from http://172.16.1.4:2379
member b87bd1df518cc9e4 is healthy: got healthy result from http://172.16.2.4:2379
cluster is healthy
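The two health outputs can be read against the majority rule that etcd's Raft consensus uses: a three-member cluster needs two members to agree. The sketch below models only that arithmetic for the topology in this post, assuming the DC1 to DC2 link is the only broken one. The names `MEMBERS`, `BROKEN_LINKS`, `reachable_peers` and `sees_majority` are made up for the illustration, and the sketch does not model election timeouts or any other etcd setting.

```python
# Which data centre each etcd member lives in (from the table in this post).
MEMBERS = {"etcd-s11": "DC1", "etcd-s21": "DC2", "etcd-s31": "DC3"}

# Assumption: only the DC1 <-> DC2 interconnection is down.
BROKEN_LINKS = {frozenset({"DC1", "DC2"})}


def quorum(cluster_size: int) -> int:
    """Smallest number of members that forms a strict majority."""
    return cluster_size // 2 + 1


def reachable_peers(member: str) -> set:
    """Members that `member` can talk to directly, including itself."""
    dc = MEMBERS[member]
    return {
        name
        for name, other_dc in MEMBERS.items()
        if name == member or frozenset({dc, other_dc}) not in BROKEN_LINKS
    }


def sees_majority(member: str) -> bool:
    """True if `member` can directly reach a majority of the cluster."""
    return len(reachable_peers(member)) >= quorum(len(MEMBERS))


if __name__ == "__main__":
    for name in MEMBERS:
        print(name, sorted(reachable_peers(name)), sees_majority(name))
```

Under these assumptions etcd-s11 still reaches etcd-s31, so the current leader keeps a majority and the majority rule alone does not force a leader change.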

I expected the servers in the 3rd DC to become the leaders.

Can Patroni/etcd change the leader in this case?


etcd

patroni
