3 Data Centres:
Patroni version: 2.1.4
PostgreSQL version: 14.4
Etcd version: 3.3.11
| DC  | Server  | Name        | Host       | Status       |
|-----|---------|-------------|------------|--------------|
| 1st | Patroni | patroni-s11 | 172.16.0.2 | Leader       |
| 1st | Patroni | patroni-s12 | 172.16.0.3 | Sync Standby |
| 1st | ETCD    | etcd-s11    | 172.16.0.4 | Leader       |
| 2nd | Patroni | patroni-s21 | 172.16.1.2 | Replica      |
| 2nd | Patroni | patroni-s22 | 172.16.1.3 | Replica      |
| 2nd | ETCD    | etcd-s21    | 172.16.1.4 | slave        |
| 3rd | Patroni | patroni-s31 | 172.16.2.2 | Replica      |
| 3rd | ETCD    | etcd-s31    | 172.16.2.4 | slave        |
I simulated an interconnect failure between the 1st and 2nd data centers: both DCs are up, but the 1st and 2nd cannot "see" each other.
In this case, the Patroni leader remains in the 1st DC, but the servers in the 2nd DC do not sync with the cluster. According to the cluster health output, everything is fine and there is no replication lag between the servers. In reality, changes made on the master are not replicated to the replicas in the 2nd data center.
[user@patroni-s11 ~]$ sudo patronictl -c /etc/patroni/patroni.yml list
2022-12-01 16:00:00,015 - ERROR - Request to server 172.16.1.4:2379 failed: MaxRetryError("HTTPConnectionPool(host='172.16.1.4', port=2379): Max retries exceeded with url: /v2/keys/service/patroni_cluster/?recursive=true (Caused by ProtocolError('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer')))",)
+ Cluster: patroni_cluster (7117639577766255236) ---+---------+-----+-----------+
| Member | Host | Role | State | TL | Lag in MB |
+-----------------+---------------+--------------+---------+-----+-----------+
| patroni-s11 | 172.16.0.2 | Leader | running | 103 | |
| patroni-s12 | 172.16.0.3 | Sync Standby | running | 103 | 0 |
| patroni-s21 | 172.16.1.2 | Replica | running | 103 | 0 |
| patroni-s22 | 172.16.1.3 | Replica | running | 103 | 0 |
| patroni-s31 | 172.16.2.2 | Replica | running | 103 | 0 |
+-----------------+---------------+--------------+---------+-----+-----------+
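The "Lag in MB" column can be cross-checked by hand: run `SELECT pg_current_wal_lsn()` on the leader and `SELECT pg_last_wal_replay_lsn()` on a replica in the 2nd DC, then compare the two positions. Below is a minimal sketch of that comparison. The helper names are hypothetical and the LSN values are made-up placeholders, not output from this cluster.

```python
def lsn_to_bytes(lsn: str) -> int:
    """Convert a PostgreSQL LSN such as '1A/3000000' to an absolute byte position.

    An LSN is printed as two hexadecimal numbers separated by a slash:
    the high 32 bits and the low 32 bits of a 64-bit WAL position.
    """
    high, low = lsn.split("/")
    return (int(high, 16) << 32) + int(low, 16)


def lag_in_mb(leader_lsn: str, replica_lsn: str) -> float:
    """Return how far the replica is behind the leader, in MB (never negative)."""
    lag_bytes = lsn_to_bytes(leader_lsn) - lsn_to_bytes(replica_lsn)
    return max(lag_bytes, 0) / (1024 * 1024)


if __name__ == "__main__":
    # Placeholder values (assumptions for illustration only).
    leader_lsn = "1A/3000000"   # would come from pg_current_wal_lsn() on the leader
    replica_lsn = "1A/1000000"  # would come from pg_last_wal_replay_lsn() on a replica
    print(f"lag: {lag_in_mb(leader_lsn, replica_lsn)} MB")
```

If the value computed this way keeps growing for patroni-s21 and patroni-s22 while `patronictl list` still reports 0, the lag shown in the table is probably stale rather than freshly measured.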
The etcd leader likewise remains in the 1st DC:
[user@etcd-s11 ~]$ sudo etcdctl cluster-health
failed to check the health of member a85c06b926e6c6c8 on 172.16.1.4:2379: Get 172.16.1.4:2379/health: read tcp 10.220.0.3:38836->172.16.1.4:2379: read: connection reset by peer
member 261f8081db14d568 is healthy: got healthy result from 172.16.0.4:2379
member a85c06b926e6c6c8 is unreachable: [172.16.1.4:2379] are all unreachable
member b87bd1df518cc9e4 is healthy: got healthy result from 172.16.2.4:2379
cluster is degraded
[user@etcd-s11 ~]$ sudo etcdctl member list
261f8081db14d568: name=etcd-s11 peerURLs=172.16.0.4:2380 clientURLs=172.16.0.4:2379 isLeader=true
a85c06b926e6c6c8: name=etcd-s21 peerURLs=172.16.1.4:2380 clientURLs=172.16.1.4:2379 isLeader=false
b87bd1df518cc9e4: name=etcd-s31 peerURLs=172.16.2.4:2380 clientURLs=172.16.2.4:2379 isLeader=false
But etcd in the 3rd data center sees the cluster as healthy:
[user@etcd-s31 ~]$ sudo etcdctl cluster-health
member 261f8081db14d568 is healthy: got healthy result from http://172.16.0.4:2379
member a85c06b926e6c6c8 is healthy: got healthy result from http://172.16.1.4:2379
member b87bd1df518cc9e4 is healthy: got healthy result from http://172.16.2.4:2379
cluster is healthy
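The outputs above are consistent with a partial partition: only the link between the 1st and 2nd DC is cut, and the 3rd DC still reaches both. Below is a rough model of that situation. It rests on the simplifying assumption that an etcd (Raft) member stays part of a working cluster as long as it can reach a majority of members, and it ignores election timeouts and other timing details. The set of broken links is inferred from the test description, not read from any etcd configuration, and the function names are hypothetical.

```python
MEMBERS = ["etcd-s11", "etcd-s21", "etcd-s31"]

# Assumption: only the 1st DC <-> 2nd DC link is down, as in the simulated failure.
BROKEN_LINKS = {frozenset({"etcd-s11", "etcd-s21"})}


def reachable_peers(member, members, broken_links):
    """Return the set of other members this member can still talk to."""
    return {
        peer
        for peer in members
        if peer != member and frozenset({member, peer}) not in broken_links
    }


def has_quorum(member, members, broken_links):
    """True if the member, counting itself, can reach a strict majority."""
    visible = 1 + len(reachable_peers(member, members, broken_links))
    majority = len(members) // 2 + 1
    return visible >= majority


if __name__ == "__main__":
    for member in MEMBERS:
        peers = sorted(reachable_peers(member, MEMBERS, BROKEN_LINKS))
        quorum = has_quorum(member, MEMBERS, BROKEN_LINKS)
        print(f"{member}: reaches {peers}, quorum={quorum}")
```

In this model every member still reaches 2 of 3 nodes, including the current leader etcd-s11 (itself plus etcd-s31), so nothing in the quorum arithmetic alone forces a leader change.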
I expected the servers in the 3rd DC to become the leaders.
Can Patroni/etcd change the leader in this case?