I've been dealing with this issue for weeks now.
I have the following scenario:
couchdb2.3.1-A <===> couchdb2.3.1-B <===> couchdb3.1.1-A <===> couchdb3.1.1-B
where <===> represents two pull replications, one configured on each side, i.e. couchdb1 pulls from couchdb2 and vice versa.
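Each of those pull replications is configured on the pulling node. A minimal sketch of how one such continuous pull replication could be defined through the _replicator database (hostnames, database name and credentials below are placeholders, not necessarily our exact setup):

```python
import requests

# Sketch of a continuous pull replication: a document in the pulling node's
# _replicator database whose "source" is the remote node and whose "target"
# is the local database. Hostnames, database name and credentials are
# placeholders.
LOCAL = "http://admin:password@couchdb-3.1.1-b:5984"

replication_doc = {
    "_id": "pull-from-couchdb-3.1.1-a",
    "source": "http://admin:password@couchdb-3.1.1-a:5984/mydb",
    "target": f"{LOCAL}/mydb",
    "continuous": True,
}

resp = requests.post(f"{LOCAL}/_replicator", json=replication_doc)
resp.raise_for_status()
print(resp.json())  # expect {"ok": true, "id": ..., "rev": ...}
```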
CouchDB is running in Docker containers.
If a write is made at couchdb2.3.1-A, it has to make it through all the servers until it reaches couchdb3.1.1-B.
Each of them has an exclusive HDD; CouchDB does not share its disk with any other service.
couchdb2.3.1-A and -B have no problems.
couchdb3.1.1-A's disk latency gradually increased over time, so we stopped making write requests to it and started talking only to couchdb3.1.1-B. couchdb3.1.1-A still receives writes, but only via the replication protocol. Its disk latency did not change.
Changes we've made since the problem started:
- Upgraded the kernel from 4.15.0-55-generic to 5.4.0-88-generic
- Upgraded Ubuntu from 18.04 to 20.04
- Deleted the _global_changes database from couchdb3.1.1-A (see the sketch right after this list)
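Removing _global_changes was just a DELETE on the database endpoint; a minimal sketch, with host and credentials as placeholders:

```python
import requests

# Sketch of removing the _global_changes database over CouchDB's HTTP API.
# Host and credentials are placeholders.
resp = requests.delete("http://admin:password@couchdb-3.1.1-a:5984/_global_changes")
resp.raise_for_status()
print(resp.json())  # expect {"ok": true}
```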
More info:
- Couchdb is using docker local-persist volumes.
- Disks are WD Purple for the 2.3.1 couchdbs and WD Black for the 3.1.1 couchdbs.
- We have only one database of 88 GiB and 2 views: one of 22 GB and a little one of 30 MB (highly updated).
- docker stats shows that couchdb3.1.1 uses a lot of memory compared to 2.3.1:
  - 3.5 GiB for couchdb3.1.1-A (not receiving direct write requests)
  - 8.0 GiB for couchdb3.1.1-B (receiving both read and write requests)
  - 226 MiB for 2.3.1-A
  - 552 MiB for 2.3.1-B
- Database compaction is run at night (a sketch of how it could be triggered is shown after this list). The problem only occurs during the day, when most of the writes are made.
- Most of the config is default.
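For reference, a minimal sketch of how a nightly compaction could be triggered over the HTTP API (host, credentials, database and design-document names are placeholders, not necessarily how our job actually runs):

```python
import requests

# Sketch of a nightly compaction job: compact the database file, then each
# view index (one request per design document). Host, credentials, database
# and design-document names are placeholders.
BASE = "http://admin:password@couchdb-3.1.1-a:5984"
DB = "mydb"
HEADERS = {"Content-Type": "application/json"}

# Compact the database file itself.
requests.post(f"{BASE}/{DB}/_compact", headers=HEADERS).raise_for_status()

# Compact the view indexes, one POST per design document.
for ddoc in ("big_view", "small_view"):
    requests.post(f"{BASE}/{DB}/_compact/{ddoc}", headers=HEADERS).raise_for_status()
```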
Latency graph from Munin monitoring: [disk latency graph]
Any help is appreciated.