There is a small MongoDB replica set with 4 members. Each node has 24 GB of RAM and an 8 GB WiredTiger cache size.
The cluster is used by many (about 100) applications which issue a few hundred requests per second. Most queries are optimized.
From time to time (every few months) the primary node is killed by the OOM killer, then a secondary gets killed as well (probably because the application retries the same request after failover), and then everything returns to normal.
The problem first occurred on MongoDB 3.6; the cluster has since been upgraded to 4.4, but the problem persists.
On the monitoring graphs it is visible that about one minute before the crash the memory usage of mongod starts to grow, a few seconds before the crash it is near the limit, and then the process is killed.
How can I find the request that triggers the problem? The logs and the profiler only contain information about finished requests, while the problem is most likely caused by a request that never finishes.
Is it possible to log every running request that has been active for longer than a few seconds, before it finishes? I am thinking about dumping db.currentOp() to a file every few seconds - I think that would let me see what was running just before the crash - but it is not an elegant solution, especially since it gives no information about memory usage. A rough sketch of what I mean is below.
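Something like this, run as `mongosh --quiet dump_currentop.js >> currentop.log` on the primary (the 3-second and 5-second thresholds are arbitrary, just an example):

    // dump_currentop.js - poll in-progress operations and print the long-running ones.
    while (true) {
        // only operations that have already been running for at least 3 seconds
        var ops = db.currentOp({ active: true, secs_running: { $gte: 3 } }).inprog;
        if (ops.length > 0) {
            print(new Date().toISOString());
            printjson(ops);
        }
        sleep(5000);  // shell built-in, milliseconds
    }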
Or maybe there is another way to kill requests that take more than some amount of memory or time?
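As far as I know there is no built-in per-operation memory limit, so the best I can think of is time-based: either setting maxTimeMS on the queries themselves (e.g. db.collection.find(...).maxTimeMS(5000), which would require changing all the applications), or a watchdog like the sketch below run from cron. The 60-second threshold and the op-type filter are only assumptions; I am not sure this is safe to run against a production replica set.

    // kill_long_ops.js - sketch of a watchdog: kill client operations that have
    // been running longer than 60 seconds. Threshold and filter are examples;
    // the filter should be tightened so it can never touch replication or
    // other internal operations.
    var THRESHOLD_SECS = 60;
    var longOps = db.currentOp({
        active: true,
        secs_running: { $gte: THRESHOLD_SECS },
        op: { $in: ["query", "getmore", "command"] }
    }).inprog;
    longOps.forEach(function (op) {
        print("killing opid " + op.opid + ", running for " + op.secs_running + "s");
        db.killOp(op.opid);
    });

But killing by elapsed time still would not catch a request that allocates memory very quickly, so I would prefer a memory-based limit if one exists.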