I'm a little lost and need some help understanding what exactly is happening with my server.
This is a Proxmox (Debian) server with several LXC containers running on it, and from time to time everything just starts failing because new child processes can no longer be forked. The syslog starts filling up with messages like this:
May 24 18:19:44 pvirtual08 ksmtuned[1645]: /usr/sbin/ksmtuned: fork: retry: Resource temporarily unavailable
May 24 18:19:46 pvirtual08 pve-firewall[4013]: status update error: command 'iptables-save' failed: open3: fork failed: Resource temporarily unavailable at /usr/share/perl5/PVE/Tools.pm line 449.
May 24 18:19:47 pvirtual08 pvestatd[4012]: command 'lxc-info -n 124 -p' failed: open3: fork failed: Resource temporarily unavailable at /usr/share/perl5/PVE/Tools.pm line 449.
May 24 18:19:47 pvirtual08 pvestatd[4012]: command 'lxc-info -n 404 -p' failed: open3: fork failed: Resource temporarily unavailable at /usr/share/perl5/PVE/Tools.pm line 449.
until eventually the entire server crashes. It seems clear that after some time the server can no longer fork new processes, but I don't quite understand why. Everything I read points to either a ulimit being reached or the server being out of RAM. Last time I checked, the server was nowhere near full on RAM. As for ulimits (and this is where I'm a bit lost), from what I can tell none of those are being reached either.
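For context, this is all I'm running when I say the RAM looks fine, so it's entirely possible I'm just reading the wrong numbers:

```shell
# Quick look at overall memory; MemAvailable and the Commit* lines are
# what I've been eyeballing -- not sure these are the right things to watch.
free -h
grep -E '^(MemFree|MemAvailable|CommitLimit|Committed_AS)' /proc/meminfo
```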
These are the current values from ulimit -a:
root@pvirtual08:/var/log# ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 514673
max locked memory (kbytes, -l) 65536
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 514656
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
Last time it happened, I checked the number of running processes with
ps -eLf | grep -v root | wc -l
and it was nowhere near the limit either, but maybe I'm just counting wrong, or checking the wrong limit.
Is there a way to know exactly which limit is being hit when the server stops being able to fork? Ideally a way to monitor current usage vs. the limit, so that I could write a monitoring script, for example?
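For what it's worth, this is roughly the kind of check I'm imagining. The /proc and cgroup paths here are my guesses from reading around, so I'm not at all sure these are the limits that actually matter in my case:

```shell
#!/bin/bash
# Sketch of a monitoring check: current task counts vs. the limits I know of.

# System-wide thread count vs. the kernel-wide limit (kernel.threads-max)
threads_now=$(ps -eLf | tail -n +2 | wc -l)
threads_max=$(cat /proc/sys/kernel/threads-max)
echo "threads: $threads_now / $threads_max"

# Highest allowed PID number; every thread consumes a PID
echo "pid_max: $(cat /proc/sys/kernel/pid_max)"

# Per-user process limit for the current user (what ulimit -u reports)
echo "ulimit -u: $(ulimit -u)"

# Per-container task limits, if the pids cgroup controller is mounted there
for f in /sys/fs/cgroup/pids/lxc/*/pids.max; do
    [ -e "$f" ] || continue
    echo "$f: $(cat "${f%max}current") / $(cat "$f")"
done
```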
I apologize for any dumb questions, but the way ulimits work is a bit new and confusing to me.