Background
We're running several KVM servers on Ubuntu 16.04 and have started testing the upgrade to 20.04.
What we found is that even though we've never seen any swap usage on our 16.04 servers, after a couple of days, a 20.04 server will show a few hundred MB of swap usage.
It's not a big problem, as vmstat shows very little swap activity and the Munin graphs confirm that swap in/out is insignificant, but we would still like to understand the behaviour.
Up until now we've used Nagios to monitor swap usage and alert if any is found.
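(For context, the check is nothing exotic; it's along the lines of the standard check_swap plugin from monitoring-plugins, where the percentage thresholds refer to free swap. The exact values below are only illustrative, not our production settings.)
check_swap -w 100% -c 50%    # warn as soon as less than 100% of swap is free, i.e. on any usage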
The system that has been upgraded from 16.04 to 20.04 is running five VMs with light load.
The host system shows around 29 GB of memory used out of roughly 200 GB total. There are no peaks or anything else that would push memory usage higher than that. The VMs' memory usage is restricted, and no other memory-hungry processes run on the KVM host itself.
root@kvm-xx:~# free -m
              total        used        free      shared  buff/cache   available
Mem:         193336       29495         768           5      163072      162404
Swap:          6675         240        6435
Example top output showing the processes that have pages in swap:
  PID    VIRT    RES    SHR S %MEM COMMAND          SWAP
 6447   18.2g  15.8g  22908 S  8.4 qemu-system-x86  239352
 6160 2661052   1.9g  21880 S  1.0 qemu-system-x86  90788
 6315 2129436 644388  21856 S  0.3 qemu-system-x86  29724
 6391   10.4g   7.9g  22832 S  4.2 qemu-system-x86  24028
 6197 6505584   3.0g  23008 S  1.6 qemu-system-x86  10972
 5686    9908   2944   2720 S  0.0 cron             60
 5805   24404  14440   4388 S  0.0 munin-node       4
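(In case it's useful, the per-process figures can be cross-checked against /proc, since top's SWAP column should correspond to the kernel's VmSwap value; a quick loop over the qemu processes looks roughly like this:)
for pid in $(pgrep qemu-system-x86); do
    printf '%-8s' "$pid"                 # print the PID
    grep VmSwap "/proc/$pid/status"      # per-process swapped-out memory, in kB
done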
Typical vmstat output, showing no swap-in/out activity:
root@kvm-xx:~# vmstat 2 10
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
0 0 407620 869916 214784 165081536 0 0 270 12 5 2 0 2 98 0 0
2 0 407620 869900 214784 165081536 0 0 0 28 8533 24140 0 2 98 0 0
1 0 407620 869836 214784 165081536 0 0 0 28 8642 24682 0 2 98 0 0
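(The cumulative counters tell the same story; if pswpin/pswpout barely move between two readings taken a few minutes apart, the pages are just parked in swap rather than being actively paged in and out:)
grep -E '^pswp(in|out)' /proc/vmstat    # cumulative pages swapped in/out since boot
sar -W 2 5                              # per-second swap-in/out rates, if sysstat is installed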
This system ran for a year on 16.04 with the same VMs and the same load, and swap usage stayed at zero.
What's been tried and tested
After the upgrade we found that numad wasn't installed and the VMs' vCPUs were not pinned to the same physical CPU, meaning memory was spread across NUMA nodes. We installed numad and verified the pinning. I believe swap usage was higher before that change, but I can't say for certain.
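(For the verification part, this is roughly what we looked at; numastat ships with numactl, and the domain name for virsh is just a placeholder:)
numactl --hardware              # free/used memory per NUMA node
numastat -p qemu-system-x86     # per-node memory breakdown for the qemu processes
virsh vcpupin <domain>          # current vCPU-to-physical-CPU pinning for a guest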
We expected the behaviour to be kernel-related, so we upgraded from the 5.4 kernel to the HWE 5.11 kernel. The behaviour is the same on both 5.4 and 5.11.
We tried disabling KSM (kernel samepage merging), since we don't need it, to rule it out as a possible source of the swap usage.
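(Disabling it was done through the usual sysfs knob; writing 2 also unmerges pages that were already shared. The ksmtuned line only applies if that service happens to be installed:)
echo 2 > /sys/kernel/mm/ksm/run     # 0 = stop, 1 = run, 2 = stop and unmerge shared pages
systemctl disable --now ksmtuned    # keep ksmtuned from turning it back on, if present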
We tried disabling swap completely to see whether there was actual memory starvation that would bring the OOM killer to the party. That never happened, so it seems the swap isn't needed, yet it is still being used for some reason.
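(For reference, disabling swap at runtime and checking for OOM activity boils down to something like this; the dmesg check is just one way to confirm the OOM killer never fired:)
swapoff -a                                   # disable all swap devices and files
dmesg -T | grep -iE 'out of memory|oom'      # would show OOM-killer activity if memory really ran out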
Ideas and thoughts
I believe that, for some reason, the kernel decides to swap out inactive pages even with swappiness = 0. This is probably behaviour that changed in the newer kernels shipped with 20.04.
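(For completeness, this is the knob in question; checking the current value and making it persistent looks roughly like this, with the file name under /etc/sysctl.d chosen arbitrarily:)
sysctl vm.swappiness                                          # show current value
sysctl -w vm.swappiness=0                                     # set it at runtime
echo 'vm.swappiness = 0' > /etc/sysctl.d/99-swappiness.conf   # persist across reboots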
Ideally, we would like the kernel to swap only as a last resort, as it used to, so that our monitoring can treat any swap usage as grounds for a Nagios alarm.
I've been reading several threads on similar topics, but have found conflicting information about what the explanation might be.
What I really want to avoid is upgrading some of our more heavily loaded 16.04 servers to 20.04 and seeing this escalate into a real issue in production.
I'm aware that swapoff / swapon can be used to manually move memory back out of swap, but the question is why it's swapping in the first place.
If anyone has any insight into this, it would be greatly appreciated.
Thanks!