Score:0

CPU Limiting & kswapd0 Advice Sought

Through hours of testing, I have found that the Nextcloud desktop sync client for Ubuntu 20.04 (both the AppImage and the PPA build) seems to have a bug: when a common Nextcloud file sync error occurs, kswapd0 spikes to 100% of CPU and the swap file on my Debian 10.5 server fills up completely. (clamscan also spikes to between 45% and 100% while kswapd0 climbs to 100% of CPU.) My other sync clients (mobile, Ubuntu's native "Online Accounts") do not cause this problem.

top command output

top - 16:08:59 up 22 min,  2 users,  load average: 89.42, 84.04, 55.66
Tasks: 378 total,  12 running, 359 sleeping,   0 stopped,   7 zombie
%Cpu(s):  3.4 us, 57.0 sy,  0.0 ni,  0.1 id, 39.5 wa,  0.0 hi,  0.0 si,  0.0 st
MiB Mem :   3946.8 total,     90.2 free,   3766.4 used,     90.1 buff/cache
MiB Swap:   6144.0 total,      0.0 free,   6144.0 used.      4.9 avail Mem 

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND           
   36 root      30  10       0      0      0 R  98.3   0.0  12:43.68 kswapd0           
 1691 mysql     20   0 1739540   2376      0 S   3.9   0.1   0:34.59 mysqld            
 1300 root      10 -10  116752   3400      0 D   3.3   0.1   0:41.96 AliYunDun         
 1544 root      20   0  806108    640      0 D   2.4   0.0   0:09.45 aliyun-service    
  161 root      20   0    4556   1904   1844 S   0.9   0.0   0:10.60 plymouthd         
 2746 git       20   0 1374728   6020      0 S   0.7   0.1   0:07.23 gitea             
 1114 root      20   0   24312    284      0 S   0.5   0.0   0:03.74 AliYunDunUpdate   
 5805 web2      20   0  292472 215456    920 D   0.4   5.3   0:05.43 clamscan          
  155 root       0 -20       0      0      0 I   0.3   0.0   0:07.11 kworker/0:1H-kbl+ 
  232 root      20   0   70888    284     88 D   0.3   0.0   0:03.74 systemd-journal   
  936 memcache  20   0  408168      0      0 S   0.3   0.0   0:02.19 memcached         
 3492 root      20   0   11380    756    556 R   0.3   0.0   0:03.28 top               
    1 root      20   0  170192   2972      0 D   0.3   0.1   0:11.03 systemd           
 1041 redis     20   0   54244    428      0 D   0.3   0.0   0:03.28 redis-server      
 4029 www-data  20   0  339376   2436     16 D   0.3   0.1   0:00.85 /usr/sbin/apach

I have tried using nice and cpulimit to keep kswapd0 from reaching 100% and exhausting the swap space, but kswapd0 just powers through both commands, whether they are run individually or together, and consumes 100% of CPU and all of the swap. That leaves me no choice but to reboot the server to clear the swap.

I have already reduced swappiness to zero, and I have tried:

To free pagecache:
    echo 1 > /proc/sys/vm/drop_caches
To free reclaimable slab objects (includes dentries and inodes):
    echo 2 > /proc/sys/vm/drop_caches
To free slab objects and pagecache:
    echo 3 > /proc/sys/vm/drop_caches

Since I figure Nextcloud file sync errors will be a common thing in the future, can someone suggest how I can prevent a simple file sync error from taking down my entire server?

UPDATE

After some additional testing and reading, it seems that ClamAV is running clamscan on every upload and email, which spikes CPU usage to 100%. The connection to Nextcloud is that I have antivirus for files activated, so my file sync uploads also start clamscan and then overload the server.

The solution seems to be to stop using clamscan and use clamav-daemon instead. I am researching this now, but if someone can tell me how to switch from clamscan to clamav-daemon, I would appreciate it.

Martin
Just a thought: I would consider turning off swap completely for your tests with `swapoff -a`. That way, the kernel OOM killer would kill the process eating all the memory before any swapping happens. Also, `kswapd` is a kernel thread; you can only cpulimit / nice a process in user space!
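
To check whether a given PID is a kernel thread like kswapd0, something along these lines may help. It is only a heuristic sketch: `is_kernel_thread` is a hypothetical helper name, and it relies on kernel threads having an empty `/proc/<pid>/cmdline` (zombie processes do too, so treat the result as a hint).

```shell
# Heuristic: kernel threads expose an empty /proc/<pid>/cmdline.
# Returns 0 if the PID looks like a kernel thread, 1 if it does not,
# and 2 if the PID cannot be inspected (no such process, or no /proc).
is_kernel_thread() {
    pid=$1
    [ -r "/proc/$pid/cmdline" ] || return 2
    # cmdline is NUL-separated; strip the NULs and test for emptiness.
    [ -z "$(tr -d '\0' < "/proc/$pid/cmdline")" ]
}

# Example: PID 36 was kswapd0 in the top output above.
#   is_kernel_thread 36 && echo "kernel thread: limit the memory consumer instead"
```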
Maestro223
I cracked it... just posted an answer.
Score:0

The problem described above was essentially an "illusion" created by clamscan. Here's how I solved it:

The problem was twofold. First, amavis was running clamscan instead of clamd (it had been for months with zero problems, so I figure an update changed something). Second, the Nextcloud antivirus app defaulted to clamscan instead of clamd. So whenever I connected the Nextcloud client and saw a "sync error", it was hiding the fact that clamscan was overloading the server: each clamscan for the nextcloud user was weighing in at around 29% of CPU per synced file.

I discovered the amavis/clamscan problem by completely disabling Nextcloud sync and just watching the output of top.
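
The same observation can be made without watching top interactively. The sketch below is an illustration, not what I originally ran: `cpu_by_command` is a hypothetical helper that sums CPU usage per command name, so many short-lived clamscan processes show up as a single line.

```shell
# Sketch: sum %CPU per command name, busiest first.
# Output columns: total %CPU, number of processes, command name.
# Note: ps reports CPU averaged over each process's lifetime, not the
# instantaneous figure that top shows.
cpu_by_command() {
    limit=${1:-10}
    ps -eo pcpu=,comm= |
        awk '{ c = $1; $1 = ""; sub(/^ +/, ""); cpu[$0] += c; n[$0]++ }
             END { for (k in cpu) printf "%.1f %d %s\n", cpu[k], n[k], k }' |
        sort -rn | head -n "$limit"
}

cpu_by_command 10
```

Running it under `watch` or in a loop while mail arrives or files sync should show whether clamscan is the command accumulating CPU.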

Solution:

1.) Change amavis to run clamd (see docs), starting with:
    # dpkg-reconfigure clamav-daemon
For whatever reason this configuration wasn't permanent and the system reverted after a reboot. For a permanent means of placing a CPU limit on ClamAV on Debian/Ubuntu machines, add:
    CPUAccounting=true
    CPUQuota=X%
to:
    # nano /etc/systemd/system/clamav-daemon.service.d/extend.conf
2.) Change Nextcloud's antivirus default from clamscan to the ClamAV daemon (socket).
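
The drop-in from step 1 could be created roughly as follows. This is a sketch under assumptions: `write_cpu_dropin` is a hypothetical helper name, `50%` is only a placeholder for the X% above, and the `[Service]` header is where systemd expects these directives in a service drop-in.

```shell
# Sketch: write a systemd drop-in that caps CPU for a service.
#   $1 = drop-in directory, $2 = quota (e.g. 50%; a placeholder value)
write_cpu_dropin() {
    mkdir -p "$1" || return 1
    cat > "$1/extend.conf" <<EOF
[Service]
CPUAccounting=true
CPUQuota=$2
EOF
}

# Example (as root):
#   write_cpu_dropin /etc/systemd/system/clamav-daemon.service.d 50%
#   systemctl daemon-reload
#   systemctl restart clamav-daemon
```

After the restart, `systemctl show clamav-daemon -p CPUQuotaPerSecUSec` should report the quota if systemd picked up the file.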

This will solve your problems.

Something useful, but optional, for those operating a shared hosting environment on Debian/Ubuntu, which ships with systemd/cgroups by default: I found an excellent tutorial on how to limit a user's CPU usage:

https://www.webhostingtalk.com/showthread.php?t=1832382

With this you can limit a user's overall CPU usage, so that clients cannot crash the server through bad application settings.
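
I have not reproduced the tutorial here, so the following is not necessarily its method. It is one possible sketch that uses a systemd drop-in on the user's `user-<UID>.slice`; `write_user_cpu_quota`, the file name `50-cpu-quota.conf`, and the quota value are all illustrative assumptions.

```shell
# Sketch: cap total CPU for one login user via a drop-in on user-<UID>.slice.
#   $1 = user name, $2 = quota (e.g. 50%), $3 = unit directory (optional)
# Prints the path of the file it wrote.
write_user_cpu_quota() {
    user=$1
    quota=$2
    unit_dir=${3:-/etc/systemd/system}
    uid=$(id -u "$user") || return 1
    dir="$unit_dir/user-$uid.slice.d"
    mkdir -p "$dir" || return 1
    cat > "$dir/50-cpu-quota.conf" <<EOF
[Slice]
CPUAccounting=true
CPUQuota=$quota
EOF
    echo "$dir/50-cpu-quota.conf"
}

# Example (as root), then reload systemd:
#   write_user_cpu_quota web2 50%
#   systemctl daemon-reload
```

The third argument exists so the function can be tried against a scratch directory before touching /etc.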

This problem cost me 4 days.. hope the answer helps someone else.
