Linux system slowdown debugging, high sys usage

I have a remote Linux system that became super slow yesterday. The remote LUKS unlocking I've set up doesn't seem to work reliably, and I won't be able to physically access the machine within the next 10 days, so I'm trying to debug this instead of rebooting.

The system status tools I'm used to are htop and dstat. Since I had dstat running in an SSH session, I can see that ever since yesterday (2021-09-09 08:51:42) one CPU core has been fully used by "sys", which I guess means the kernel?
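dstat derives its CPU columns from the kernel's counters in /proc/stat, where the "system" field counts time spent executing kernel code; that field is presumably what the "sys" column reflects. A minimal Python sketch of the per-CPU calculation, assuming the documented /proc/stat column order (user, nice, system, idle, iowait, irq, softirq, steal, guest, guest_nice). It is not dstat's own code, and the function names are hypothetical:

```python
import time

def parse_cpu_lines(text):
    """Return {"cpu0": [jiffies, ...], ...} for the per-CPU lines of /proc/stat."""
    cpus = {}
    for line in text.splitlines():
        parts = line.split()
        # Keep only "cpuN" lines; skip the aggregate "cpu" line and non-CPU lines.
        if parts and parts[0].startswith("cpu") and parts[0] != "cpu":
            cpus[parts[0]] = [int(x) for x in parts[1:]]
    return cpus

def sys_percent(before, after):
    """Share of each CPU's time spent in kernel mode ("system") between two snapshots."""
    result = {}
    for cpu, old in before.items():
        new = after.get(cpu)
        if new is None:
            continue
        # Sum user..steal only: guest time is already included in user.
        total = sum(new[:8]) - sum(old[:8])
        system = new[2] - old[2]
        result[cpu] = 100.0 * system / total if total > 0 else 0.0
    return result

def watch_sys(interval=1.0):
    """Print the per-CPU sys percentage measured over one interval (Linux only)."""
    with open("/proc/stat") as f:
        first = parse_cpu_lines(f.read())
    time.sleep(interval)
    with open("/proc/stat") as f:
        second = parse_cpu_lines(f.read())
    by_index = lambda kv: int(kv[0][3:])  # sort cpu2 before cpu10
    for cpu, pct in sorted(sys_percent(first, second).items(), key=by_index):
        print(f"{cpu}: {pct:5.1f}% sys")
```

Calling `watch_sys()` on the affected machine should show one core near 100% if the kernel is really burning that core; comparing several runs helps separate a constant kernel spin from short bursts.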

I can't see any culprit process or thread in htop.

I've stopped all user services and unmounted everything non-essential, which made the system a bit more responsive again, but it's still not nearly as fast as it should be (it has an Intel i7 CPU and an SSD).

I found https://tanelpoder.com/posts/high-system-load-low-cpu-utilization-on-linux/ and installed the referenced https://0x.tools/, which gave this result for `psn -G syscall,wchan`:

=== Active Threads ========================================================================================

 samples | avg_threads | comm              | state                  | syscall   | wchan                    
---------------------------------------------
     100 |        1.00 | (btrfs-cleaner)   | Running (ON CPU)       | [running] | 0                        
     100 |        1.00 | (dpkg)            | Disk (Uninterruptible) | fsync     | btrfs_commit_transaction 
     100 |        1.00 | (systemd-journal) | Disk (Uninterruptible) | ftruncate | wait_current_trans       
       1 |        0.01 | (sshd)            | Running (ON CPU)       | [running] | 0                        
       1 |        0.01 | (thermald)        | Disk (Uninterruptible) | [running] | ec_guard                 
       1 |        0.01 | (thermald)        | Running (ON CPU)       | [running] | 0    
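To my understanding, psn builds this table by repeatedly sampling per-thread state from /proc and counting how often each combination shows up; avg_threads is the hit count divided by the number of sampling rounds (100 hits over 100 rounds gives 1.00). The following is a much smaller Python approximation of that idea, not psn's code. It reads each thread's state from /proc/<pid>/task/<tid>/stat and its wait channel from the neighbouring wchan file, and keeps only R and D threads. Reading wchan may require root, and all helper names are hypothetical:

```python
import os
import time
from collections import Counter

# psn-style labels for the two thread states counted as "active".
ACTIVE_STATES = {"R": "Running (ON CPU)", "D": "Disk (Uninterruptible)"}

def read_thread(task_dir):
    """Return (comm, state, wchan) for one /proc/<pid>/task/<tid> dir, or None if it vanished."""
    try:
        with open(os.path.join(task_dir, "stat")) as f:
            stat = f.read()
        with open(os.path.join(task_dir, "wchan")) as f:
            wchan = f.read().strip() or "0"
    except OSError:
        return None
    end = stat.rindex(")")  # comm may contain spaces, so split after the last ')'
    comm = stat[stat.index("(") + 1:end]
    state = stat[end + 2:].split()[0]
    return comm, state, wchan

def take_samples(proc="/proc", rounds=10, interval=0.1):
    """Collect one (comm, state, wchan) tuple per active thread per sampling round."""
    observations = []
    for _ in range(rounds):
        for pid in os.listdir(proc):
            if not pid.isdigit():
                continue
            task_root = os.path.join(proc, pid, "task")
            try:
                tids = os.listdir(task_root)
            except OSError:
                continue  # process exited while we were scanning
            for tid in tids:
                info = read_thread(os.path.join(task_root, tid))
                if info is not None and info[1] in ACTIVE_STATES:
                    observations.append(info)
        time.sleep(interval)
    return observations

def summarize(observations, rounds):
    """Aggregate into (samples, avg_threads, comm, state, wchan) rows, busiest first."""
    counts = Counter(observations)
    return [(n, n / rounds, comm, ACTIVE_STATES[state], wchan)
            for (comm, state, wchan), n in counts.most_common()]
```

Usage: `for row in summarize(take_samples(rounds=20), 20): print(row)`. A thread that is on CPU or blocked in every round ends up with avg_threads 1.00, like btrfs-cleaner, dpkg and systemd-journal above.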

The dpkg process is explained by my attempt to run apt upgrade, which runs at roughly 1/1000th of the speed you'd normally expect (just a feeling, I didn't measure it).
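Since dpkg is stuck in fsync waiting on btrfs_commit_transaction, one rough way to put a number on the slowdown is to time small write+fsync calls on the affected filesystem and compare against a healthy machine. A minimal Python sketch; the target directory /var/tmp is an assumption (it usually lives on the root filesystem, whereas /tmp may be a tmpfs), and the function name is hypothetical:

```python
import os
import tempfile
import time

def fsync_latencies(directory=None, rounds=5, size=4096):
    """Time `rounds` write+fsync cycles of `size` bytes; returns seconds per cycle.

    `directory` should be on the filesystem under suspicion. None falls back to
    the default temp dir, which may be tmpfs and therefore not representative.
    """
    payload = os.urandom(size)
    results = []
    with tempfile.NamedTemporaryFile(dir=directory) as f:
        for _ in range(rounds):
            start = time.perf_counter()
            f.write(payload)
            f.flush()             # hand Python's buffer to the kernel
            os.fsync(f.fileno())  # block until the filesystem reports the data durable
            results.append(time.perf_counter() - start)
    return results
```

For example, `print(max(fsync_latencies("/var/tmp")))` on the slow machine gives the worst-case fsync time in seconds; the temporary file is deleted when the `with` block exits.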

Maybe there's a problem with my btrfs root filesystem? I can't find btrfs-cleaner in htop, so I guess I'll research some more on what that is.
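btrfs-cleaner is a kernel thread rather than a user process, and htop hides kernel threads by default (the "Hide kernel threads" display option), which would explain why it doesn't appear there. As a sketch, kernel threads can also be listed straight from /proc by testing the PF_KTHREAD bit (0x00200000 in include/linux/sched.h) in the flags field, field 9, of /proc/<pid>/stat. The helper names are hypothetical:

```python
import os

PF_KTHREAD = 0x00200000  # "I am a kernel thread" flag from include/linux/sched.h

def parse_stat(text):
    """Split a /proc/<pid>/stat line into (comm, state, flags)."""
    # comm is wrapped in parentheses and may itself contain spaces or ')'.
    start = text.index("(")
    end = text.rindex(")")
    comm = text[start + 1:end]
    rest = text[end + 2:].split()  # rest[0] is field 3 (state)
    state = rest[0]
    flags = int(rest[6])           # field 9 (flags)
    return comm, state, flags

def kernel_threads(proc="/proc"):
    """Yield (pid, comm, state) for every process whose flags mark it as a kernel thread."""
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc, entry, "stat")) as f:
                comm, state, flags = parse_stat(f.read())
        except OSError:
            continue  # process exited while we were scanning
        if flags & PF_KTHREAD:
            yield int(entry), comm, state
```

Usage: `[t for t in kernel_threads() if t[1].startswith("btrfs")]` lists the btrfs worker threads together with their current state, so a btrfs-cleaner stuck in R would show up.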

I ran a btrfs scrub last night, which completed super fast and didn't find any problems:

# btrfs scrub status /
UUID:             2f38e0ad-7f16-4a36-8096-b7981d47b4ff
Scrub started:    Thu Sep  9 23:59:00 2021
Status:           finished
Duration:         0:00:24
Total to scrub:   53.09GiB
Rate:             1.78GiB/s
Error summary:    no errors found

But just now, when I used nano to edit a config file on the root partition, both loading and saving it were super slow.

I just stumbled upon this: https://www.reddit.com/r/btrfs/comments/fmucrq/btrfs_snapshots_make_entire_system_lag_cpu_usage/ which has a comment that sounds similar to my problem:

every time on boot and after a snapshot btrfs-transacti and btrfs-cleaner would use up a core completely causing immense lag

The difference is that this comment says the lag only lasts a few minutes after boot and after snapshot creation, and I disabled my btrbk backup setup on this system a few days ago, when one of the attached disks started to show problems.

I'm not sure if my btrfs root filesystem was using qgroups, but I just ran btrfs quota disable / which took around 10 seconds and didn't give any feedback.
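To check whether quotas are actually enabled, rather than relying on command feedback, `btrfs qgroup show /` should, as far as I know, fail with a "quotas not enabled" error when they are off. A sysfs-based Python sketch follows. It assumes a kernel of 5.9 or newer, which to my knowledge exposes /sys/fs/btrfs/<UUID>/qgroups/ only while quotas are enabled, and that <UUID> is the filesystem UUID printed by `btrfs scrub status`. Treat both as assumptions to cross-check, and note that the function name is hypothetical:

```python
import os

def quotas_enabled(fsid, sysfs_root="/sys/fs/btrfs"):
    """Best-effort check: True if the qgroups sysfs directory exists for this filesystem.

    Assumption: kernel >= 5.9 creates <sysfs_root>/<fsid>/qgroups/ while quotas
    are enabled. On older kernels this returns False even if quotas are on.
    """
    return os.path.isdir(os.path.join(sysfs_root, fsid, "qgroups"))
```

For the filesystem in the scrub output, that would be `quotas_enabled("2f38e0ad-7f16-4a36-8096-b7981d47b4ff")`, run before and after `btrfs quota disable /`.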

Does anybody have any other hints on how to debug or solve this problem?

xogoxec344:
Ah, I think I found out why quotas were enabled: I had used btrfs-du and btdu to see how much space each subvolume uses, and a comment on https://unix.stackexchange.com/questions/190405/how-does-enabling-btrfs-quotas-impact-the-system says that btrfs-du enables quotas automatically. `btdu` seems like the better alternative to me.

The problem was those btrfs quotas. Running

btrfs quota disable /

made the system usable again :)
