Score:0

How to deal with ksoftirq hitting 100%?

sh flag

I have a Linux server with 48 CPU and, from time to time, some of them starts hitting almost 100% of usage. And when that happens, the usage doesn't go down and that's affecting some services performance. When I reboot this server, everything goes ok for several days when some starts hitting almost 100% of usage again. This is like a cycle. The problem is that the usage of some CUP, when hits 100%, doesn't change, unless I reboot server, and there is a service that's performing badly when that occurs.

When I use the htop command, I can see some process ksoftirq/0, ksoftirq/1, ksoftirq/2 ... is the responsible for this usage. I don't know what to do.

One approach that I tried was: I observed that the number of CPU where that occurs tends to be in the first half (0-23). I tried to change the affinity of those process (ksoftirq) to another CPU in the second half (24-47), but without any success.

What I want is: how to low this ksoftirq process usage or, at least, how to distribute them in order to keep CPU with a usage lower?

I'm not sure if the solution is about changing the affinity or changing some other attribute. I'm really clueless in this situation.

This is a physical server. Some info about it:

output of lsb_release -a:

No LSB modules are available.
Distributor ID: Debian
Description:    Debian GNU/Linux 11 (bullseye)
Release:    11
Codename:   bullseye

output of uname -a:

Linux tiamat-dc 5.10.0-10-amd64 #1 SMP Debian 5.10.84-1 (2021-12-08) x86_64 GNU/Linux

output of lscpu:

Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
Address sizes:                   44 bits physical, 48 bits virtual
CPU(s):                          48
On-line CPU(s) list:             0-47
Thread(s) per core:              2
Core(s) per socket:              6
Socket(s):                       4
NUMA node(s):                    4
Vendor ID:                       GenuineIntel
CPU family:                      6
Model:                           46
Model name:                      Intel(R) Xeon(R) CPU           E7530  @ 1.87GHz
Stepping:                        6
CPU MHz:                         1239.565
BogoMIPS:                        3723.93
Virtualization:                  VT-x
L1d cache:                       768 KiB
L1i cache:                       768 KiB
L2 cache:                        6 MiB
L3 cache:                        48 MiB
NUMA node0 CPU(s):               0,4,8,12,16,20,24,28,32,36,40,44
NUMA node1 CPU(s):               1,5,9,13,17,21,25,29,33,37,41,45
NUMA node2 CPU(s):               2,6,10,14,18,22,26,30,34,38,42,46
NUMA node3 CPU(s):               3,7,11,15,19,23,27,31,35,39,43,47
Vulnerability Itlb multihit:     KVM: Mitigation: VMX disabled
Vulnerability L1tf:              Mitigation; PTE Inversion; VMX conditional cache flushes, SMT vulnerable
Vulnerability Mds:               Vulnerable: Clear CPU buffers attempted, no microcode; SMT vulnerable
Vulnerability Meltdown:          Mitigation; PTI
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1:        Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2:        Mitigation; Full generic retpoline, IBPB conditional, IBRS_FW, STIBP conditional, RSB filling
Vulnerability Srbds:             Not affected
Vulnerability Tsx async abort:   Not affected
Flags:                           fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ht tm pbe syscall nx rdtscp l
                                 m constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx
                                 16 xtpr pdcm dca sse4_1 sse4_2 x2apic popcnt lahf_lm pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid dtherm ida flush_l1d
I sit in a Tesla and translated this thread with Ai:

mangohost

Post an answer

Most people don’t grasp that asking a lot of questions unlocks learning and improves interpersonal bonding. In Alison’s studies, for example, though people could accurately recall how many questions had been asked in their conversations, they didn’t intuit the link between questions and liking. Across four studies, in which participants were engaged in conversations themselves or read transcripts of others’ conversations, people tended not to realize that question asking would influence—or had influenced—the level of amity between the conversationalists.