Score:0

High CPU usage by ksoftirqd


We run Kubernetes on GCP, and our services communicate with services in other locations through a VM that masquerades their traffic with iptables. At first we hit a performance problem because only one CPU was handling the masqueraded traffic. We fixed that by enabling SMP so that more than one core could be used, but then we ran into another problem: after some time, ksoftirqd uses all available cores and the VM becomes unresponsive.

It looks like this, but on all cores (screenshot from top).

Kernel version: Linux gke-masq-sap-v2-group-n1vj 3.10.0-1160.76.1.el7.x86_64 #1 SMP Wed Aug 10 16:21:17 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

I couldn't find any related issues in the bug tracker. For now, we work around the issue by restarting the VM.
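To see whether the network softirq load is actually spread across cores or still piling up on one of them, it can help to compare two snapshots of /proc/softirqs. The following is only a minimal sketch: the function names are made up for illustration, and it assumes the usual /proc/softirqs layout (a header row of CPU names followed by one row per softirq type).

```python
def parse_softirqs(text):
    """Parse /proc/softirqs-style text into {softirq_name: [count per CPU]}."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    cpus = lines[0].split()  # header row: CPU0 CPU1 ...
    table = {}
    for ln in lines[1:]:
        name, _, rest = ln.partition(":")
        counts = [int(x) for x in rest.split()]
        table[name.strip()] = counts[:len(cpus)]
    return table


def read_snapshot(path="/proc/softirqs"):
    """Read and parse the current counters (Linux only)."""
    with open(path) as f:
        return parse_softirqs(f.read())


def softirq_delta(before, after, name="NET_RX"):
    """Per-CPU increase of one softirq counter between two snapshots."""
    return [b - a for a, b in zip(before[name], after[name])]


def busiest_share(deltas):
    """Fraction of the total increase that landed on the busiest CPU."""
    total = sum(deltas)
    return max(deltas) / total if total else 0.0
```

Calling read_snapshot() twice a few seconds apart and passing both results to softirq_delta() gives the per-CPU NET_RX increase over that interval. A busiest_share() close to 1.0 would suggest one core is still doing nearly all the receive work, while an even split with very high absolute numbers points more toward sheer packet volume (or a loop) than toward bad distribution.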

paladin
It could be a CPU-related hardware bug. Try updating the hypervisor: update the BIOS/UEFI, the CPU microcode, and the kernel, and do the same on the VM. Another possible explanation is that your iptables rules create an IP packet forwarding loop, so check your iptables rules carefully until you fully understand what they do. A third possibility is a hardware limitation. When using multiple nodes, make sure the nodes are physically close to each other. Also check your network equipment, as it could also be a bottleneck.
Alexander Tolkachev
@paladin How can I check for an IP packet forwarding loop? Could I use tcpdump to find a packet caught in the loop?
paladin
Use Wireshark/tshark instead of tcpdump; it is a more comfortable tool with many analysis functions. You would need to record the entire network traffic of your VM. I recommend attaching a dedicated virtual disk to the VM and recording all traffic onto that disk, because recording all network traffic generates a lot of data. Analyzing that much data requires a computer with a lot of RAM. So before analyzing the raw network data, I strongly recommend analyzing all current iptables rules. It might be just a single TCP packet that ping-pongs between two network interfaces and reproduces itself.
paladin
PS: Record with tshark on your server and analyze with Wireshark on a powerful workstation (32 GiB of RAM and plenty of free SSD space recommended). The capture file can easily become very large. I once analyzed a 100 GiB file on a machine with 16 GiB of RAM; it took around 30 minutes to load the file into Wireshark.
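As a rough illustration of what "a packet that ping-pongs" looks like in a capture: a forwarded packet keeps its IP identification field while its TTL drops by one per hop, so the same (source, destination, IP ID) showing up with several different TTL values is a hint of a loop. The sketch below is a heuristic under that assumption, not a definitive detector. The function name and the tuple layout are hypothetical; the tuples could be exported from a capture with something like `tshark -r capture.pcap -T fields -e ip.src -e ip.dst -e ip.id -e ip.ttl` (capture.pcap being a placeholder file name).

```python
from collections import defaultdict


def find_looping_packets(packets, min_hops=3):
    """Flag packets seen with at least `min_hops` distinct TTL values.

    packets: iterable of (src, dst, ip_id, ttl) tuples.
    Returns {(src, dst, ip_id): [TTLs, highest first]}.
    """
    ttls = defaultdict(set)
    for src, dst, ip_id, ttl in packets:
        # ip_id is treated as an opaque key, so "0x1092" and 4242 both work.
        ttls[(src, dst, ip_id)].add(int(ttl))
    return {
        key: sorted(values, reverse=True)
        for key, values in ttls.items()
        if len(values) >= min_hops
    }
```

Note that masquerading rewrites the source address, so a packet may appear under one key before NAT and another after it, and some packets carry an IP ID of 0, which can produce false matches. Treat any hit as a pointer to which iptables rules and routes to inspect, not as proof.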