Score:1

Bridge interface causes high CPU utilization


We are facing a weird issue with a server in our laboratory. Specifically, the server shows high CPU utilization from low-priority processes (blue in htop), with 50% of the cores appearing to be at 100% utilization, as shown in the screenshot below.

htop high utilization

However, the list of running processes shows no process consuming this CPU:

$ ps aux --sort pcpu | head -n 20
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         2  0.0  0.0      0     0 ?        S    10:42   0:00 [kthreadd]
root         3  0.0  0.0      0     0 ?        S    10:42   0:00 [ksoftirqd/0]
root         5  0.0  0.0      0     0 ?        S<   10:42   0:00 [kworker/0:0H]
root         6  0.0  0.0      0     0 ?        S    10:42   0:00 [kworker/u96:0]
root         8  0.0  0.0      0     0 ?        S    10:42   0:00 [rcu_sched]
root         9  0.0  0.0      0     0 ?        S    10:42   0:00 [rcu_bh]
root        10  0.0  0.0      0     0 ?        S    10:42   0:00 [migration/0]
root        11  0.0  0.0      0     0 ?        S    10:42   0:00 [watchdog/0]
root        12  0.0  0.0      0     0 ?        S    10:42   0:00 [watchdog/1]
root        13  0.0  0.0      0     0 ?        S    10:42   0:00 [migration/1]
root        14  0.0  0.0      0     0 ?        S    10:42   0:00 [ksoftirqd/1]
root        16  0.0  0.0      0     0 ?        S<   10:42   0:00 [kworker/1:0H]
root        17  0.0  0.0      0     0 ?        S    10:42   0:00 [watchdog/2]
root        18  0.0  0.0      0     0 ?        S    10:42   0:00 [migration/2]
root        19  0.0  0.0      0     0 ?        S    10:42   0:00 [ksoftirqd/2]
root        21  0.0  0.0      0     0 ?        S<   10:42   0:00 [kworker/2:0H]
root        22  0.0  0.0      0     0 ?        S    10:42   0:00 [watchdog/3]
root        23  0.0  0.0      0     0 ?        S    10:42   0:00 [migration/3]
root        24  0.0  0.0      0     0 ?        S    10:42   0:00 [ksoftirqd/3]

Cause of issue: After digging around a bit, we found that when we disable the bridge interface we have set up on the server (ifdown br0), the CPU utilization drops back to normal after 5-10 seconds. If we re-enable the bridge, the utilization spikes again, similar to the picture above.
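
Since no userspace process shows the load, one way to check whether the time is going to kernel soft interrupts (e.g. network packet processing) is shown below; these are generic diagnostics, not output from our server:

$ mpstat -P ALL 1 3                 # check the %soft column per core (mpstat is part of the sysstat package)
$ watch -n1 -d cat /proc/softirqs   # rapidly increasing NET_RX/NET_TX counters point to network softirq load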

What we have tried: We tried disabling the libvirtd service in case this was an issue with the VMs on the server, but that did not help. We also disabled docker and containerd, but nothing changed either. We removed and re-installed bridge-utils on the server and renamed the interface to br1, but the issue is still there. Lastly, we booted with a different kernel version, but still nothing.

Has anyone faced any similar issue before?

Server specs:

$ uname -a
Linux cheetara 4.4.0-174-generic #204-Ubuntu SMP Wed Jan 29 06:41:01 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
$ cat /etc/os-release 
NAME="Ubuntu"
VERSION="16.04.7 LTS (Xenial Xerus)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 16.04.7 LTS"
VERSION_ID="16.04"
HOME_URL="http://www.ubuntu.com/"
SUPPORT_URL="http://help.ubuntu.com/"
BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"
VERSION_CODENAME=xenial
UBUNTU_CODENAME=xenial

---- Edit: Our server has two network interfaces, p4p1 and p4p2. We have assigned a static IP to each interface through the DHCP server (for convenience, let's say they are 137.100.1.11 and 137.100.1.12). Our /etc/network/interfaces file looks as follows:

auto lo
iface lo inet loopback

auto p4p1
iface p4p1 inet manual

auto br0
iface br0 inet static
  address 137.100.1.11
  broadcast 137.100.1.255
  netmask 255.255.255.0
  gateway 137.100.1.200
  dns-nameservers 137.100.1.210 137.100.1.220 8.8.8.8 8.8.4.4
  bridge_ports p4p1

auto ib0
iface ib0 inet static
  address 10.1.0.2
  netmask 255.255.255.0

auto ib1
iface ib1 inet static
  address 10.0.0.2
  netmask 255.255.255.0

where ib0 and ib1 are InfiniBand interfaces not related to external networking.

The routing table is as follows:

$ ip route show
default via 137.100.1.200 dev br0 onlink
10.0.0.0/24 dev ib1  proto kernel  scope link  src 10.0.0.2 linkdown 
10.1.0.0/24 dev ib0  proto kernel  scope link  src 10.1.0.2 linkdown 
137.100.1.0/24 dev br0  proto kernel  scope link  src 137.100.1.11
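
For reference, the bridge ports and STP state can be inspected as follows (brctl comes from the bridge-utils package mentioned above):

$ brctl show br0                            # ports attached to the bridge
$ brctl showstp br0                         # STP state of the bridge and each port
$ cat /sys/class/net/br0/bridge/stp_state   # 0 = STP disabled, 1 = STP enabled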
Nikita Kipriyanov: I've seen such behavior caused by a bridging loop. So please explain your bridge configuration and the network topology. Probably it is bridged somewhere else and has STP disabled.
Dimos Masouros: Thanks for the answer! I have edited my original post with more information regarding the network topology.
Nikita Kipriyanov: What exactly is `p4p1`? Quite an unusual name for a NIC.
Dimos Masouros: `p4p1` is the name of the network interface that is linked with the bridge.
Nikita Kipriyanov: I see that. I was asking what it is: Ethernet, InfiniBand, whatever?
Dimos Masouros: Hello and happy new year :) It's an Ethernet interface and it is the one that is bridged.
Score:0

At higher speeds (in my case it was 10 Gbps), the NIC offload features may not work properly. The CPU then ends up doing all the heavy lifting, because the packets are handled by the kernel's network stack.

Enabling jumbo frames (maximum MTU size) and increasing the ring buffer size reduced the load on the CPU.

ip link set dev <interface> mtu <value>
ethtool -G <interface> rx <value> tx <value>
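
For example, on a hypothetical 10 GbE interface named eth0 (the values are illustrative; check the hardware maximums with ethtool -g first):

ethtool -g eth0                   # show current and maximum ring buffer sizes
ip link set dev eth0 mtu 9000     # jumbo frames; every device on the L2 segment must support this MTU
ethtool -G eth0 rx 4096 tx 4096   # raise the ring buffers, up to the reported maximums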

If NIC offload features are available, they should be enabled.

ethtool --offload <interface> tx on rx on
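
The currently enabled offloads can be listed with ethtool -k, and individual features toggled with -K (eth0 is again a placeholder):

ethtool -k eth0 | grep -E 'offload|segmentation'   # list current offload settings
ethtool -K eth0 tso on gso on gro on               # enable segmentation/receive offloads if the NIC supports them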

You can also use the other performance-tuning methods listed in this source: https://sysadmin.miniconf.org/2016/lca2016-jamie_bainbridge-network_performance_tuning.html
