Bridge interface causes high CPU utilization

Question

Score:1

Server

Bridge interface causes high CPU utilization

Dimos Masouros

12/15/23, 9:23 AM

We are facing a weird issue with a server we have on our laboratory. Specifically, the server shows high CPU utilization from low priority processes (blue color in htop) with 50% of the cores appearing to have 100% utilization as shown in the screenshot below.

htop high utilization

However, in the list of running processes there is no process that consumes this CPU:

$ ps aux --sort pcpu | head -n 20
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         2  0.0  0.0      0     0 ?        S    10:42   0:00 [kthreadd]
root         3  0.0  0.0      0     0 ?        S    10:42   0:00 [ksoftirqd/0]
root         5  0.0  0.0      0     0 ?        S<   10:42   0:00 [kworker/0:0H]
root         6  0.0  0.0      0     0 ?        S    10:42   0:00 [kworker/u96:0]
root         8  0.0  0.0      0     0 ?        S    10:42   0:00 [rcu_sched]
root         9  0.0  0.0      0     0 ?        S    10:42   0:00 [rcu_bh]
root        10  0.0  0.0      0     0 ?        S    10:42   0:00 [migration/0]
root        11  0.0  0.0      0     0 ?        S    10:42   0:00 [watchdog/0]
root        12  0.0  0.0      0     0 ?        S    10:42   0:00 [watchdog/1]
root        13  0.0  0.0      0     0 ?        S    10:42   0:00 [migration/1]
root        14  0.0  0.0      0     0 ?        S    10:42   0:00 [ksoftirqd/1]
root        16  0.0  0.0      0     0 ?        S<   10:42   0:00 [kworker/1:0H]
root        17  0.0  0.0      0     0 ?        S    10:42   0:00 [watchdog/2]
root        18  0.0  0.0      0     0 ?        S    10:42   0:00 [migration/2]
root        19  0.0  0.0      0     0 ?        S    10:42   0:00 [ksoftirqd/2]
root        21  0.0  0.0      0     0 ?        S<   10:42   0:00 [kworker/2:0H]
root        22  0.0  0.0      0     0 ?        S    10:42   0:00 [watchdog/3]
root        23  0.0  0.0      0     0 ?        S    10:42   0:00 [migration/3]
root        24  0.0  0.0      0     0 ?        S    10:42   0:00 [ksoftirqd/3]

Cause of issue: After crawling around a bit, we have found that when disabling the bridge interface we have set up on the server (ifdown br0), the CPU utilization drops to normal states after 5-10 seconds. If we re-enable the bridge, then the utilization spikes again, similar to picture above.

What we have tried: We have tried disabling libvirtd service in case this was an issue with the VMs on the server, but no hope with that. We have also disabled docker and containerd, but nothing changed either. We have also removed and re-installed bridge-utils on the server and also rename the interface to br1, but the issue is still there. Last, we also booted with a different kernel version, but, still nothing.

Has anyone faced any similar issue before?

Server specs:

$ uname -a
Linux cheetara 4.4.0-174-generic #204-Ubuntu SMP Wed Jan 29 06:41:01 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
$ cat /etc/os-release 
NAME="Ubuntu"
VERSION="16.04.7 LTS (Xenial Xerus)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 16.04.7 LTS"
VERSION_ID="16.04"
HOME_URL="http://www.ubuntu.com/"
SUPPORT_URL="http://help.ubuntu.com/"
BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"
VERSION_CODENAME=xenial
UBUNTU_CODENAME=xenial

---- Edit Our server has two network interfaces p4p1 and p4p2. We have assigned a static IP to each interface through the DHCP server (for convenience let's say they are 137.100.1.11 and 137.100.1.12). Our /etc/network/interfaces file looks as follows:

auto lo
iface lo inet loopback

auto p4p1
iface p4p1 inet manual

auto br0
iface br0 inet static
  address 137.100.1.11
  broadcast 137.100.1.255
  netmask 255.255.255.0
  gateway 137.100.1.200
  dns-nameservers 137.100.1.210 137.100.1.220 8.8.8.8 8.8.4.4
  bridge_ports p4p1

auto ib0
iface ib0 inet static
  address 10.1.0.2
  netmask 255.255.255.0

auto ib1
iface ib1 inet static
  address 10.0.0.2
  netmask 255.255.255.0

where ib0 and ib1 are infiniband interfaces not related to external networking.

Also the routing is as follows:

$ ip route show
default via 137.100.1.200 dev br0 onlink
10.0.0.0/24 dev ib1  proto kernel  scope link  src 10.0.0.2 linkdown 
10.1.0.0/24 dev ib0  proto kernel  scope link  src 10.1.0.2 linkdown 
147.102.37.0/24 dev br0  proto kernel  scope link  src 147.102.37.24

413

1 + 6

central-processing-unit

bridge

utilization

htop

Bridge interface causes high CPU utilization

Post an answer