Score:1

Poor network performance on Ubuntu Server 22.04


I have Ubuntu Server 22.04 installed on an HP ProLiant MicroServer. This server acts as a file server as well as a DNS and DHCP server. The issue I am facing is poor network performance during file transfers. After a reboot I get about 50-60 MiB/s transfer speeds, but this slowly drops and stabilises at ~2 MiB/s. The reduction in speed seems to be time dependent rather than data dependent, i.e. the speed has dropped half an hour after the reboot rather than after transferring 2 GB.

All tests were carried out on a wired connection. I do not believe the network infrastructure is to blame, as I have tested the transfer with a Windows 10 machine on the same switch port and do not get any drop in performance.
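
For reference, a raw TCP throughput test that takes SMB/NFS and the disks out of the picture can be run with iperf3 (assuming it is installed on both the server and a client); the long run time and periodic reporting are only there to watch whether the rate decays over time, since the slowdown appears to be time dependent:

# on the server (192.168.0.2)
iperf3 -s

# on a client on the same switch: report every 30 seconds for 30 minutes
iperf3 -c 192.168.0.2 -t 1800 -i 30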

I also do not believe it is the file server's storage array. The array consists of four 2 TB WD Red (WD20EFRX-68E) disks in RAID 5. The system boots from a 120 GB SSD. I have hosted a 2 GB file on the SSD and the network transfer was still slow.
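
As a rough local read check (to confirm the array or SSD is not the bottleneck), something along these lines can be used; the file path is only an example, and the page cache is dropped first so the read actually hits the drive:

# drop the page cache so the read is not served from RAM
sync; echo 3 | sudo tee /proc/sys/vm/drop_caches

# read a test file locally and report throughput (example path)
dd if=/srv/share/testfile of=/dev/null bs=1M status=progress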

I am using SMB and NFS for network shares, but I get the same poor performance from both.

The server is attached to the network using a bonded NIC in the form of an Intel 82571EB dual-port gigabit Ethernet PCI adapter. The switch it connects to is a TP-Link T1600G-18TS V1, and the bonded link goes to two ports configured for LACP. I do not think this setup is the cause of the fault either, as I still get the same performance if I switch back to the onboard NIC.

The server's network connection is set up using netplan. The server IP is fixed at 192.168.0.2. See the netplan config below:

chris@paveycloud:~$ cat /etc/netplan/01-netcfg.yaml
# This file describes the network interfaces available on your system
# For more information, see netplan(5).
network:
  version: 2
  renderer: networkd
  ethernets:
    enp2s0f0:
      dhcp4: no
      dhcp6: no
    enp2s0f1:
      dhcp4: no
      dhcp6: no
  bonds:
    bond0:
      interfaces:
        - enp2s0f0
        - enp2s0f1
      addresses: [192.168.0.2/24]
      routes:
        - to: default
          via: 192.168.0.1
      nameservers:
        addresses:
          - 1.1.1.1
          - 1.0.0.1
      parameters:
        transmit-hash-policy: layer2
        mode: 802.3ad
        lacp-rate: slow
        mii-monitor-interval: 1
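
After editing the file, the config can be applied and the resulting bond inspected with something like the following (networkctl is usable here because the renderer is networkd):

sudo netplan apply
networkctl status bond0
ip -d link show bond0    # shows bonding mode and LACP details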

I am using dnsmasq for my DNS and DHCP server; see the configuration below:

chris@paveycloud:~$ cat /etc/dnsmasq.conf
# Listen address
listen-address=127.0.0.1,192.168.0.2

# Never forward plain names (without a domain)
domain-needed

# Never forward addresses in the non-routable address space (RFC1918)
bogus-priv

# Add domain to host names
expand-hosts

# Domain to be added if expand-hosts is set
domain=paveycloud.com

# Local domain to be served from /etc/hosts file
local=/paveycloud.com/

# local domain translation
address=/paveycloud.com/192.168.0.2
address=/paveycloud.noip.me/192.168.0.2

# dhcp stuff
dhcp-range=192.168.0.11,192.168.0.254,12h
dhcp-lease-max=100
dhcp-option=option:router,192.168.0.1
dhcp-option=option:dns-server,192.168.0.2
dhcp-option=option:netmask,255.255.255.0
dhcp-boot=lpxelinux.0
dhcp-authoritative

# static addresses 192.168.0.1 > 192.168.0.10
dhcp-host=xx:xx:xx:xx:xx:xx,192.168.0.1,NETGEAR_VDSL_DM200_GATEWAY
dhcp-host=xx:xx:xx:xx:xx:xx,192.168.0.3,NETGEAR_EX7000_HOUSE_AP
dhcp-host=xx:xx:xx:xx:xx:xx,192.168.0.4,NETGEAR_EX7000_GARAGE_AP
dhcp-host=xx:xx:xx:xx:xx:xx,192.168.0.5,TP-LINK_T1600G-18TS_HOUSE_SW
dhcp-host=xx:xx:xx:xx:xx:xx,192.168.0.6,TP-LINK_T1600G-18TS_GARAGE_SW
dhcp-host=xx:xx:xx:xx:xx:xx,192.168.0.7,BEDROOM_NVIDIA_SHIELD_ETH
dhcp-host=xx:xx:xx:xx:xx:xx,192.168.0.8,LOUNGE_NVIDIA_SHIELD_ETH
dhcp-host=xx:xx:xx:xx:xx:xx,192.168.0.9,LOUNGE_NVIDIA_SHIELD_WIFI
dhcp-host=xx:xx:xx:xx:xx:xx,192.168.0.10,VBOX_TV_GATEWAY_ETH
dhcp-host=xx:xx:xx:xx:xx:xx,192.168.0.70,NANOSTATION_LOCO_M5_AP
dhcp-host=xx:xx:xx:xx:xx:xx,192.168.0.71,NANOSTATION_LOCO_M5_STATION
dhcp-host=xx:xx:xx:xx:xx:xx,192.168.0.90,AXIS_VIDEO_SERVER
dhcp-host=xx:xx:xx:xx:xx:xx,192.168.0.100,ZONEMINDER_SERVER

chris@paveycloud:~$ cat /etc/resolv.conf
nameserver 1.1.1.1
nameserver 1.0.0.1
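
To confirm that dnsmasq itself is answering (rather than the upstream resolvers listed in /etc/resolv.conf), a quick check can be run from any LAN client, assuming dig from dnsutils is installed:

# local record served directly by dnsmasq
dig @192.168.0.2 paveycloud.com +short

# forwarded query, resolved via 1.1.1.1/1.0.0.1
dig @192.168.0.2 ubuntu.com +short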

If I cat /proc/net/bonding/bond0 and use ethtool to look at bond0, everything seems OK:

cat /proc/net/bonding/bond0
Ethernet Channel Bonding Driver: v5.15.0-56-generic

Bonding Mode: IEEE 802.3ad Dynamic link aggregation
Transmit Hash Policy: layer2 (0)
MII Status: up
MII Polling Interval (ms): 1
Up Delay (ms): 0
Down Delay (ms): 0
Peer Notification Delay (ms): 0

802.3ad info
LACP active: on
LACP rate: slow
Min links: 0
Aggregator selection policy (ad_select): stable

Slave Interface: enp2s0f1
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 00:26:55:e3:bb:e3
Slave queue ID: 0
Aggregator ID: 1
Actor Churn State: none
Partner Churn State: none
Actor Churned Count: 0
Partner Churned Count: 0

Slave Interface: enp2s0f0
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 00:26:55:e3:bb:e2
Slave queue ID: 0
Aggregator ID: 1
Actor Churn State: none
Partner Churn State: none
Actor Churned Count: 0
Partner Churned Count: 0
chris@paveycloud:~$ sudo ethtool bond0
Settings for bond0:
        Supported ports: [  ]
        Supported link modes:   Not reported
        Supported pause frame use: No
        Supports auto-negotiation: No
        Supported FEC modes: Not reported
        Advertised link modes:  Not reported
        Advertised pause frame use: No
        Advertised auto-negotiation: No
        Advertised FEC modes: Not reported
        Speed: 2000Mb/s
        Duplex: Full
        Auto-negotiation: off
        Port: Other
        PHYAD: 0
        Transceiver: internal
        Link detected: yes
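
The bond0 ethtool output above does not include error counters, so the physical slaves can be checked separately for errors or drops, for example:

# per-interface error/drop statistics
ip -s link show enp2s0f0
ip -s link show enp2s0f1

# NIC-level counters (counter names vary by driver)
sudo ethtool -S enp2s0f0 | grep -iE 'err|drop|crc'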

Any help with this issue will be greatly appreciated.

Chris

Matias N Goldberg (comment): Check `top` & `iotop -o` to see if there's a process stealing resources. Another simple explanation is your Ethernet cable being damaged, so your Ethernet card keeps negotiating a lower TX speed based on past performance (or something similar at the TCP level). Anyway, try another good-quality cable.
Matias N Goldberg (comment): Another explanation is power saving. If you use powertop or TLP, try disabling it, at least for the Ethernet interfaces. Or check TLP's documentation to see how its Ethernet power-saving commands are issued so that you can explicitly turn power saving off.
Score:0

Update: I have found the solution. The issue was with fail2ban, more specifically how I had it configured. I had set the ban time to -1 to permanently ban IP addresses that attempt to log on to my server without a valid SSH key. It had been configured this way for about two years and had banned ~60k IP addresses. This was apparently causing the issue I was seeing. Whether this was due to a massively bloated IP table or just a processing overhead for network traffic, I do not know, but changing the ban time to 48h seems to have solved the issue.
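
For anyone hitting the same problem, the relevant change was only the ban time in the sshd jail, and the size of the ban list can be checked before and after. The jail.local excerpt and the f2b-sshd chain name below assume the default iptables-based action for the sshd jail, so the names may differ on other setups:

# /etc/fail2ban/jail.local (excerpt)
[sshd]
enabled = true
bantime = 48h    # was -1 (permanent); ~60k accumulated bans had bloated the firewall rules

# how many addresses the jail currently bans
sudo fail2ban-client status sshd

# rough count of firewall rules created by the jail (chain name may differ)
sudo iptables -S f2b-sshd | wc -l

# apply the new ban time
sudo systemctl restart fail2ban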

Massive thanks to @Matias N Goldberg for his assistance.

Chris
