I am "upgrading" a server from CentOS 7 to Rocky 8. This server is a 1U Supermicro SYS-1029U-TRT, is part of an HPC cluster, and has two Ethernet interfaces and one InfiniBand interface. One of the Ethernet interfaces is for the HPC network; the other is used for the server room network and Internet access.
After bringing up a VM copy of the CentOS server, I started a fresh install of Rocky 8. I reused the existing partition table and the mdadm RAID that was already configured, and formatted each partition. Since the install and the initial setup of the network interfaces, the server has been exceptionally slow when handling any traffic through the "external" interface. This issue was never evident under CentOS and shows several symptoms:
- DNS queries do not complete. This is most obvious when pinging a
host on the local network by name or when downloading a file from
the Internet or a local web server via curl or wget.
- Pings to and from the server, by IP address only, either fail
outright or start working only after the first few packets
(usually about 4) are lost.
- SSH connections to the server mostly fail; some attempts reach a
password prompt, but the login never completes.
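The symptoms above can be put into numbers with a small shell sketch like the one below. It is only an illustration: the function names `loss_pct`, `check_loss`, and `check_dns` are hypothetical, the packet count and timeouts are arbitrary assumptions, and it relies on the summary line printed by the iputils version of ping.

```shell
#!/bin/bash
# Hypothetical helpers for quantifying the symptoms; names and defaults are assumptions.

# Read ping output on stdin and print only the packet-loss percentage
# from the summary line ("..., 20% packet loss, ...").
loss_pct() {
    sed -n 's/.* \([0-9.]*\)% packet loss.*/\1/p'
}

# Ping a host by IP address and report the loss percentage.
# -n skips reverse DNS lookups so a broken resolver cannot slow the test.
# $1 = address, $2 = number of packets (default 20, an arbitrary choice).
check_loss() {
    ping -n -c "${2:-20}" -W 1 "$1" | loss_pct
}

# Resolve a name through the system resolver with a 5-second cap.
# Prints "ok" if the lookup completed in time, "failed" otherwise.
check_dns() {
    if timeout 5 getent hosts "$1" > /dev/null; then
        echo "ok"
    else
        echo "failed"
    fi
}
```

For example, `check_loss 10.0.20.1` would test the default gateway listed in the routing table in this post, and `check_dns` could be pointed at any name that should resolve.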
I have tried a number of troubleshooting steps, none of which has fixed the problem so far.
- I verified that the IP settings, routing table, and resolv.conf were all correct.
- I disconnected both of the HPC network interfaces. I also tried with the interfaces connected but deactivated, with no configuration, and with them connected and configured.
- I verified that the Ethernet drivers were correct for the hardware. The system includes two 10 Gbps Intel X540-AT2 interfaces, which use the kernel's ixgbe driver. I also downloaded and installed the latest version of Intel's driver.
- I verified that the switch port is properly configured, including the VLAN and MTU settings.
- I tested the other two interfaces via ping both to and from the server and both showed no issues.
- I disconnected the interface from the usual switch and used a new cable to connect it to a nearby switch on the same VLAN.
None of those steps changed a single thing. I'm out of ideas and am looking for other possible causes. If any further information is needed, I'll gladly add it as requested.
An earlier issue, which was not reported while CentOS 7 was installed, is that an SSH connection would sometimes "pause" for up to a minute before being usable again. This is similar enough to the current issues to make me think this is a hardware issue.
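The driver and link checks in the list above can be repeated with standard tools. The sketch below is an assumption about how one might do it, not a record of the exact commands used; `nonzero_counters` and `link_report` are hypothetical helper names, and the counter names that ethtool prints vary by driver.

```shell
#!/bin/bash
# Hypothetical helpers; not a transcript of the commands actually run.

# Filter "ethtool -S" output on stdin down to error-like counters
# (names containing err, drop, miss, or timeout) whose value is non-zero.
nonzero_counters() {
    awk -F': *' 'tolower($1) ~ /err|drop|miss|timeout/ && $2 + 0 > 0 {
        sub(/^[ \t]+/, "")   # strip the leading indentation ethtool adds
        print
    }'
}

# Print driver details, negotiated link settings, and suspicious counters.
# $1 = interface name, e.g. eno1.
link_report() {
    ethtool -i "$1"                     # driver name, version, firmware
    ethtool "$1"                        # speed, duplex, link detected
    ethtool -S "$1" | nonzero_counters  # driver statistics, filtered
    ip -s link show dev "$1"            # kernel RX/TX error and drop totals
}
```

Running `link_report eno1` and `link_report eno2` side by side would make it easier to spot a difference between the problem interface and the one that works.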
Here is the output of ip a and ip route to show how things are configured. Also, when configuring the connections in nmtui, I enabled the "Never use this network for default route", "Ignore automatically obtained routes", and "Ignore automatically obtained DNS parameters" settings on the eno2 and ib0 connections. None of these settings is enabled on the eno1 connection.
[root@hostname ~]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
link/ether ac:1f:6b:c9:b3:6e brd ff:ff:ff:ff:ff:ff
altname enp24s0f0
inet 10.0.21.150/22 brd 10.0.23.255 scope global noprefixroute eno1
valid_lft forever preferred_lft forever
3: eno2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
link/ether ac:1f:6b:c9:b3:6f brd ff:ff:ff:ff:ff:ff
altname enp24s0f1
inet 10.33.0.110/22 brd 10.33.3.255 scope global noprefixroute eno2
valid_lft forever preferred_lft forever
4: ib0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 4092 qdisc mq state UP group default qlen 256
link/infiniband 00:00:01:20:fe:80:00:00:00:00:00:00:0c:42:a1:03:00:c0:af:08 brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
inet 10.33.4.110/22 brd 10.33.7.255 scope global noprefixroute ib0
valid_lft forever preferred_lft forever
[root@hostname ~]# ip route
default via 10.0.20.1 dev eno1 proto static metric 100
10.0.20.0/22 dev eno1 proto kernel scope link src 10.0.21.150 metric 100
10.33.0.0/22 dev eno2 proto kernel scope link src 10.33.0.110 metric 101
10.33.4.0/22 dev ib0 proto kernel scope link src 10.33.4.110 metric 150
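The three nmtui options mentioned above correspond to NetworkManager connection properties, so the same setup can be expressed with nmcli. This is a sketch that assumes the options were set in the IPv4 section and that the connection profiles are named eno2 and ib0 after their devices; `ignore_auto_settings` and `show_auto_settings` are hypothetical helper names.

```shell
#!/bin/bash
# Hypothetical helper: apply the three options to one connection profile.
# $1 = connection profile name (assumed to match the device name).
ignore_auto_settings() {
    # never-default      = "Never use this network for default route"
    # ignore-auto-routes = "Ignore automatically obtained routes"
    # ignore-auto-dns    = "Ignore automatically obtained DNS parameters"
    nmcli connection modify "$1" \
        ipv4.never-default yes \
        ipv4.ignore-auto-routes yes \
        ipv4.ignore-auto-dns yes
}

# Show the current values of the same properties for one profile.
show_auto_settings() {
    nmcli -f ipv4.never-default,ipv4.ignore-auto-routes,ipv4.ignore-auto-dns \
        connection show "$1"
}
```

Calling `ignore_auto_settings eno2` and `ignore_auto_settings ib0` would mirror the configuration described; changes made with `nmcli connection modify` apply once the connection is reactivated, for example with `nmcli connection up eno2`.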
Edit1: Added more info, the CentOS issue.
Edit2: Added requested ip command outputs and some nmtui settings.