I have multiple servers, each with an onboard Ethernet controller and an InfiniBand controller installed in a PCI slot.
The problem is that when I restart openibd.service, which should control only the InfiniBand adapter, my Ethernet network restarts as well.
If I stop openibd, my Ethernet connection stops too.
Ethernet and InfiniBand should be separate and independent from each other.
I need to be able to stop or restart openibd.service without dropping my Ethernet connection.
Operating System: AlmaLinux 8.7
Ethernet port in use (1 Gb): eno2np1
OFED version: MLNX_OFED_LINUX-5.9-0.5.6.0
When I restart openibd.service, I lose the Ethernet connection until openibd is running again.
I suspect both cards are using the same driver (the lshw output below reports mlx5_core for both), but I'm not sure how to proceed.
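One way to check the shared-driver suspicion directly is to read the kernel driver bound to each interface from sysfs. This is only a minimal sketch, assuming the usual /sys/class/net/<iface>/device/driver symlink layout. The function name iface_driver and the SYSFS_ROOT override are hypothetical names chosen for illustration; they are not part of any Mellanox tool.

```shell
#!/usr/bin/env bash
# Print the kernel driver bound to a network interface by resolving the
# sysfs "driver" symlink. SYSFS_ROOT is a hypothetical override (an
# assumption for illustration) so the function can be pointed at another
# tree; it defaults to /sys.

iface_driver() {
    local iface="$1"
    local root="${SYSFS_ROOT:-/sys}"
    local link="$root/class/net/$iface/device/driver"
    # The symlink is missing for virtual interfaces or unknown names.
    if [ ! -e "$link" ]; then
        echo "unknown"
        return 1
    fi
    basename "$(readlink -f "$link")"
}

# Interface names taken from the output in this post.
for iface in eno2np1 ib0; do
    echo "$iface: $(iface_driver "$iface")"
done
```

If both lines report the same module name, the two adapters are served by one kernel module, which would be consistent with the Ethernet link going away whenever that module is unloaded.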
Firmware is updated on all cards.
./mlxfwmanager_LeSI_23B_OFED-23.04-1_build4_fw_update_aug_2023 --query :
Querying Mellanox devices firmware ...
Device #1:
----------
Device Type: ConnectX4LX
Part Number: Lenovo_Ultron_CX4Lx_2P_25GbE_1G-BaseT_Ax
Description: Lenovo Ultron ConnectX-4 Lx LOM 25GbE and 1G-BaseT
PSID: LNV0000000028
PCI Device Name: 0000:65:00.0
Base MAC: 088fc3a3cb9e
Versions: Current Available
FW 14.32.1010 14.32.1010
PXE 3.6.0502 3.6.0502
UEFI 14.25.0017 14.25.0017
Status: Up to date
Device #2:
----------
Device Type: ConnectX6
Part Number: SC57A40943_Ax
Description: ThinkSystem Mellanox ConnectX-6 HDR100/100GbE QSFP56 1-port VPI Adapter
PSID: LNV0000000016
PCI Device Name: 0000:17:00.0
Base GUID: 946dae030049bd14
Versions: Current Available
FW 20.37.1014 20.37.1014
PXE 3.7.0102 3.7.0102
UEFI 14.30.0013 14.30.0013
Status: Up to date
ethtool eno2np1:
Settings for eno2np1:
Supported ports: [ ]
Supported link modes: 1000baseKX/Full
Supported pause frame use: Symmetric
Supports auto-negotiation: Yes
Supported FEC modes: None RS BASER
Advertised link modes: 1000baseKX/Full
Advertised pause frame use: Symmetric
Advertised auto-negotiation: Yes
Advertised FEC modes: None RS BASER
Speed: 1000Mb/s
Duplex: Full
Auto-negotiation: on
Port: None
PHYAD: 0
Transceiver: internal
Supports Wake-on: g
Wake-on: g
Current message level: 0x00000004 (4)
link
Link detected: yes
ethtool ib0:
Settings for ib0:
Supported ports: [ ]
Supported link modes: Not reported
Supported pause frame use: No
Supports auto-negotiation: No
Supported FEC modes: Not reported
Advertised link modes: Not reported
Advertised pause frame use: No
Advertised auto-negotiation: No
Advertised FEC modes: Not reported
Speed: 100000Mb/s
Duplex: Full
Auto-negotiation: off
Port: Other
PHYAD: 0
Transceiver: internal
Link detected: yes
lspci -nnn :
17:00.0 Infiniband controller [0207]: Mellanox Technologies MT28908 Family [ConnectX-6] [15b3:101b]
65:00.0 Ethernet controller [0200]: Mellanox Technologies MT27710 Family [ConnectX-4 Lx] [15b3:1015]
65:00.1 Ethernet controller [0200]: Mellanox Technologies MT27710 Family [ConnectX-4 Lx] [15b3:1015]
lshw -C network:
*-network
description: interface
product: MT28908 Family [ConnectX-6]
vendor: Mellanox Technologies
physical id: 0
bus info: pci@0000:17:00.0
logical name: ib0
version: 00
serial: 00:00:0a:81:fe:80:00:00:00:00:00:00:94:6d:00:00:00:00:00:00
width: 64 bits
clock: 33MHz
capabilities: pciexpress vpd msix pm bus_master cap_list rom physical
configuration: autonegotiation=off broadcast=yes driver=mlx5_core[ib_ipoib] driverversion=5.9-0.5.5 duplex=full firmware=20.37.1014 (LNV0000000016) ip=192.168.0.3 latency=0 link=yes multicast=yes
resources: iomemory:21f0-21ef irq:18 memory:21ffc000000-21ffdffffff memory:d4200000-d42fffff
*-network:0
description: Ethernet interface
product: MT27710 Family [ConnectX-4 Lx]
vendor: Mellanox Technologies
physical id: 0
bus info: pci@0000:65:00.0
logical name: eno1np0
version: 00
serial: 08:8f:c3:a3:cb:9e
width: 64 bits
clock: 33MHz
capabilities: pciexpress vpd msix pm bus_master cap_list rom ethernet physical autonegotiation
configuration: autonegotiation=on broadcast=yes driver=mlx5_core driverversion=5.9-0.5.5 firmware=14.32.1010 (LNV0000000028) latency=0 link=no multicast=yes
resources: iomemory:24f0-24ef irq:18 memory:24ffc000000-24ffdffffff memory:e3500000-e35fffff memory:24ffe800000-24ffeffffff
*-network:1
description: Ethernet interface
product: MT27710 Family [ConnectX-4 Lx]
vendor: Mellanox Technologies
physical id: 0.1
bus info: pci@0000:65:00.1
logical name: eno2np1
version: 00
serial: 08:8f:c3:a3:cb:9f
size: 1Gbit/s
width: 64 bits
clock: 33MHz
capabilities: pciexpress vpd msix pm bus_master cap_list rom ethernet physical autonegotiation
configuration: autonegotiation=on broadcast=yes driver=mlx5_core driverversion=5.9-0.5.5 duplex=full firmware=14.32.1010 (LNV0000000028) ip=10.0.26.3 latency=0 link=yes multicast=yes speed=1Gbit/s
resources: iomemory:24f0-24ef irq:19 memory:24ffa000000-24ffbffffff memory:e3400000-e34fffff memory:24ffe000000-24ffe7fffff
/var/log/messages:
systemd[1]: Stopping openibd - configure Mellanox devices...
root[8303]: openibd: running in manual mode
systemd[1]: /usr/lib/systemd/system/ibacm.service:22: Unknown lvalue 'ProtectHostname' in section 'Service'
systemd[1]: /usr/lib/systemd/system/ibacm.service:23: Unknown lvalue 'ProtectKernelLogs' in section 'Service'
NetworkManager[1345]: <info> [1692350943.3204] device (ib0): state change: activated -> unmanaged (reason 'removed', sys-iface-state: 'removed')
dbus-daemon[1341]: [system] Activating via systemd: service name='org.freedesktop.nm_dispatcher' unit='dbus-org.freedesktop.nm-dispatcher.service' requested by ':1.1' (uid=0 pid=1345 comm="/usr/sbin/NetworkManager --no-daemon ")
systemd[1]: Starting Network Manager Script Dispatcher Service...
dbus-daemon[1341]: [system] Successfully activated service 'org.freedesktop.nm_dispatcher'
systemd[1]: Started Network Manager Script Dispatcher Service.
systemd[1]: Stopping RDMA Node Description Daemon...
systemd[1]: rdma-ndd.service: Succeeded.
systemd[1]: Stopped RDMA Node Description Daemon.
NetworkManager[1345]: <info> [1692350945.4769] device (eno2np1): state change: activated -> unmanaged (reason 'removed', sys-iface-state: 'removed')
NetworkManager[1345]: <info> [1692350945.4912] dhcp4 (eno2np1): canceled DHCP transaction
NetworkManager[1345]: <info> [1692350945.4913] dhcp4 (eno2np1): activation: beginning transaction (timeout in 45 seconds)
NetworkManager[1345]: <info> [1692350945.4913] dhcp4 (eno2np1): state changed no lease
NetworkManager[1345]: <info> [1692350945.4926] manager: NetworkManager state is now DISCONNECTED
What I have tried so far:
Installing a clean operating system
Updating the server's UEFI firmware
Updating the Mellanox firmware and OFED