
Nothing but DHCP works after testing SR-IOV on Mellanox ConnectX-4 Lx


I was following the Proxmox guide for enabling PCIe passthrough and SR-IOV for my NIC, since I am running VyOS in a VM as a router. However, after undoing all the changes, the NIC no longer works. The only thing that seems to work is DHCP (but not DHCPv6) when I connect another computer directly. I suspect it might be because I've set up a DHCP relay, since the port I connected to is on a different VLAN (10) than the DHCP server (12). Here's the output from tcpdump -e:

98:03:9b:b7:f7:ea > ff:ff:ff:ff:ff:ff, ethertype 802.1Q (0x8100), length 362: vlan 10, p 0, ethertype IPv4 (0x0800), 10.10.10.115.68 > 255.255.255.255.67: BOOTP/DHCP, Request from 98:03:9b:b7:f7:ea, length 316
fe:63:6d:d4:da:18 > 2a:a2:54:7e:47:45, ethertype 802.1Q (0x8100), length 373: vlan 12, p 0, ethertype IPv4 (0x0800), 10.10.12.1.67 > 10.10.12.5.67: BOOTP/DHCP, Request from 98:03:9b:b7:f7:ea, length 327
fe:63:6d:d4:da:18 > 98:03:9b:b7:f7:ea, ethertype 802.1Q (0x8100), length 395: vlan 10, p 0, ethertype IPv4 (0x0800), 10.10.10.1.67 > 10.10.10.115.68: BOOTP/DHCP, Reply, length 349
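
To make the relay path in this capture easier to follow, the three frames can be reduced to VLAN, MAC and IP endpoints. This is only a minimal Python sketch, assuming the exact single-tagged `tcpdump -e` line format shown above; the names `FRAME_RE` and `summarize` are hypothetical and not part of any tool.

```python
import re

# The three frames from the capture above, copied verbatim.
CAPTURE = """\
98:03:9b:b7:f7:ea > ff:ff:ff:ff:ff:ff, ethertype 802.1Q (0x8100), length 362: vlan 10, p 0, ethertype IPv4 (0x0800), 10.10.10.115.68 > 255.255.255.255.67: BOOTP/DHCP, Request from 98:03:9b:b7:f7:ea, length 316
fe:63:6d:d4:da:18 > 2a:a2:54:7e:47:45, ethertype 802.1Q (0x8100), length 373: vlan 12, p 0, ethertype IPv4 (0x0800), 10.10.12.1.67 > 10.10.12.5.67: BOOTP/DHCP, Request from 98:03:9b:b7:f7:ea, length 327
fe:63:6d:d4:da:18 > 98:03:9b:b7:f7:ea, ethertype 802.1Q (0x8100), length 395: vlan 10, p 0, ethertype IPv4 (0x0800), 10.10.10.1.67 > 10.10.10.115.68: BOOTP/DHCP, Reply, length 349
"""

# Matches one single-tagged 802.1Q BOOTP/DHCP line as printed by `tcpdump -e`.
# Other tcpdump output formats are deliberately not handled.
FRAME_RE = re.compile(
    r"^(?P<src_mac>[0-9a-f:]{17}) > (?P<dst_mac>[0-9a-f:]{17}), "
    r"ethertype 802\.1Q \(0x8100\), length \d+: vlan (?P<vlan>\d+), .*?"
    r"(?P<src_ip>\d+\.\d+\.\d+\.\d+)\.(?P<src_port>\d+) > "
    r"(?P<dst_ip>\d+\.\d+\.\d+\.\d+)\.(?P<dst_port>\d+): "
    r"BOOTP/DHCP, (?P<kind>Request|Reply)"
)


def summarize(capture):
    """Return one dict per matching line; lines that do not match are skipped."""
    frames = []
    for line in capture.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:
            frame = match.groupdict()
            frame["vlan"] = int(frame["vlan"])
            frames.append(frame)
    return frames


frames = summarize(CAPTURE)
for f in frames:
    print(
        f"vlan {f['vlan']:>2} {f['kind']:<7} "
        f"{f['src_mac']} -> {f['dst_mac']}  "
        f"{f['src_ip']}:{f['src_port']} -> {f['dst_ip']}:{f['dst_port']}"
    )
```

Read this way, the broadcast request arrives on VLAN 10, is forwarded as unicast on VLAN 12 to 10.10.12.5, and the reply goes back to the client on VLAN 10 from the router's MAC.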

I also tried clearing the DHCP lease from the server, and it still worked and assigned a new address.

After getting the address from DHCP, I'm unable to ping the gateway at 10.10.10.1 from 10.10.10.115. I tried arping from the router, and it does not succeed. However, when inspecting the traffic with Wireshark, I can see an ARP reply from 10.10.10.1 after the "who has 10.10.10.1" request. The client (98:03:9b:b7:f7:ea) shows up as stale in the router's ARP table, but everything seems OK from the client side.

I have also set up the other port on the NIC as a trunk and connected it to a switch; in that case, however, even DHCP does not work. Another 1G NIC on the motherboard with the same configuration on the Proxmox host works as expected.

I tried resetting the NIC to defaults from the boot menu, and also using mlxconfig -d <dev> reset. Before I started messing with SR-IOV it was working as expected, so I'm guessing I messed something up, but at this point I'm not sure what.
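
One thing that could still be checked on the host is whether any virtual functions remain enabled on either port. The following is a rough Python sketch, assuming the kernel's standard `sriov_numvfs` and `sriov_totalvfs` sysfs attributes under each interface's `device` directory; the helper name `sriov_vf_counts` and its `sysfs_root` parameter are hypothetical and only there for illustration.

```python
from pathlib import Path


def sriov_vf_counts(iface, sysfs_root="/sys/class/net"):
    """Return (configured VFs, maximum VFs) for iface.

    Returns None if the interface or its SR-IOV attributes do not exist,
    for example when the driver exposes no SR-IOV capability.
    """
    dev = Path(sysfs_root) / iface / "device"
    try:
        numvfs = int((dev / "sriov_numvfs").read_text().strip())
        totalvfs = int((dev / "sriov_totalvfs").read_text().strip())
    except FileNotFoundError:
        return None
    return numvfs, totalvfs


if __name__ == "__main__":
    # Interface names taken from the /etc/network/interfaces in this post.
    for iface in ("enp45s0f0np0", "enp45s0f1np1"):
        print(iface, sriov_vf_counts(iface))
```

A first value of 0 would suggest no VFs are currently configured at the kernel level; it says nothing about firmware settings stored on the card itself.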

Here's my /etc/network/interfaces from the Proxmox host:

auto lo
iface lo inet loopback

auto enp39s0
iface enp39s0 inet manual
        ovs_type OVSPort
        ovs_bridge vmbr0
#Trunk 1G

auto enp45s0f0np0
iface enp45s0f0np0 inet manual
        ovs_type OVSPort
        ovs_bridge vmbr0
#Trunk 25G

iface enp42s0f3u5u3c2 inet manual

auto enp38s0
iface enp38s0 inet manual
        ovs_type OVSPort
        ovs_bridge vmbr1
#WAN

auto enp45s0f1np1
iface enp45s0f1np1 inet manual
        ovs_type OVSPort
        ovs_bridge vmbr0
        ovs_options tag=10 vlan_mode=native-untagged
#Users

auto intport01
iface intport01 inet static
        address 10.10.16.30/24
        gateway 10.10.16.1
        ovs_type OVSIntPort
        ovs_bridge vmbr0
        ovs_options tag=16
#MGMT

auto vmbr0
iface vmbr0 inet manual
        ovs_type OVSBridge
        ovs_ports intport01 enp39s0 enp45s0f0np0 enp45s0f1np1
#Inside

auto vmbr1
iface vmbr1 inet manual
        ovs_type OVSBridge
        ovs_ports enp38s0
#WAN

enp45s0f1np1 is the port on which DHCP worked; enp45s0f0np0 is the other port on the non-working NIC. On the router, vmbr0 is added as a NIC, and the VLANs are set up there.
