Link aggregation is not working for me, to the point that the server becomes unreachable. What could be the problem, and what are the best practices for this kind of (seemingly fairly common) setup?
A Dell R730 with dual 10Gb NICs, running Ubuntu 22.04.3 LTS (GNU/Linux 5.15.0-82-generic x86_64), serves an iSCSI target to a VMware cluster.
The NICs are connected to 10Gb link-aggregated ports on two different but interconnected Meraki MS225 switches (is "stacked" the right word?).
In Ubuntu, the NICs are "bonded":
    renderer: networkd
    ethernets:
      enp130s0f0:
        dhcp4: no
      enp130s0f1:
        dhcp4: no
    bonds:
      bond-00:
        interfaces: [enp130s0f0,enp130s0f1]
        addresses: [<IPv4>/24]
        dhcp4: no
        routes:
          - to: default
            via: <gateway_IP>
            metric: 100
        nameservers:
          addresses: [<ns01_IP>,<ns02_IP>]
          search: [localdom.local]
        parameters:
          mode: balance-xor
          mii-monitor-interval: 1
If the ports on the Meraki switches are not link-aggregated, everything works, except that throughput is about 40% lower than with a single 10Gb NIC. (I was hoping that "bonding" the NICs and then configuring link aggregation in Meraki would give us speeds a bit higher than those on a single NIC.)
If, however, the ports on the Meraki switches are link-aggregated, packet loss exceeds 50% and the server becomes (almost) unresponsive.
(No special configuration in VMware. ESXi 7.0u3, the 10Gb links are active-active, and otherwise it's all default. Can't configure iSCSI network port binding in VMware because the 10Gb NICs are used for general traffic, not just iSCSI.)
What am I doing wrong?
Configurations I've tried:
- Meraki: no special configuration, no link aggregation
  - Ubuntu: no link bonding, just two individually configured NICs each with its own IP: no issues, bandwidth ~10Gbps, only one link is used even with iSCSI multi-pathing.
  - Ubuntu: bonded links in "balance-rr", "balance-xor", "802.3ad", "balance-alb" modes: ~6Gbps (40% slower), both links are used, see no errors in Meraki.
- Meraki: link aggregation enabled
  - >50% packet loss, and generally unusable - regardless of bonding mode in Ubuntu (tried "balance-rr", "balance-xor", "802.3ad"). (Did not try w/o bonding - as it defeats the purpose.)
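For reference, the bonding modes listed above differ from the netplan file shown earlier only in the `parameters` block. Below is a minimal sketch of what such a block can look like for the 802.3ad case. The `lacp-rate`, `transmit-hash-policy`, and 100 ms monitor interval are illustrative assumptions, not values from the configurations actually tested:

```yaml
# Sketch only: bond parameters for an 802.3ad (LACP) attempt.
# Everything except `mode` is an assumed, illustrative value.
bonds:
  bond-00:
    interfaces: [enp130s0f0, enp130s0f1]
    parameters:
      mode: 802.3ad
      lacp-rate: fast                  # assumption: netplan accepts slow or fast
      mii-monitor-interval: 100        # assumption: a bare number is read as milliseconds
      transmit-hash-policy: layer3+4   # assumption: hash on IP addresses and ports
```

Of the modes tried, only 802.3ad negotiates LACP with the switch. balance-rr and balance-xor do no negotiation and expect a statically configured aggregate on the switch side.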
(I am thinking my next step is to disable LACP in Meraki and go back to individual NICs in Ubuntu with no bonding.)
Thanks!