We have been using Linux bridges with libvirt VMs for a long time (since Ubuntu 16.04). Recently we have run into a problem with bridges on VLANs. We have not yet identified the conditions that trigger it: some setups work, others don't.
The VM cannot reach the upstream router. It can, however, reach the bridge on the host, and the router can also reach the bridge on the host. The router, the bridge and the VM all use static IP addresses in the same subnet.
There is a single physical Ethernet link between the router and the host, and the bridge uses a VLAN on that link. Normally the bridge has no IP address on the host; we added one for debugging.
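For reference, frames for the bridge travel on the shared link with an 802.1Q tag (VLAN ID 202, taken from the wan0.202 interface name). The sketch below, written under that assumption, extracts the destination MAC and VLAN ID from a tagged frame. These are the fields that matter when checking which frames reach the host. The router MAC in the example is made up.

```python
import struct

TPID_8021Q = 0x8100  # EtherType value that marks an 802.1Q VLAN tag


def fmt_mac(raw: bytes) -> str:
    """Format 6 raw bytes as a colon-separated MAC address."""
    return ":".join(f"{b:02x}" for b in raw)


def parse_vlan_frame(frame: bytes):
    """Return (dst_mac, src_mac, vlan_id, inner_ethertype) for an 802.1Q-tagged
    Ethernet frame, or None if the frame carries no 802.1Q tag."""
    if len(frame) < 18:
        return None
    dst, src = frame[0:6], frame[6:12]
    tpid, tci, ethertype = struct.unpack("!HHH", frame[12:18])
    if tpid != TPID_8021Q:
        return None
    vlan_id = tci & 0x0FFF  # VLAN ID is the low 12 bits of the Tag Control Information
    return fmt_mac(dst), fmt_mac(src), vlan_id, ethertype


# Hypothetical ARP reply (EtherType 0x0806) on VLAN 202, addressed to the VM's MAC.
# The source MAC 02:00:00:00:00:01 is an invented stand-in for the router.
vm_mac = bytes.fromhex("5254006c23ef")
router_mac = bytes.fromhex("020000000001")
frame = vm_mac + router_mac + struct.pack("!HHH", TPID_8021Q, 202, 0x0806) + bytes(28)
print(parse_vlan_frame(frame))
```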
The problem looks like a MAC address conflict on the bridge. ARP requests (broadcast) get through in both directions. An ARP reply (unicast, addressed to a specific MAC) reaches the router, but the reply addressed to the VM never arrives: it is not seen on the bridge.
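One quick way to test the MAC conflict hypothesis is to group the interfaces from the listings in this post by MAC address. The sketch below does this with the addresses copied by hand. Sharing a MAC across wan0, wan0.202 and br-client is expected, because a VLAN subinterface inherits its parent's MAC and the bridge here carries the same address. In this data the VM's MAC appears only once.

```python
from collections import defaultdict

# MAC addresses copied from the ip addr listings in this post.
HOST_LINKS = {
    "wan0": "48:21:0b:35:ae:51",
    "wan0.202": "48:21:0b:35:ae:51",
    "vnet0": "fe:54:00:6c:23:ef",
    "br-client": "48:21:0b:35:ae:51",
}
VM_LINKS = {"enp1s0 (VM)": "52:54:00:6c:23:ef"}


def group_by_mac(links: dict) -> dict:
    """Map each MAC address to the sorted list of interfaces that use it."""
    groups = defaultdict(list)
    for name, mac in links.items():
        groups[mac.lower()].append(name)
    return {mac: sorted(names) for mac, names in groups.items()}


def shared_macs(links: dict) -> dict:
    """Keep only MAC addresses that appear on more than one interface."""
    return {mac: names for mac, names in group_by_mac(links).items() if len(names) > 1}


print(shared_macs({**HOST_LINKS, **VM_LINKS}))
```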
We use ifupdown (not netplan) for legacy reasons.
The VM is created with virt-install.
Some of the hosts have been upgraded from Ubuntu 16.04 to 18.04 and then to 20.04. Others started with 18.04 or 20.04. The problem seems more likely on fresh installs of 20.04. It works on one host running kernel 5.4.0-137-generic #154-Ubuntu but not on another running 5.4.0-146-generic #163-Ubuntu, although we don't know whether that pattern is consistent.
On the host:
2: wan0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 48:21:0b:35:ae:51 brd ff:ff:ff:ff:ff:ff
    inet 172.16.200.5/26 brd 172.16.200.63 scope global wan0
       valid_lft forever preferred_lft forever
    inet 172.16.200.6/26 scope global secondary wan0
       valid_lft forever preferred_lft forever
    inet6 fe80::4a21:bff:fe35:ae51/64 scope link
       valid_lft forever preferred_lft forever
11: wan0.202@wan0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master br-client state UP group default qlen 1000
    link/ether 48:21:0b:35:ae:51 brd ff:ff:ff:ff:ff:ff
9: vnet0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel master br-client state UNKNOWN group default qlen 1000
    link/ether fe:54:00:6c:23:ef brd ff:ff:ff:ff:ff:ff
    inet6 fe80::fc54:ff:fe6c:23ef/64 scope link
       valid_lft forever preferred_lft forever
12: br-client: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 48:21:0b:35:ae:51 brd ff:ff:ff:ff:ff:ff
    inet 192.168.202.231/24 brd 192.168.202.255 scope global br-client
       valid_lft forever preferred_lft forever
    inet6 fe80::4a21:bff:fe35:ae51/64 scope link
       valid_lft forever preferred_lft forever
On the VM:
2: enp1s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast qlen 1000
    link/ether 52:54:00:6c:23:ef brd ff:ff:ff:ff:ff:ff
    inet 192.168.202.234/24 brd 192.168.202.255 scope global enp1s0
       valid_lft forever preferred_lft forever
    inet6 fe80::5054:ff:fe6c:23ef/64 scope link
       valid_lft forever preferred_lft forever
libvirt networking on the host:
 Interface   Type     Source      Model    MAC
-------------------------------------------------------
 vnet0       bridge   br-client   virtio   52:54:00:6c:23:ef
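The listings show that the host-side tap (vnet0, fe:54:00:6c:23:ef) and the guest NIC (52:54:00:6c:23:ef) differ only in the first octet. This matches libvirt's convention of giving the tap the guest MAC with the first octet set to 0xfe, which appears intended to keep the bridge from adopting the tap's address as its own. The bridge still forwards frames to the guest MAC itself, which it learns from the guest's traffic. The helper name below is hypothetical and only reproduces that pattern.

```python
def tap_mac_for_guest(guest_mac: str) -> str:
    """Return the guest MAC with its first octet replaced by 0xfe.

    Assumption: this mirrors the pattern visible in this post, where the guest
    NIC is 52:54:00:6c:23:ef and the host-side tap (vnet0) is fe:54:00:6c:23:ef.
    """
    octets = guest_mac.lower().split(":")
    if len(octets) != 6:
        raise ValueError(f"not a MAC address: {guest_mac!r}")
    return ":".join(["fe"] + octets[1:])


guest = "52:54:00:6c:23:ef"  # MAC from the libvirt interface list
print(tap_mac_for_guest(guest))  # fe:54:00:6c:23:ef
```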
Update:
The physical interface on the host (wan0) does not seem to accept frames addressed to the VM's MAC address. tcpdump on wan0 shows the VLAN frames whose destination is the bridge's MAC address, but no VLAN frames whose destination is the VM's MAC address. tcpdump on the router shows that frames addressed to the VM's MAC are sent out on the correct link.
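A NIC normally discards unicast frames whose destination MAC is neither its own address nor one programmed into its receive filter, unless the interface is in promiscuous mode. A Linux bridge normally places its ports in promiscuous mode so they can receive frames for MACs behind the bridge, such as the VM's. Whether this is the factor at play here is not established. The sketch below reads the IFF_PROMISC bit (0x100 in `<linux/if.h>`) from `/sys/class/net/<dev>/flags`. The device names come from this post.

```python
from pathlib import Path

IFF_PROMISC = 0x100  # promiscuous-mode bit, as defined in <linux/if.h>


def is_promisc(flags_text: str) -> bool:
    """Interpret the hex string found in /sys/class/net/<dev>/flags."""
    return bool(int(flags_text.strip(), 16) & IFF_PROMISC)


def check(devices=("wan0", "wan0.202", "br-client")):
    """Print the promiscuous-mode bit for each listed device present on this host."""
    for dev in devices:
        path = Path("/sys/class/net") / dev / "flags"
        if path.exists():
            flags = path.read_text()
            print(f"{dev}: flags={flags.strip()} promisc={is_promisc(flags)}")
        else:
            print(f"{dev}: not present")


if __name__ == "__main__":
    check()
```

tcpdump enables promiscuous mode on the interface itself unless it is run with `-p`. For that reason, the flags are best read while no capture is running.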
How is the VM's MAC address made known to the virtual switch that handles frames arriving from the wire?