PMTUD/ICMP black hole problems across a VXLAN tunnel

I'm running a Proxmox PVE host and am trying to use VXLAN to connect the machines running on it to various networks in our Lab. However, I'm running into weird MTU-related problems that I don't understand.

First my setup. The basic layout is that virtual machines on the PVE host connect via a bridge to a VXLAN tunnel. On the other side of the tunnel I have a physical machine in the lab that acts as a VXLAN endpoint (EP). On that machine the VTEP connects via a bridge to one of its Ethernet ports, which in turn connects to the switch that holds the network I'm trying to put my VM into.

On the PVE Host (one VM and one VXLAN as example):

 ___________     __________     __________     ___________
|  VM eth0  |   |  Bridge  |   |  VXLAN   |   | Host eno1 |
| 192.168.. |___|   ----   |___|  VNI 1   |___|   10...   |___ to LabNet
| MTU 1500  |   | MTU 1550 |   | MTU 1550 |   | MTU 1600  |
|___________|   |__________|   |__________|   |___________|
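
To make that concrete, the PVE side is equivalent to roughly the following iproute2 commands (just a sketch of the diagram above, not the literal config; names, VNI and addresses are placeholders):

    # PVE host: underlay uplink, VXLAN device and bridge
    ip link set eno1 mtu 1600                                        # underlay towards the lab
    ip link add vxlan1 type vxlan id 1 dstport 4789 local 10.0.0.1 nolearning
    ip link set vxlan1 mtu 1550
    ip link add vmbr1 type bridge
    ip link set vmbr1 mtu 1550
    ip link set vxlan1 master vmbr1 up
    ip link set vmbr1 up
    # the VM's tap/veth leg is enslaved to vmbr1; the guest itself keeps MTU 1500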

In the lab (the endpoint with one tunnel + one lab device as example):

 ___________                        __________     __________     __________     ___________
| LabDevice |                      | EP eth1  |   |  Bridge  |   |  VXLAN   |   | EP eth0   |
| 192.168.. |___ lab switch etc ___|  ----    |___|   ----   |___|  VNI 1   |___|   10...   |___ to PVE Host
| MTU 1500  |                      | MTU 1500 |   | MTU 1550 |   | MTU 1550 |   | MTU 1600  |
|___________|                      |__________|   |__________|   |__________|   |___________|
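
The endpoint is the same idea, the only difference being that the lab-facing port joins the bridge at its normal MTU of 1500 (again just a sketch, names and addresses are examples):

    # lab endpoint: eth0 is the 1600-MTU underlay, eth1 faces the lab switch at 1500
    ip link add vxlan1 type vxlan id 1 dstport 4789 local 10.0.0.2 nolearning
    ip link set vxlan1 mtu 1550
    ip link add br1 type bridge
    ip link set br1 mtu 1550
    ip link set vxlan1 master br1 up
    ip link set eth1 master br1 up
    ip link set br1 up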

Now, I get that PMTUD will not really work here because most of those devices operate at L2 and can't report back, which is why I increased the MTU on the devices that have to deal with the VXLAN overhead (that it's 1600 rather than 1550 is unrelated; I just want to describe the as-is state exactly).
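
For reference, the overhead I'm accounting for works out like this (IPv4 underlay, untagged outer frames):

    # inner Ethernet frame: 1500 (inner MTU) + 14 (inner Ethernet header) = 1514 bytes
    # VXLAN encapsulation:     8 (VXLAN) + 8 (UDP) + 20 (outer IPv4)      =   36 bytes
    # outer IP packet:      1514 + 36                                     = 1550 bytes
    # so an underlay MTU of 1550 (or 1600 as here) carries inner 1500-byte frames,
    # while a plain 1500-byte underlay leaves only 1500 - 50 = 1450 for the inner MTU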

However, I'm still running into MTU mismatch/ICMP Black Hole issues:

Problem 1) Something in the chain claims to only support an MTU of 1450. If I try to connect from the VM to the LabDevice via SSH, the connection hangs and then times out. If I test MTUs via ping -M do -s 1450, something somewhere answers with the usual fragmentation required... message, a PMTU of 1450 gets cached, and subsequent SSH connection attempts work (until the cached 1450 entry expires). The PVE host does have devices with an MTU of 1450, but none of them are connected to the VM.
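
To narrow down where the 1450 comes from, this is what I would check (just diagnostic ideas, the address is a placeholder):

    ip link show | grep 'mtu 1450'    # every local device that actually has MTU 1450
    ip route get <LabDevice-IP>       # run in the VM: shows the cached PMTU exception ("... mtu 1450") while it exists
    tcpdump -ni any 'icmp and icmp[icmptype] == 3 and icmp[icmpcode] == 4'
                                      # capture the fragmentation required messages and see who sends them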

Problem 2) PMTUD does not work even for devices not involved with the tunnel. If I lower the MTU of the VM's eth0 and ping it from the LabDevice with a -s size that is too large for the VM but fine for everything else, I get zero response, even though the VM should, as far as I understand, be able to answer with ICMP fragmentation required... messages.
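
To rule out that the oversized echo request never even reaches the guest's IP stack, a capture on both ends should show where it disappears (names and sizes are examples):

    # inside the VM:
    tcpdump -ni eth0 icmp
    # on the LabDevice; 1472 + 28 = a 1500-byte packet, above the VM's lowered MTU but fine for every other hop:
    ping -M do -s 1472 <VM-IP>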

Semi-related: Is there anything I can do on the PVE host and the endpoint device to allow devices connected to the endpoint to discover a reduced MTU? Because there are some labs I might not be able to send Jumbo Frames to, and I'd prefer not to have to set a lower MTU on every single device in those labs.
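
The only partial workaround I'm aware of is clamping the TCP MSS where the traffic is forwarded, along these lines, but that only helps TCP, and on a purely bridged path it needs br_netfilter for iptables to even see the frames:

    # clamp the MSS of forwarded TCP SYNs to the PMTU of the outgoing route
    iptables -t mangle -A FORWARD -p tcp --tcp-flags SYN,RST SYN -j TCPMSS --clamp-mss-to-pmtu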

Edit: Maybe also relevant: I'm currently not running multicast, but have set up the remote IPs via bridge fdb .... Also, on the PVE host the VMs aren't connected directly to the bridge but via some veth magic.
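
Concretely, the flood entries look roughly like this (the VTEP address is a placeholder), and the veth legs are one more place where a stray 1450 could hide:

    # unicast VXLAN: all-zero flood entry pointing at the remote VTEP, set on both sides
    bridge fdb append 00:00:00:00:00:00 dev vxlan1 dst <remote-VTEP-IP>
    # check the veth pairs between the VMs and the bridge for unexpected MTUs
    ip link show type veth | grep 'mtu 1450'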
