I have been trying to figure this out for quite a while now, so I am trying my luck here.
I have some VMs that should communicate with non-VM machines. There are two use cases.
The first is bare-metal machines on the subnet that the external router sits on. The second is machines outside, behind the gateway that the external router points to.
So we have these two connection paths:
VM --> external router gateway --> internet gateway --> gateway somewhere else --> other machine on this subnet
VM --> external router gateway --> internet gateway --> baremetal machine on this subnet
To make it more concrete: the external router has its gateway at, for example, 10.5.1.10, and the server to reach sits at 10.5.1.80.
So far so good: packets are flowing, but not all of them. This is where it gets weird, and I am looking for the reason.
These VMs run Kubernetes and should connect to other bare-metal machines on that subnet. The network plugin is Calico, so BGP packets are being exchanged. The weird thing is that those packets never reach the bare-metal machines, but they work just fine with machines outside of the external router's subnet. So I looked at the traffic and noticed that the packets already disappear on the hypervisor of the VM. They are still visible on the tap device, but after that they are simply lost.
So it is pretty obvious that something is filtering those packets out. The question that remains is: why? Do I have some wrong configuration, or did I misunderstand something? Or is there a way to make the whole setup accept those packets? By the way, I had the same problem with flannel VXLAN packets.
My setup in general: OpenStack deployed via kolla, Open vSwitch, and the external network set up as a flat network.
I checked the iptables rules, but there are no rules at all for the subnet. And as described before, packets that are routed not to the 10.5.0.0/16 network but to, for example, the 10.50.0.0/16 network, which is outside of this datacenter and unknown to Open vSwitch/OpenStack, get through fine. So this must have something to do with the external network configured on 10.5.0.0/16, which is wrongly dropping those packets.
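One more way to look at the iptables side, as a rough sketch only: if the hybrid iptables firewall driver is in use (the qbr/qvo devices further down suggest this, but it is an assumption), the per-VM rules are keyed by port ID rather than by subnet, so searching for the subnet would not find them. The helper name rules_for_port is made up for illustration; it only filters text read from stdin.

```shell
#!/bin/sh
# Hypothetical helper: reads rule dumps (for example from `iptables-save -c`)
# on stdin and prints only the lines that mention the given port ID prefix.
# The prefix is matched as a fixed string, not as a regular expression.
rules_for_port() {
  grep -F -- "$1"
}
```

It could be used on the hypervisor like this, where 503122c8 is the start of the port ID taken from the device names below; the -c flag makes iptables-save include the packet and byte counters, which helps to spot a rule whose counter grows while the BGP SYNs are being sent:

iptables-save -c | rules_for_port 503122c8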
Some more info:
This is the interface of the VM as shown by ovs-vsctl:
    Port qvo503122c8-15
        tag: 1
        Interface qvo503122c8-15
Putting tcpdump on that
tcpdump -i qvo503122c8-15 -vvv | grep bgp
tcpdump: listening on qvo503122c8-15, link-type EN10MB (Ethernet), capture size 262144 bytes
This results in absolutely nothing, although other traffic is visible there. Looking at the master device (the qbr bridge), the packets are still there:
tcpdump -i qbr503122c8-15 -vvv | grep bgp
tcpdump: listening on qbr503122c8-15, link-type EN10MB (Ethernet), capture size 262144 bytes
10.5.3.44.53969 > 10.15.0.91.bgp: Flags [S], cksum 0x17b7 (incorrect -> 0x0748), seq 607368, win 64860, options [mss 1410,sackOK,TS val 2138674293 ecr 0,nop,wscale 7], length 0
10.5.3.44.41505 > 10.15.0.92.bgp: Flags [S], cksum 0x17b8 (incorrect -> 0x572f), seq 103292561, win 64860, options [mss 1410,sackOK,TS val 2567404423 ecr 0,nop,wscale 7], length 0
10.5.3.44.53969 > 10.15.0.91.bgp: Flags [S], cksum 0x17b7 (incorrect -> 0xff67), seq 607368, win 64860, options [mss 1410,sackOK,TS val 2138676309 ecr 0,nop,wscale 7], length 0
10.5.3.44.49645 > 10.15.0.91.bgp: Flags [S], cksum 0x17b7 (incorrect -> 0xb20c), seq 137205467, win 64860, options [mss 1410,sackOK,TS val 2138677277 ecr 0,nop,wscale 7], length 0
10.5.3.44.49787 > 10.15.0.92.bgp: Flags [S], cksum 0x17b8 (incorrect -> 0x9dee), seq 608997803, win 64860, options [mss 1410,sackOK,TS val 2567406383 ecr 0,nop,wscale 7], length 0
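To compare the two devices more reliably than with grep, a capture filter on TCP port 179 (the well-known BGP port) can be combined with a small counter of SYN packets per destination. This is only a sketch: count_syns is a made-up helper name, and it assumes tcpdump's usual text output, where the destination follows a ">" field and a SYN is printed as "Flags [S],".

```shell
#!/bin/sh
# Hypothetical helper: reads tcpdump text output on stdin and prints
# "<destination IP> <number of SYNs>" per destination, sorted.
# SYN-ACKs ("Flags [S.],") and other segments are not counted.
count_syns() {
  awk '
    /Flags \[S\],/ {
      # Find the ">" separator; the field after it is "dst.port:".
      for (i = 2; i < NF; i++) {
        if ($i == ">") {
          dst = $(i + 1)
          sub(/:$/, "", dst)        # drop the trailing colon
          sub(/\.[^.]+$/, "", dst)  # drop the port (or service name)
          count[dst]++
          break
        }
      }
    }
    END { for (d in count) print d, count[d] }
  ' | sort
}
```

Running it once per device, for example with a fixed packet count so that tcpdump terminates, would look like this:

tcpdump -l -c 50 -i qbr503122c8-15 -vvv 'tcp port 179' | count_syns
tcpdump -l -c 50 -i qvo503122c8-15 -vvv 'tcp port 179' | count_syns

Fed with the five qbr lines above, it prints "10.15.0.91 3" and "10.15.0.92 2".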
For comparison, this is the same machine but a different VM. It has the same problem and those packets disappear as well, but in this case there are also some machines outside of this datacenter, and those are reached just fine on their subnet. The packets logged here won't disappear:
tcpdump -i qbr2d9b68d1-b7: -vvv | grep bgp
tcpdump: listening on qbr2d9b68d1-b7:, link-type EN10MB (Ethernet), capture size 262144 bytes
192.168.1.3.58395 > 10.5.0.177.bgp: Flags [P.], cksum 0x0961 (correct), seq 2080765188:2080765207, ack 2634444892, win 505, options [nop,nop,TS val 3871674789 ecr 2568136223], length 19: BGP
10.5.0.177.bgp > 192.168.1.3.58395: Flags [.], cksum 0xcc82 (incorrect -> 0x44c4), seq 1, ack 19, win 502, options [nop,nop,TS val 2568187609 ecr 3871674789], length 0
One thing I can see now, however: the second tcpdump shows that the VM communicates with the external machines via its own address, while the others tried to communicate using the floating IP. The floating IP packets seem to be filtered out.