Score:0

How do veth interfaces connect to each other on a Linux box?

JMC

I have been playing with a Google Kubernetes Engine cluster recently, and I have a question about its CNI. I have read in the GCP documentation and other articles that there is a bridge to which all veth interfaces connect. Basically, a veth pair is created for each container: one end lives inside the container and the other end is attached to a bridge device. When containers on the same node communicate with each other, packets are exchanged through this layer 2 bridge device. This is also how the GKE documentation describes it (a minimal sketch of that setup follows the links below).

https://cloud.google.com/kubernetes-engine/docs/concepts/network-overview#pods

https://medium.com/cloudzone/gke-networking-options-explained-demonstrated-5c0253415eba
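For context, that classic model can be reproduced by hand with iproute2. The names below (br-demo, demo-ns, veth-host, veth-ctr) and the 10.200.0.0/24 range are made up for illustration; this is only a sketch of what Docker or a bridge-based CNI plugin normally does for you:

sudo ip link add br-demo type bridge                         # stand-in for docker0/cbr0
sudo ip link set br-demo up
sudo ip addr add 10.200.0.1/24 dev br-demo
sudo ip netns add demo-ns                                    # stand-in for a container's network namespace
sudo ip link add veth-host type veth peer name veth-ctr      # create the veth pair
sudo ip link set veth-ctr netns demo-ns                      # one end moves inside the "container"
sudo ip link set veth-host master br-demo                    # the other end is enslaved to the bridge
sudo ip link set veth-host up
sudo ip netns exec demo-ns ip addr add 10.200.0.2/24 dev veth-ctr
sudo ip netns exec demo-ns ip link set veth-ctr up
brctl show br-demo                                           # veth-host should now appear under "interfaces"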

I created a cluster on Google, and I can see there is a bridge device docker0, but there are no interfaces associated with it.

gke-xxxxxxxxx /home/uuuuuuu # brctl show
bridge name bridge id       STP enabled interfaces
docker0     8000.0242fd0b0cf4   no      
gke-xxxxxxxxxx /home/uuuuuuu # 

Then I created a cluster using VirtualBox, and I can see interfaces associated with the bridge device.

[root@k8s-2 ~]# brctl show
bridge name bridge id       STP enabled interfaces
cni0        8000.36dae477639c   no      veth7f6c1f01
                                        vethccd0d71d
                                        vethe63e4285

What I am trying to reason about is why I cannot find the bridge device on the Google VMs. Is there a special feature of the Linux kernel being used in this scenario?

When I inspect each veth interface on the Google VM, they all have the same IP address, 10.188.2.1:

gke-xxxxxxxxxxxxxxxxxxxxx /home/user.name # ifconfig
docker0: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
        inet 169.254.123.1  netmask 255.255.255.0  broadcast 169.254.123.255
        ether 02:42:fd:0b:0c:f4  txqueuelen 0  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1460
        inet 10.10.1.19  netmask 255.255.255.255  broadcast 0.0.0.0
        inet6 fe80::4001:aff:fe0a:113  prefixlen 64  scopeid 0x20<link>
        ether 42:01:0a:0a:01:13  txqueuelen 1000  (Ethernet)
        RX packets 2192921  bytes 1682211226 (1.5 GiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 1288701  bytes 468627202 (446.9 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 276348  bytes 153128345 (146.0 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 276348  bytes 153128345 (146.0 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

veth27cee774: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1460
        inet 10.188.2.1  netmask 255.255.255.255  broadcast 10.188.2.1
        inet6 fe80::10b7:98ff:fe2f:2e08  prefixlen 64  scopeid 0x20<link>
        ether 12:b7:98:2f:2e:08  txqueuelen 0  (Ethernet)
        RX packets 32  bytes 2306 (2.2 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 10  bytes 710 (710.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

veth6eba4cdf: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1460
        inet 10.188.2.1  netmask 255.255.255.255  broadcast 10.188.2.1
        inet6 fe80::c4e3:b0ff:fe5f:63da  prefixlen 64  scopeid 0x20<link>
        ether c6:e3:b0:5f:63:da  txqueuelen 0  (Ethernet)
        RX packets 537091  bytes 138245354 (131.8 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 477870  bytes 122515885 (116.8 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

veth8bcf1494: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1460
        inet 10.188.2.1  netmask 255.255.255.255  broadcast 10.188.2.1
        inet6 fe80::70cb:c4ff:fe8c:a747  prefixlen 64  scopeid 0x20<link>
        ether 72:cb:c4:8c:a7:47  txqueuelen 0  (Ethernet)
        RX packets 50  bytes 3455 (3.3 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 28  bytes 2842 (2.7 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

vethbb2135c7: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1460
        inet 10.188.2.1  netmask 255.255.255.255  broadcast 10.188.2.1
        inet6 fe80::1469:daff:fea0:8b5b  prefixlen 64  scopeid 0x20<link>
        ether 16:69:da:a0:8b:5b  txqueuelen 0  (Ethernet)
        RX packets 223995  bytes 82725559 (78.8 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 239258  bytes 60203574 (57.4 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

vetheee4e8e3: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1460
        inet 10.188.2.1  netmask 255.255.255.255  broadcast 10.188.2.1
        inet6 fe80::ec6c:3bff:fef3:70c2  prefixlen 64  scopeid 0x20<link>
        ether ee:6c:3b:f3:70:c2  txqueuelen 0  (Ethernet)
        RX packets 311669  bytes 40562747 (38.6 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 304461  bytes 628195110 (599.0 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

What is behind these veth interfaces?
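A sketch of the node-side commands that should reveal this (using one of the interface names from the output above; any of them would do):

ip -d link show dev veth27cee774        # -d prints veth details plus any bridge "master"
bridge link show                        # lists ports actually enslaved to a bridge, if any
cat /sys/class/net/veth27cee774/iflink  # ifindex of the peer interface inside the pod's namespace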

Thanks in advance.

Mark
Doesn't the 2nd document you linked explain it? "With Calico, there is no L2 network bridge in the node, and instead, L3 routing is used for all traffic between pods"
JMC
Thanks Mark. If you look at my example, all the interfaces are prefixed with veth instead of cali. My cluster doesn't use the Calico CNI.
Alex G
If you want to look at packets flowing from/to the pod CIDR, specify cbr0, which is the Linux bridge that has all the veth interfaces connected to all the containers. Using `tcpdump -i cbr0` might help with the troubleshooting.
JMC
@AlexG It doesn't look like cbr0 exists. I can only see the veth interfaces, but not the bridge device, which is the mystery. Anyway, what I found out is that cbr0 does exist if I use an older version. I tried a 1.18-gke version: it uses Docker as the container runtime and it has the cbr0 device. Starting with 1.19.x-gke-xxxx, GKE nodes use containerd as the runtime and the cbr0 device is no longer created.
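For what it's worth, a quick sketch of how to confirm which runtime a node is using (exact column names and tooling may vary by version):

kubectl get nodes -o wide     # CONTAINER-RUNTIME column shows docker:// vs containerd://
crictl info | head            # on the node itself, crictl talks directly to the CRI runtime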
Score:1

The brctl show command displays a node's bridge and interface details, provided the bridge already has interfaces attached. In your situation it appears that no interfaces have been added to the bridge. You can attach an interface with sudo brctl addif docker0 veth0, after which the same command will list all the bridge and interface details on the node. Check this document for reference.
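For example (veth0 here is only a placeholder name; a working CNI plugin normally performs this step itself when a pod is created):

sudo brctl addif docker0 veth0           # legacy bridge-utils syntax
sudo ip link set veth0 master docker0    # equivalent iproute2 syntax
brctl show docker0                       # veth0 should now appear under "interfaces"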

JMC
With a fully working cluster, it is unnecessary to add veth0 to docker0 manually; this is already handled when a new pod is created. As I mentioned in the comment above, 1.18-gke is the last version that still uses cbr0. I believe the routing of packets is done by routing rules: in 1.19.x, every pod IP has its own route table entry.
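A rough way to confirm that (assuming the 10.188.2.0/24 range seen in the question; adjust to your pod CIDR):

ip route show | grep veth        # on the node: one /32 route per pod IP, each via a veth, no bridge
kubectl get pods -A -o wide      # pod IPs scheduled on this node should line up with those routes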
Alex G
Are there pods that have `hostNetwork` set to false on your GKE 1.19?