Score:-1

Very poor performance on Kubernetes with 100GbE network


We are using Mellanox ConnectX-5 100GbE Ethernet cards in our servers, which are connected to each other through a Mellanox switch, and we are using the WeaveNet CNI plugin on our Kubernetes cluster. When we run tests with the iperf tool using the following commands, we get the full 100 Gbps connection speed between the hosts.

# server host
host1 $ iperf -s -P8
# client host
host2 $ iperf -c <host_ip> -P8
Result: 98.8 Gbps transfer speed

We also get the same result when we run the same test, with the same tool and command, between two Docker containers on those hosts.

# server host
host1 $ docker run -it -p 5001:5001 ubuntu:latest-with-iperf iperf -s -P8
# client host
host2 $ docker run -it -p 5001:5001 ubuntu:latest-with-iperf iperf -c <host_ip> -P8
Result: 98.8 Gbps transfer speed

But when we create two different deployments on those same hosts (host1, host2) with the same image and run the same test through the service IP (we created a Kubernetes Service using the YAML below), which redirects traffic to the server pod, we get only 2 Gbps. We also ran the same test using the pod's cluster IP and the Service's cluster DNS name, but the results were the same.

kubectl create deployment iperf-server --image=ubuntu:latest-with-iperf  # afterwards we add affinity (host1) and containerPort sections to the YAML (see the sketch after the Service manifest)
kubectl create deployment iperf-client --image=ubuntu:latest-with-iperf  # afterwards we add affinity (host2) and containerPort sections to the YAML
kind: Service
apiVersion: v1
metadata:
  name: iperf-server
  namespace: default
spec:
  ports:
    - name: iperf
      protocol: TCP
      port: 5001
      targetPort: 5001
  selector:
    name: iperf-server
  clusterIP: 10.104.10.230
  type: ClusterIP
  sessionAffinity: None
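
For reference, the affinity and containerPort additions to the iperf-server deployment look roughly like the sketch below. This is a simplified illustration, not our exact manifest: the label key and the node name host1 are placeholders. The iperf-client deployment gets the same treatment with host2.

# Simplified sketch of the iperf-server deployment after the edits
# (the node name "host1" and the labels are illustrative placeholders).
apiVersion: apps/v1
kind: Deployment
metadata:
  name: iperf-server
spec:
  selector:
    matchLabels:
      name: iperf-server
  template:
    metadata:
      labels:
        name: iperf-server
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: kubernetes.io/hostname
                    operator: In
                    values:
                      - host1
      containers:
        - name: iperf
          image: ubuntu:latest-with-iperf
          ports:
            - containerPort: 5001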

TL;DR: the scenarios we tested:

  • host1 (Ubuntu 20.04, Mellanox driver installed) <--------> host2 (Ubuntu 20.04, Mellanox driver installed) = 98.8 Gbps
  • container1-on-host1 <--------> container2-on-host2 = 98.8 Gbps
  • Pod1-on-host1 <-------> Pod2-on-host2 (using pod cluster IP) = 2 Gbps
  • Pod1-on-host1 <-------> Pod2-on-host2 (using service cluster IP) = 2 Gbps
  • Pod1-on-host1 <-------> Pod2-on-host2 (using service cluster domain) = 2 Gbps

We need to get 100 Gbps for pod-to-pod communication, so what could be causing this?

Update 1:

  • When I check htop inside the pods during the iperf test, there are 112 CPU cores and none of them is anywhere near saturated.
  • When I add the hostNetwork: true key to the deployments, the pods can reach up to 100 Gbps bandwidth (a sketch of where the key goes is below).
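
For reference, this is roughly where that key goes in the deployment's pod template. It is a minimal fragment, not our full manifest; the dnsPolicy line is the companion setting commonly recommended with host networking, not something taken from our original YAML.

# Fragment of the deployment's pod template with host networking enabled
spec:
  template:
    spec:
      hostNetwork: true
      dnsPolicy: ClusterFirstWithHostNet  # usually set together with hostNetwork
      containers:
        - name: iperf
          image: ubuntu:latest-with-iperf
          ports:
            - containerPort: 5001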
Halfgaar:
If you do `htop` with 'detailed CPU statistics' turned on, can you see much CPU being used? I'm thinking 'system' or 'softirq'. Another wild guess is that perhaps there is a stateful NAT layer in between? That can cause CPU issues, but it's more of a problem with many connections than with many packets. (BTW: you can edit your question with the details, as opposed to replying by comment.)
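For example, per-CPU system/softirq load can be watched with something like the commands below (mpstat comes from the sysstat package; these are generic suggestions, not tools I know are installed on your hosts):

# Per-CPU breakdown including %sys and %soft, refreshed every second
mpstat -P ALL 1

# Raw softirq counters; watch whether NET_RX/NET_TX climb on only a few CPUs
watch -d cat /proc/softirqs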
Zekeriya Akgül:
@Halfgaar Thanks for your reply. I edited my question. We don't have any custom iptables rules on the hosts; iptables is fully managed by WeaveNet.
Halfgaar:
I don't know about K8s, but Docker, for instance, manipulates iptables, so you may have some rules inadvertently. But you say you fixed it with `hostNetwork: true`?
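For what it's worth, the NAT rules on the hosts can be inspected with something like the following; this is plain iptables tooling, nothing Weave-specific is assumed:

# List all NAT rules with packet counters to see what traffic is being rewritten
sudo iptables -t nat -L -n -v

# Show only the chains/rules added by WeaveNet, kube-proxy or Docker
sudo iptables-save | grep -iE 'weave|kube|docker'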
Score:2

We figured this out by disabling the encryption on WeaveNet, but it took a reboot of the servers before it did the trick.
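
For reference, this is roughly how encryption can be turned off, assuming it was enabled the usual way for the Kubernetes addon (a WEAVE_PASSWORD environment variable on the weave-net DaemonSet, typically fed from a secret). Treat these as a sketch, not necessarily the exact commands we ran:

# Check whether encryption is configured on the weave-net DaemonSet
kubectl -n kube-system get ds weave-net -o yaml | grep -A2 WEAVE_PASSWORD

# Remove the password so Weave Net stops encrypting the data path
kubectl -n kube-system set env ds/weave-net WEAVE_PASSWORD-

# Restart the Weave Net pods so the change takes effect on every node
kubectl -n kube-system rollout restart ds/weave-net

# Verify: the status output should now show "Encryption: disabled"
kubectl -n kube-system exec <weave-net-pod> -c weave -- /home/weave/weave --local status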
