Rancher Server Setup
- Rancher version: 2.6.3
- Installation option (Docker install/Helm Chart): Helm Chart, Kubernetes v1.21.6 and RKE1
Information about the Cluster
- Kubernetes version: v1.20.15-rancher1-2
- Cluster Type (Local/Downstream): Downstream
- If downstream, what type of cluster? (Custom/Imported or specify provider for Hosted/Infrastructure Provider): RKE Custom (3 nodes on-prem + 1 node on Azure)
User Information
- What is the role of the user logged in? (Admin/Cluster Owner/Cluster Member/Project Owner/Project Member/Custom): Admin
Describe the bug
To illustrate the inter-pod communication problem, consider these three dcgm-exporter pods, which collect and expose GPU metrics:
URL1- http://10.42.0.79:9400/metrics -> Pod 10.42.0.79 running on node-1-on-prem
URL2- http://10.42.2.77:9400/metrics -> Pod 10.42.2.77 running on node-2-on-prem
URL3- http://10.42.4.54:9400/metrics -> Pod 10.42.4.54 running on node-3-azure
On the node-1-on-prem Linux shell:
curl URL1 and URL2 succeed; curl URL3 fails
On the node-2-on-prem Linux shell:
curl URL1 and URL2 succeed; curl URL3 fails
On the node-3-azure Linux shell:
curl URL1 and URL2 fail; curl URL3 succeeds
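Since Canal is configured with the flannel VXLAN backend (see the CNI configuration below), this pod-to-pod traffic is encapsulated in UDP between nodes. A minimal capture sketch for narrowing down where the cross-site packets stop, assuming flannel's default VXLAN port 8472 on Linux; `<azure-node-ip>` is a placeholder for node-3-azure's address:

```sh
# On node-1-on-prem: watch for flannel VXLAN traffic (UDP 8472 by default on
# Linux) to and from the Azure node while a failing curl runs.
# <azure-node-ip> is a placeholder for node-3-azure's 10.208.2.0/24 address.
sudo tcpdump -ni any udp port 8472 and host <azure-node-ip>

# In a second shell, retry the failing request (URL3, the Azure-hosted pod);
# the capture shows whether encapsulated packets leave this node and whether
# any replies come back across the VPN.
curl -sv --max-time 5 http://10.42.4.54:9400/metrics
```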
Reproduce
- The on-prem subnet is 10.133.100.0/24 and the Azure subnet is 10.208.2.0/24
- The Azure virtual network and the local network are connected by a site-to-site VPN
- Node-to-node connections succeed and there are no port restrictions in Azure or on-prem
- IPv4 port forwarding is enabled on all nodes (see the verification sketch after this list)
- Downstream cluster container network interface (CNI) configuration:
  network:
    mtu: 0
    options:
      flannel_backend_type: vxlan
    plugin: canal
- Adding the Azure node to the cluster is flawless and all pods come up
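For reference, a minimal sketch for double-checking the forwarding and MTU preconditions from a node shell, assuming Linux nodes; `<azure-node-ip>` is again a placeholder for node-3-azure's address in the 10.208.2.0/24 subnet:

```sh
# Confirm IPv4 forwarding is actually on (expected output: net.ipv4.ip_forward = 1)
sysctl net.ipv4.ip_forward

# Probe the largest payload that crosses the VPN unfragmented. VXLAN adds
# roughly 50 bytes of overhead, so a VPN path MTU below what the overlay
# assumes can drop encapsulated pod traffic even while plain node-to-node
# traffic (like this ping) keeps working.
ping -M do -s 1472 -c 3 <azure-node-ip>   # 1472 payload + 28 header bytes = 1500
```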
Result
- curl requests between pods across the VPN fail in both directions: the on-prem nodes cannot reach the Azure-hosted pod (URL3), and the Azure node cannot reach the on-prem pods (URL1, URL2)
Expected Result
- Successful inter-pod communication and display of GPU metrics
How can we get these pods to communicate properly?
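Once a fix is in place, a smoke test along these lines could confirm cross-site pod connectivity from inside the cluster network (the pod name and the hostname label value are illustrative assumptions):

```sh
# Run a throwaway pod pinned to the Azure node, assuming its hostname label
# is "node-3-azure", then fetch an on-prem dcgm-exporter pod's metrics from it.
kubectl run nettest --image=busybox --restart=Never \
  --overrides='{"spec":{"nodeSelector":{"kubernetes.io/hostname":"node-3-azure"}}}' \
  -- sleep 3600
kubectl wait --for=condition=Ready pod/nettest --timeout=60s
kubectl exec nettest -- wget -qO- -T 5 http://10.42.0.79:9400/metrics
kubectl delete pod nettest
```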
Thanks in advance for your support.