Score:1

Hyper-V cluster-joined host chooses wrong NIC to send SMB traffic

I have the following scenario:

Four Hyper-V host servers running Windows Server 2019, joined in a Hyper-V Failover Cluster. Each server has the same network config - a Management network (Cluster and Client communication configured), and a couple of other networks - for CSV, LM and iSCSI traffic, etc., all of which have only Cluster communication configured. Each network interface is properly in its own subnet/VLAN. The Management network interface has a gateway configured. The other networks do not, as they are used only for cluster traffic between the hosts themselves.

The hosts can see each other over all of the network interfaces. Everything is working well in production, cluster validation passes, cluster traffic is working, so are live migrations, etc.

What baffles me is I'm seeing constant SMB traffic (tcp/445) on our firewall, which resides behind the gateway of the Management interface. There is a constant stream of packets where each of the Hyper-V hosts tries to communicate via SMB from their Management network IP address as the source to addresses of all other hosts on the CSV network as destination. This SMB traffic is implicitly denied by the firewall, so there's no chance that any actual inter-host cluster traffic goes around and through the firewall.
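To separate these flows from legitimate traffic in a firewall log export, a small filter along the following lines might help. It is only a sketch: the subnets, the sample flows, and the `is_stray_smb` helper are hypothetical placeholders, not values from this environment.

```python
import ipaddress

# Hypothetical subnets; substitute the real Management and CSV ranges.
MGMT_NET = ipaddress.ip_network("10.0.10.0/24")
CSV_NET = ipaddress.ip_network("10.0.20.0/24")


def is_stray_smb(src, dst, dport):
    """Return True for SMB (tcp/445) sent from a Management address to a CSV address."""
    return (
        dport == 445
        and ipaddress.ip_address(src) in MGMT_NET
        and ipaddress.ip_address(dst) in CSV_NET
    )


# Hypothetical (source, destination, destination port) tuples from a log export.
flows = [
    ("10.0.10.11", "10.0.20.12", 445),   # Management -> CSV over SMB: the odd case
    ("10.0.10.11", "10.0.10.12", 445),   # Management -> Management: expected
    ("10.0.10.13", "10.0.20.11", 3343),  # not SMB
]

stray = [flow for flow in flows if is_stray_smb(*flow)]
print(stray)
```

Grouping the matches by source host and timestamp could show whether the traffic is periodic or tied to specific cluster events.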

The thing is, this traffic should not be visible to the firewall at all, as the servers do not need to go through the default gateway to reach a network that is directly connected (on-link) to them. When I test communication manually with traceroute, everything is fine: packets do not flow through the gateway.
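The expectation described above can be written down as a simplified routing lookup: a destination that falls inside a directly connected subnet should leave through that interface with that interface's address as the source, and only everything else should go to the default gateway. The interface names and addresses below are hypothetical, and the sketch is a plain longest-prefix match, not a model of Windows' actual source address selection or of an application that binds a socket to a specific source address.

```python
import ipaddress

# Hypothetical interface table for one host: name -> (address/prefix, gateway or None).
INTERFACES = {
    "Management": ("10.0.10.11/24", "10.0.10.1"),
    "CSV": ("10.0.20.11/24", None),
    "LiveMigration": ("10.0.30.11/24", None),
}


def pick_route(dest, interfaces=INTERFACES):
    """Return (interface name, source IP, next hop) expected for dest."""
    dest_ip = ipaddress.ip_address(dest)

    # Prefer a directly connected network; the longest matching prefix wins.
    best = None
    for name, (cidr, _gateway) in interfaces.items():
        iface = ipaddress.ip_interface(cidr)
        if dest_ip in iface.network:
            if best is None or iface.network.prefixlen > best[1].network.prefixlen:
                best = (name, iface)
    if best is not None:
        name, iface = best
        return name, str(iface.ip), "on-link"

    # Otherwise fall back to the first interface that has a default gateway.
    for name, (cidr, gateway) in interfaces.items():
        if gateway is not None:
            return name, str(ipaddress.ip_interface(cidr).ip), gateway

    raise LookupError(f"no route to {dest}")


print(pick_route("10.0.20.12"))  # a peer on the CSV network
print(pick_route("192.0.2.50"))  # an address outside every local subnet
```

Under this model, a packet with a Management source address and a CSV destination address should never reach the gateway, which is why the traffic seen on the firewall is unexpected.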

It seems to me that Hyper-V, for whatever reason, chooses the wrong source interface and source IP for a given destination IP of a host in the CSV network, instead of using an interface from that same directly connected network as the source.

Since everything about Hyper-V cluster is working regularly, it is hard to diagnose this extra traffic we're seeing. Can anyone shed some light on this?

cn flag
Just a guess. Before a host sends a packet to an on-link destination, it resolves the address with ARP. Adding static ARP entries may help. I think the Windows ARP cache lifetime is short and randomized, and the cache *size* is also small. Maybe the host sometimes gets no ARP response and sends the packet elsewhere, which should return a proxy ARP redirect. A problem like this would be difficult to test for and detect. The Windows default neighbor cache size limit is 1,024 entries, and tools that "scan" can inadvertently flush the cache. https://learn.microsoft.com/en-us/troubleshoot/windows-server/networking/address-resolution-protocol-arp-caching-behavior