On my network I have two servers.
Server1 is running TrueNAS (BSD) with multiple applications running in iocage jails. It is connected to the network through a 3-NIC LAGG.
Server2 is an OpenMediaVault (Debian) installation with multiple applications running in Docker containers. It is connected to the network through a single physical NIC.
Both servers are connected to the same TP-Link managed switch.
I am trying to get a jail on Server1, which runs borgbackup, to make HTTP connections to the HealthChecks container on Server2. From the borgbackup jail's console I cannot ping Server2's IP address or make curl requests to the Docker container, even though the container is up and reachable from everywhere else on the subnet, including from Server1 outside of the jail. If I then go to the console of Server2 and ping the jail's IP, the first 8-10 pings time out, after which all communication works temporarily. This leads me to believe it is an ARP issue, but I'm unsure how to solve it.
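One way to test the ARP-expiry theory would be to log exactly when the jail gains and loses reachability, then compare the length of the working window with the host's ARP cache lifetime (on FreeBSD this is, as far as I know, the net.link.ether.inet.max_age sysctl). The following is only a rough sketch: the function names are made up for illustration, and it relies on nothing more than the system ping accepting -c 1.

```python
import subprocess
import time


def reachable(host: str) -> bool:
    """Send one ICMP echo with the system ping; True if a reply came back."""
    try:
        result = subprocess.run(
            ["ping", "-c", "1", host],  # -c 1: send a single packet
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=15,  # assumption: give up on a hung ping after 15 s
        )
    except subprocess.TimeoutExpired:
        return False
    return result.returncode == 0


def sample(host: str, count: int, interval: float = 5.0):
    """Probe the host `count` times and return [(unix_time, ok), ...]."""
    samples = []
    for _ in range(count):
        samples.append((time.time(), reachable(host)))
        time.sleep(interval)
    return samples


def transitions(samples):
    """Keep only the samples where reachability changed."""
    changes = []
    previous = None
    for stamp, ok in samples:
        if ok != previous:
            changes.append((stamp, ok))
            previous = ok
    return changes
```

Run from the jail right after Server2 has pinged it, something like transitions(sample("<Server2 IP>", 600)) would show how long the working window lasts. A window that closely matches the ARP cache lifetime would support the theory; a very different one would point elsewhere.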
Other possibly unrelated weirdness:
- The network is a /21, and the two machines sit in different /24 ranges within it (all nodes are configured with a /21 mask, so this shouldn't actually matter).
- Jails have never been able to get IP addresses from the DHCP server (I usually use DHCP reservations for all IP assignment on the network), so I've had to manually assign IPs to all of my jails on Server1.
- When the connection is not working, pings from Server2 to the jail IP time out. Pings from the jail to Server2 return ping: sendto: Host is down.
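Since the jail IPs are assigned by hand, it may be worth double-checking that each jail really carries the /21 mask. A jail configured as /24 would treat Server2 as off-link instead of sending an ARP request for it directly. Below is a minimal sketch of that check using Python's ipaddress module. The addresses are hypothetical placeholders (the real ones are not given here), and same_segment is just an illustrative helper name.

```python
import ipaddress

# Hypothetical addresses in two different /24 ranges of one /21.
server2 = ipaddress.ip_interface("10.0.1.20/21")
jail = ipaddress.ip_interface("10.0.3.15/21")


def same_segment(a, b):
    """True when each interface's address lies inside the other's network."""
    return a.ip in b.network and b.ip in a.network


# Both sides use /21, so each sees the other as on-link.
print(same_segment(server2, jail))  # True

# Same jail address, but with a /24 mask entered by mistake.
misconfigured = ipaddress.ip_interface("10.0.3.15/24")
print(same_segment(server2, misconfigured))  # False
```

If the real addresses and masks pass this check, the subnet layout can probably be ruled out as the cause.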
For the community bot: The main question is, why does this communication channel fail after what I assume is the ARP cache entry expiring, and how do I fix it?
EDIT: Disabling VNET on the jails seems to resolve the issue, but I was hoping to be able to keep VNET enabled.