Score:4

ESXi 8.0 host to NFS slow vs guest to NFS


I have been trying to track down why my backups with ghettoVCB from the ESXi host have been so slow.

I'm currently backing up my VMs with ghettoVCB, running on the host OS, to an NFS share on TrueNAS.

When I copy files to the NFS share from a guest (Ubuntu 20.04) on the ESXi machine I get about 257 MB/s, which is about right given the dedicated 2.5Gb link between the NAS and the ESXi host:

su@test:/mnt/guest$ time sh -c "dd if=/dev/zero of=test bs=1MB count=1024 && sync"
1024+0 records in
1024+0 records out
1024000000 bytes (1.0 GB, 977 MiB) copied, 3.98153 s, 257 MB/s

real    0m4.470s
user    0m0.002s
sys     0m0.619s
 
Guest NFS Mount Options:
rw, relatime, vers=4.2,
rsize=1048576, wsize=1048576,
namlen=255, hard, proto=tcp, timeo=600,
retrans=2, sec=sys, local_lock=none,

When I try to copy to the same NFS share from the ESXi host, the throughput is much lower, working out at about 45 MB/s:

/vmfs/volumes/9043e582-0376fe3e] time sh -c "dd if=/dev/zero of=./test bs=1MB count=1024 && sync"
1024+0 records in
1024+0 records out
real    0m 22.70s
user    0m 0.00s
sys     0m 0.00s

ESXi NFS Mount Options:
I can't seem to find a way to see which mount options ESXi uses.
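
For what it's worth, the NFS datastores can at least be listed from the ESXi shell, although the output doesn't appear to include anything like rsize/wsize:

# List NFS v3 datastores mounted on the host
esxcli storage nfs list
# List NFS 4.1 datastores, if the share was mounted as NFS 4.1
esxcli storage nfs41 list
# Older-style equivalent
esxcfg-nas -l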

One thing I did note: turning off sync on the ZFS dataset behind the share sped up ESXi writes to 146 MB/s, which is still a lot lower than from the guest OS.
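
For reference, the sync tweak on the TrueNAS side was along these lines (the pool/dataset name is just a placeholder):

zfs get sync tank/esxi-backups            # check the current setting (default is "standard")
zfs set sync=disabled tank/esxi-backups   # ignore sync requests: faster, but riskier on power loss
zfs set sync=standard tank/esxi-backups   # revert to honouring client sync requests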

My assumption is that ESXi is being extra cautious and ensuring every write is fully synced. Does anyone know if this is the case, and does anyone have any tips on improving the performance of the backup?

Peter Zhabin:
Testing ESXi performance with `dd` or file operations in the ESXi shell does not really measure anything, as that shell has resource restrictions. A better way would be to install a VM with its disks on the NFS datastore, give it a good deal of CPU/memory, and run your `dd` tests within it against its local disk. Your results may vary dramatically.
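Something like this inside a test VM whose virtual disk sits on the NFS datastore (the file name is just an example; oflag=direct keeps the guest page cache out of the measurement):
dd if=/dev/zero of=testfile bs=1M count=1024 oflag=direct      # direct I/O through the NFS-backed virtual disk
dd if=/dev/zero of=testfile bs=1M count=1024 conv=fdatasync    # alternative: cached write, timed through the final flush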
Rtype:
Yes, the guest OS in my test above is running in a VM on the same machine and writing to the NFS share, and it does perform well (250 MB/s). Due to how ghettoVCB works, I need the NFS speed to be decent at the host level, which is where the backups are taken and stored before being moved to the NFS store. As you say, it could be that the ESXi hypervisor OS is just not designed to get decent performance to NFS at the hypervisor level, but I would find that surprising, as ESXi would then have the same limitation when running virtual machines off the same NFS datastore.
Peter Zhabin:
You can try to mimic Veeam's behaviour by attaching the disks of the machine to be backed up to another VM via the CLI and then backing up from there. But the transfer speed limitation over the network from within the ESXi shell is an old and long-standing issue.
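Roughly like this from the ESXi shell (the vmid, path and SCSI slot below are placeholders, and you would want to attach a snapshot/quiesced copy rather than a live base disk):
vim-cmd vmsvc/getallvms                                                                    # find the vmid of the backup-proxy VM
vim-cmd vmsvc/device.diskaddexisting 42 /vmfs/volumes/datastore1/somevm/somevm.vmdk 0 1    # vmid, disk path, controller, unit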
Rtype:
Thanks. If you post that as an answer I will accept it.
Score:4

What you see is absolutely normal and not fixable as is. VMware ESXi by design has no disk cache, unlike the guest OS inside a VM, which really does. So when you copy your file from inside a VM (a crude test in itself; you should be using more sophisticated benchmarks), you saturate the network because the pipelined, cached sequential I/O is faster than the network itself. The ESXi host, on the other hand, has to read the data (slowly, with no read-ahead) into mmap()-ed shared storage/network memory buffers, issue a stateless NFS write, read the disk again, and so on in a loop. If you launch Wireshark you'll see the guest VM's Tx traffic is steady, while the host OS transmits in spikes.
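
You can confirm this by capturing on the host itself with the built-in tools and opening the result in Wireshark (interface names depend on which vmnic/vmk actually carries the NFS traffic):

pktcap-uw --uplink vmnic1 -o /tmp/nfs-uplink.pcap   # dump the uplink carrying NFS, then inspect the pcap in Wireshark
tcpdump-uw -i vmk1 -n port 2049                     # or watch the NFS VMkernel interface live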

As a workaround you might want to get a caching RAID controller with a beefy on-board memory, or throw in a second node, build a cluster, and configure vSAN (VMUG pricing is quite affordable for vSphere+vSAN). VMware vSAN will cache the local disks at its level, below VMFS, so you'll saturate your 2.5Gb link again.

Score:3

The vSAN suggestion above is a worthwhile option. Alternatively, look at StarWind VSAN, which works at the block level and should give you better performance. It also supports mdadm RAID, which might be worth trying.
