Score:2

Extremely slow MySQL writes in Ubuntu (ext4) guest on Proxmox 7.3-3 (ZFS) host

cn flag
  1. I'm trying out Proxmox 7.3-3 as a hypervisor and managed to install it using ZFS on my NVMe (Samsung_SSD_970_EVO_Plus_2TB_S6S2NS0T505403V)
  2. I also installed an Ubuntu 22.04 VM; its filesystem is ext4
  3. Next, I installed MySQL server (8.0.32)
  4. I normally don't tune MySQL; its performance is acceptable as-is
  5. However, it took 3 minutes to ingest a 20MB uncompressed SQL file using `mysql < ...`
  6. This is much slower than when the same Ubuntu guest ran under VirtualBox on Windows 10 (a 20MB ingest would usually take less than 30 seconds)

Any idea what I might be missing here?

UPDATE 1

`iostat` from the guest. I'm seeing 100% in the `%util` column; why would it be 100%?

Device            r/s     rkB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wkB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dkB/s   drqm/s  %drqm d_await dareq-sz     f/s f_await  aqu-sz  %util
dm-0             0.00      0.00     0.00   0.00    0.00     0.00 1448.00   6792.00     0.00   0.00    0.67     4.69    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.98 100.00
loop0            0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
loop1            0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
loop2            0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
sda              0.00      0.00     0.00   0.00    0.00     0.00 1075.00   6792.00   373.00  25.76    0.93     6.32    0.00      0.00     0.00   0.00    0.00     0.00  492.00    1.91    1.94 100.00
sdb              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
sdc              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
sr0              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00

`iostat` from the Proxmox host, also showing 100% `%util`:

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           1.44    0.00    0.75    5.77    0.00   92.04

Device            r/s     rkB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wkB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dkB/s   drqm/s  %drqm d_await dareq-sz     f/s f_await  aqu-sz  %util
nvme0n1          0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
nvme1n1        484.00      0.00     0.00   0.00    1.77     0.00  814.00  20328.00     0.00   0.00    0.02    24.97    0.00      0.00     0.00   0.00    0.00     0.00  484.00    1.77    1.74 100.00
sda              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
sdb              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
sdc              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
sdd              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
sde              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
sdf              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
sdg              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
zd0              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
zd16             0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
zd32             0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
zd48             0.00      0.00     0.00   0.00    0.00     0.00  737.00   4916.00     0.00   0.00    0.00     6.67    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00 100.00
zd64             0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
zd80             0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00

`zpool iostat -v -l 1` from the Proxmox host:

                                     capacity     operations     bandwidth    total_wait     disk_wait    syncq_wait    asyncq_wait  scrub   trim
pool                               alloc   free   read  write   read  write   read  write   read  write   read  write   read  write   wait   wait
---------------------------------  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----
rpool                               208G  1.61T     18    495  76.0K  8.00M   98us   25us   98us   25us  394ns  374ns      -      -      -      -
  nvme-eui.0025385521403c96-part3   208G  1.61T     18    495  76.0K  8.00M   98us   25us   98us   25us  394ns  374ns      -      -      -      -
---------------------------------  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----

UPDATE 2

root@pve:~# zfs get all | grep "sync\|logbias"
rpool                     logbias               latency                default
rpool                     sync                  standard               local
rpool/ROOT                logbias               latency                default
rpool/ROOT                sync                  standard               inherited from rpool
rpool/ROOT/pve-1          logbias               latency                default
rpool/ROOT/pve-1          sync                  standard               inherited from rpool
rpool/data                logbias               latency                default
rpool/data                sync                  standard               inherited from rpool
rpool/data/vm-100-disk-0  logbias               latency                default
rpool/data/vm-100-disk-0  sync                  standard               inherited from rpool
rpool/data/vm-100-disk-1  logbias               latency                default
rpool/data/vm-100-disk-1  sync                  standard               inherited from rpool
rpool/data/vm-101-disk-0  logbias               latency                default
rpool/data/vm-101-disk-0  sync                  standard               inherited from rpool
rpool/data/vm-101-disk-1  logbias               latency                default
rpool/data/vm-101-disk-1  sync                  standard               inherited from rpool
rpool/data/vm-102-disk-0  logbias               latency                default
rpool/data/vm-102-disk-0  sync                  standard               inherited from rpool
rpool/data/vm-102-disk-1  logbias               latency                default
rpool/data/vm-102-disk-1  sync                  standard               inherited from rpool
root@pve:~#
shodanshok avatar
ca flag
Please add the output of `iostat -x -k 1` from *both* the guest and the host systems when importing your SQL file. For the host system, please also add the output of `zpool iostat -v -l 1` during the same workload. Moreover, is your guest image installed onto a zvol or a normal zfs dataset?
bilogic avatar
cn flag
Pardon me, but I'm really new to this: how do I tell whether it is a zvol or a ZFS dataset? As for `iostat`, it produces ongoing output and I'm not sure how to present it properly. If your concern is that my system is busy, I doubt it, as I was monitoring it via the Proxmox dashboard and the I/O load was under 15%. Nonetheless, I'm going to try to capture these outputs.
bilogic avatar
cn flag
I do know that I'm using ZFS thin provisioning; could that be the issue here?
shodanshok avatar
ca flag
Proxmox uses zvols by default, so let's assume for the moment that your VM really is on a zvol.
bilogic avatar
cn flag
@shodanshok updated, let's hope we can get this mystery solved! thank you!
ua flag
Is that SQL all `INSERTs`? What engine (InnoDB)? What is the value of `innodb_buffer_pool_size` on each machine? Are `TEXT` or `BLOB` columns involved? Just looking at I/O may not give you the desired answer.
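(A quick way to check these from a shell, assuming the `mysql` client can log in locally; `your_db` is a placeholder database name:)
mysql -e "SELECT @@version, @@innodb_buffer_pool_size, @@innodb_flush_log_at_trx_commit;"
mysql -e "SHOW TABLE STATUS FROM your_db;"   # the Engine column shows InnoDB vs MyISAM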
shodanshok avatar
ca flag
Your disk is maxed out by sync/flush requests. What NVMe disk model do you have? Please also add the output of `zfs get all | grep "sync\|logbias"` taken from your host machine.
bilogic avatar
cn flag
@shodanshok added the new logs
bilogic avatar
cn flag
@shodanshok Not Gigabyte, I was confused. It is a `Samsung_SSD_970_EVO_Plus_2TB_S6S2NS0T505403V` 2TB drive.
ua flag
Are the `INSERTs` one row at a time? Or batched (lots of rows in each `INSERT`)?
bilogic avatar
cn flag
Rick, thanks for the help. I'm quite sure it's one row at a time, and that is something I can live with since the file was generated by a tool. But the answer lay in tweaking ZFS to match my SSD's characteristics.
Score:1
ca flag

Based on the output of `iostat`, we can see your disk struggling with sync/flush requests. This is due to the consumer-grade nature of your disk, which lacks a power-loss-protected writeback cache. As you can see, this means that even a disk rated for up to 560K random write IOPS really maxes out at ~500 fsync/s.
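
If you want to cross-check that number outside MySQL, a small fio job can measure the sustained fsync rate directly. This is just a suggested sketch (fio is not part of the thread); the file name, size and runtime are placeholders, and you should point it at the filesystem you actually care about:

# write 4K blocks and fsync after every write; the reported write IOPS approximates fsync/s
fio --name=fsync-rate --filename=/var/tmp/fsync-test.bin --size=256M --bs=4k \
    --rw=write --ioengine=sync --fsync=1 --runtime=30 --time_based
# delete /var/tmp/fsync-test.bin afterwards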

OK, how can you speed up your server? Some options, ordered by preference (example commands follow the list):

  • if losing up to 5s of data is not a concern for this particular Proxmox server, you can disable sync pass-through at the ZFS level by running the following command on your host: `zfs set sync=disabled rpool`. This will speed up all virtual machines running on this server but, again, remember that you can lose data (up to 5s by default). However, because ZFS is a CoW filesystem, no general filesystem corruption is possible even with `sync=disabled`

  • if this is not acceptable, you can tune MySQL to avoid flushing at each INSERT, instead issuing fsync() only about once per second. The option you need is `innodb_flush_log_at_trx_commit = 2`

  • you can configure Proxmox (or, better, QEMU itself) to ignore fsync() issued against a specific virtual disk by setting that disk image's cache option to `unsafe`
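
A minimal sketch of the three options. The storage name `local-zfs`, the VM ID 100, the disk slot `scsi0`, and the MySQL config path are assumptions; adapt them to your setup:

# Option 1: disable sync pass-through for the whole pool (accepts up to ~5s of data loss)
zfs set sync=disabled rpool
# revert later with:
zfs set sync=standard rpool

# Option 2: relax InnoDB durability (redo log flushed about once per second)
# add under [mysqld] in e.g. /etc/mysql/mysql.conf.d/mysqld.cnf, then restart MySQL:
#   innodb_flush_log_at_trx_commit = 2

# Option 3: ignore guest flushes for one virtual disk (also settable in the GUI
# under the VM's Hardware -> Hard Disk -> Cache):
qm set 100 --scsi0 local-zfs:vm-100-disk-0,cache=unsafe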

bilogic avatar
cn flag
Since I have a UPS, I decided to test with `zfs set sync=disabled rpool` first; it brought the whole operation down to 13 seconds. Once again, thank you!
bilogic avatar
cn flag
Could you fill me in on whether I'm using a zvol or a normal ZFS dataset? Thanks!
shodanshok avatar
ca flag
You should be using volumes (zvols) rather than datasets to store VM images. To be sure, you can run `zfs list -o name,type` on your host machine.
bilogic avatar
cn flag
thanks, it says `volume` for all the `rpool/data/vm-*`
bilogic avatar
cn flag
I'd like to ask whether my SSD is "suitable" for the current ZFS `sync=disabled` scenario. My concern is whether PVE + ZFS makes excessive writes that could wear out my SSD faster than when I used Windows 10 + NTFS to run basically the same services at the same load. Thank you.
shodanshok avatar
ca flag
If anything, `sync=disabled` *lowers* SSD wear by virtue of coalescing multiple small writes into bigger ones, limiting write amplification at the NAND level.
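If you want to keep an eye on wear yourself, the NVMe health page reports endurance counters; a hedged example (the device path is an assumption, adjust it to your drive):
# "Percentage Used" and "Data Units Written" are the wear indicators to watch
smartctl -a /dev/nvme1n1 | grep -i -E "percentage used|data units written|available spare"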