
Erratic (terrible) Disk I/O Performance (Debian/Proxmox)


Summary

I'm seeing dramatically fluctuating I/O performance on a ZFS SSD mirror in Proxmox VE 7 (Bullseye). I'm simply too much of a novice to be able to track it down on my own.

Details

The poor performance is very noticeable in real-world tasks, so it's not just artificial benchmarks. To help diagnose it, I'm running:

sysbench fileio --file-test-mode=rndrw run
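
(sysbench fileio needs its test files created with a prepare step first; the full sequence is roughly as follows, with the 8 GiB total size being just an example:)

    sysbench fileio --file-total-size=8G prepare                       # create the test files
    sysbench fileio --file-total-size=8G --file-test-mode=rndrw run    # random read/write test
    sysbench fileio --file-total-size=8G cleanup                       # remove the test files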

It's running "bare-metal" from the Proxmox terminal with no VMs active. The results vary wildly. Here are two examples:

File operations:
    reads/s:                      2316.07
    writes/s:                     1544.08
    fsyncs/s:                     4949.70

Throughput:
    read, MiB/s:                  36.19
    written, MiB/s:               24.13

General statistics:
    total time:                          10.0062s
    total number of events:              88040

Latency (ms):
         min:                                    0.00
         avg:                                    0.11
         max:                                   35.66
         95th percentile:                        0.65
         sum:                                 9947.54

Threads fairness:
    events (avg/stddev):           88040.0000/0.00
    execution time (avg/stddev):   9.9475/0.00

and

File operations:
    reads/s:                      22.60
    writes/s:                     15.07
    fsyncs/s:                     56.98

Throughput:
    read, MiB/s:                  0.35
    written, MiB/s:               0.24

General statistics:
    total time:                          10.6162s
    total number of events:              877

Latency (ms):
         min:                                    0.00
         avg:                                   11.43
         max:                                  340.62
         95th percentile:                       77.19
         sum:                                10020.19

Threads fairness:
    events (avg/stddev):           877.0000/0.00
    execution time (avg/stddev):   10.0202/0.00

As you see, there's roughly a 100-fold swing in the total number of events and a massive increase in latency. These swings are not one-offs; it's constantly fluctuating between these kinds of extremes.

I've done my best to rule out simple hardware issues. Both SSDs are brand new, with all 100s in smartctl. I've swapped out SATA cables. I've run with the mirror degraded to try to isolate a single-drive problem. I've moved the drives to a separate SATA controller. Nothing gives me a different result.

I've got a second server configured in a similar fashion, though with older (and unmatched) SSDs in the mirror, and I'm not seeing this issue there. The server hardware differs, though. The poor results are from the system described below; the "normal"-looking results are from an old converted PC with an E3-1275v2.

What I'm hoping for are tips to help diagnose this issue. It seems that the problem is with latency. What can cause this? What next steps should I take?

Thanks in advance!

System (if it helps)

  • MB: Supermicro X9DRi-F
  • CPU: Dual Xeon E5-2650 v2
  • RAM: 128 GB (8 x 16GB)
  • SATA Controllers: Onboard SATA 3 (separate SATA 2 also tested)
  • SSD: 2x 1TB TeamGroup SATA (yeah, cheap, but should be fine)
  • PCIe Cards:
    • Mellanox MCX312B
    • LSI SAS9207-8i (HBA connected to 8 unmounted disks...passed through to VM)
    • Nvidia GTX 750 (passed through to VM)

Andrew Henle:
*SSD: 2x 1TB TeamGroup SATA (yeah, cheap, but should be fine)* Are those the disks that are having the performance problems? If so, "should be fine" seems like wishful thinking...

@AndrewHenle Well, of course you could be right. Maybe the TLC is extra slow and the SLC cache is too small and the write-through is poorly designed and... But to my original question, how can I diagnose that?

I didn't know Team Group made pro SSDs; I thought they only made consumer ones?

Answer (score 0)

ZFS is a copy-on-write filesystem, and it's very hard on cheap SSDs. You can test one of these SSDs directly on a Windows machine by secure-erasing it and running a full-disk write test (I think HD Tune can do it); you'll see what performance these drives deliver once they run out of SLC cache / RAM. It will be very poor, something like 50-70 MB/s instead of 500+ for SATA. Also, some cheap SSDs use system RAM instead of their own DRAM or SLC cache, which is also bad for ZFS. I hope this helps. I had a similar problem that was solved by switching the SSDs to 980 Pros (smaller capacity because of the cost).
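
A rough Linux-side equivalent of that full-drive write test (destructive to the target disk, assuming fio is installed; /dev/sdX is a placeholder for the SSD under test) would be something like:

    # WARNING: this overwrites the whole device -- only point it at a disk you can wipe
    fio --name=fullwrite --filename=/dev/sdX --rw=write --bs=1M \
        --direct=1 --ioengine=libaio --iodepth=4 \
        --write_bw_log=fullwrite --log_avg_msec=1000   # bandwidth log approximates HD Tune's graph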

An answer saying to use a Windows utility for an explicitly Linux issue is out of bounds. There is no indication that the system is dual-boot, so you're asking the OP to install a new OS that he might not be licensed for just to diagnose the problem.

DSighT:
You don't need a license to install Windows. You can use the HD Tune free trial, or you can suggest a similar tool yourself... I have no idea what tool in Linux can fill a whole disk while recording a graph history like HD Tune does: https://www.thessdreview.com/wp-content/uploads/2016/10/WD-Blue-SSD-1TB-HDTUNE.png

Yes, you can install Windows without a license. It will work for [sixty days](https://answers.microsoft.com/en-us/windows/forum/all/windows-10-legal-without-activation/d7491774-4dd7-490a-862f-2c0d6febd0bd) but that doesn't mean that the installation is completely legitimate.

Answer (score 0)

Something similar happened to me yesterday with my new setup: Proxmox VE 7 on an Intel 11500 with 32 GB of RAM and 2x Crucial BX500 SSDs (these are consumer grade) configured as a ZFS mirror.

I did a benchmark using dd to write 1 GB of zeros to the SSD, and it was running at 5 MB/s (I know dd isn't good for benchmarking, but still...). During the benchmark, iostat showed 100% utilization. In my case I solved the performance problem by trimming the SSDs. You can force a trim with zpool trim rpool, and you can enable automatic trimming with zpool set autotrim=on rpool. After trimming the disks, I ran the benchmark again and it completed in 7 s (153 MB/s).
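
Concretely, that amounts to something like the following (rpool is the default Proxmox pool name; adjust if yours differs):

    zpool trim rpool              # kick off a one-time manual TRIM of the whole pool
    zpool status -t rpool         # -t shows per-device trim progress
    zpool set autotrim=on rpool   # have ZFS issue TRIMs automatically from now on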

While looking for ways to improve the performance, I also set xattr to sa, as recommended in the Proxmox wiki, and tuned zfs_arc_min/zfs_arc_max and other kernel module parameters.
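
As a sketch of those tweaks (the ARC sizes here are purely illustrative placeholders, given in bytes, and need to fit your RAM):

    # store extended attributes as system attributes instead of hidden files
    zfs set xattr=sa rpool

    # /etc/modprobe.d/zfs.conf -- example ARC limits of 4 GiB / 8 GiB
    options zfs zfs_arc_min=4294967296 zfs_arc_max=8589934592

    # rebuild the initramfs so the module options apply at boot
    update-initramfs -u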

I hope this works for you as well.

Thanks!!! I'll give this a shot and let you know.