For some reason, QD2 seems to be the best in terms of raw IOPS...
After seeing an interesting article comparing a couple of different drives for SLOG usage, I got curious about the performance of my own system at different queue depths.
The system comprises an i9-13900K, 128GB of DDR5-4800 system memory, and two Samsung 980 PROs as a single mirrored vdev. The test runs fio with the following config in a container on Proxmox 7:
fio --name=random-write --ioengine=posixaio --rw=randwrite --bs=<4k/8k> --numjobs=1 --size=8g --iodepth=<1/2/4/8> --runtime=30 --time_based
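For anyone wanting to reproduce the sweep, something along these lines covers the fio-side combinations (just a sketch, not my actual harness; the JSON output flags are added here so results can be aggregated later, and the ZFS-side combinations would be changed on the dataset between runs with zfs set):

# Sweep the fio-side parameters (block size and queue depth).
# ZFS-side combinations (compression, record size) would be changed on the
# test dataset between runs, e.g. zfs set recordsize=16k <dataset>.
for bs in 4k 8k; do
  for qd in 1 2 4 8; do
    fio --name=random-write --ioengine=posixaio --rw=randwrite \
        --bs=$bs --numjobs=1 --size=8g --iodepth=$qd \
        --runtime=30 --time_based \
        --output-format=json --output=result_bs${bs}_qd${qd}.json
  done
done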
Something odd I noticed was that during async writes, the best performance would consistently occur at a queue depth of 2. I tried several different compression, block size and record size combinations, but QD2 consistently had the best IOPS scores... I would have expected performance to keep improving at higher queue depths, or am I missing something? (I wonder if it's related to each drive having only two NAND chips?)
No specific ZFS tuning has been performed other than setting ashift=12 and what's listed above.
Does anyone know why ZFS async writes seem to work best at QD2 compared to QD1, QD4 or QD8?
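For anyone who wants to poke at the same question: the OpenZFS vdev queue keeps separate active-I/O limits per I/O class, and on Linux the async-write limits can be read from the module parameters, while per-vdev queue occupancy can be watched live with zpool iostat. A quick sketch (inspection only, I haven't tuned any of these):

# Read the OpenZFS vdev queue limits relevant to async writes (Linux).
grep . /sys/module/zfs/parameters/zfs_vdev_async_write_min_active \
       /sys/module/zfs/parameters/zfs_vdev_async_write_max_active \
       /sys/module/zfs/parameters/zfs_vdev_max_active

# Watch per-vdev queue occupancy while fio is running (1-second refresh).
zpool iostat -q 1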
And now... Graphs! These are the averaged results of the 216 different combinations I tried.
(why a record size of 16k was the worst is beyond me)
(A block size of 8k still results in more overall bandwidth)
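If anyone wants to redo the aggregation: with the JSON output from the sweep sketch above, the average write IOPS of each run can be pulled out with jq, e.g.:

# Print average write IOPS per result file (requires jq).
for f in result_bs*_qd*.json; do
  echo "$f $(jq '.jobs[0].write.iops' "$f")"
done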

Bit surprised to see Zstd beating LZ4. Perhaps I'm reaching the I/O limits of these poor SSDs? The actual SSD activity was usually hitting 400-700 MB/s during testing.
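It's also worth checking what compression actually achieved on the test dataset; a one-liner sketch, with a placeholder dataset name:

# Show the configured compression and the achieved ratio (placeholder dataset name).
zfs get compression,compressratio tank/fiotest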
N.B. During all tests the CPU governor was set to "performance"; when set to "powersave", the IOPS figures were 30-50% worse!
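For reference, the governor can be checked and switched via the standard Linux cpufreq sysfs interface:

# Show the current governor on every core.
cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor | sort | uniq -c

# Switch every core to the performance governor (needs root).
echo performance | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor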