I have a server running Debian on top of a ZFS 3-way mirror of Exos X18 18TB drives (ST18000NM001J).
I'm benchmarking it and finding some surprising read rates under certain conditions.
But first, for the benchmarking I created a dataset (rpool/benchmarking) with primarycache and secondarycache set to none, to avoid benchmarking the cache when reading, and compression set to off, to avoid inflated rates when writing arrays of zeros. Then I created three child datasets, named "8k", "128k" and "1M", each with its corresponding recordsize.
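For reference, the datasets were created more or less like this (reconstructed from the properties shown further down, not necessarily the exact commands I typed):
zfs create -o mountpoint=/benchmarking -o primarycache=none -o secondarycache=none -o compression=off rpool/benchmarking
zfs create -o recordsize=8k rpool/benchmarking/8k
zfs create -o recordsize=128k rpool/benchmarking/128k
zfs create -o recordsize=1M rpool/benchmarking/1M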
Then with the following dd script:
echo -e "bs=4M recordsize=1M\n"
dd if=/dev/zero of=/benchmarking/1M/ddfile bs=4M count=2000 conv=fdatasync
dd if=/benchmarking/1M/ddfile of=/dev/null bs=4M count=2000 #conv=fdatasync
rm /benchmarking/1M/ddfile
echo -e "------------------\n\n"
echo -e "bs=4k recordsize=1M\n"
dd if=/dev/zero of=/benchmarking/1M/ddfile bs=4k count=2000000 conv=fdatasync
dd if=/benchmarking/1M/ddfile of=/dev/null bs=4k count=2000000 #conv=fdatasync
rm /benchmarking/1M/ddfile
echo -e "------------------\n\n"
echo -e "bs=4M recordsize=128k\n"
dd if=/dev/zero of=/benchmarking/128k/ddfile bs=4M count=2000 conv=fdatasync
dd if=/benchmarking/128k/ddfile of=/dev/null bs=4M count=2000 #conv=fdatasync
rm /benchmarking/128k/ddfile
echo -e "------------------\n\n"
echo -e "bs=4k recordsize=128k\n"
dd if=/dev/zero of=/benchmarking/128k/ddfile bs=4k count=2000000 conv=fdatasync
dd if=/benchmarking/128k/ddfile of=/dev/null bs=4k count=2000000 #conv=fdatasync
rm /benchmarking/128k/ddfile
echo -e "------------------\n\n"
echo -e "bs=4M recordsize=8k\n"
dd if=/dev/zero of=/benchmarking/8k/ddfile bs=4M count=2000 conv=fdatasync
dd if=/benchmarking/8k/ddfile of=/dev/null bs=4M count=2000 #conv=fdatasync
rm /benchmarking/8k/ddfile
echo -e "------------------\n\n"
echo -e "bs=4k recordsize=8k\n"
dd if=/dev/zero of=/benchmarking/8k/ddfile bs=4k count=2000000 conv=fdatasync
dd if=/benchmarking/8k/ddfile of=/dev/null bs=4k count=2000000 #conv=fdatasync
rm /benchmarking/8k/ddfile
echo -e "------------------\n\n"
I got the following:
root@pbs:/benchmarking# ./dd_bench.sh
bs=4M recordsize=1M
2000+0 records in
2000+0 records out
8388608000 bytes (8.4 GB, 7.8 GiB) copied, 43.3219 s, 194 MB/s
2000+0 records in
2000+0 records out
8388608000 bytes (8.4 GB, 7.8 GiB) copied, 43.7647 s, 192 MB/s
------------------
bs=4k recordsize=1M
2000000+0 records in
2000000+0 records out
8192000000 bytes (8.2 GB, 7.6 GiB) copied, 38.7432 s, 211 MB/s
2000000+0 records in
2000000+0 records out
8192000000 bytes (8.2 GB, 7.6 GiB) copied, 5100.27 s, 1.6 MB/s
------------------
bs=4M recordsize=128k
2000+0 records in
2000+0 records out
8388608000 bytes (8.4 GB, 7.8 GiB) copied, 60.1265 s, 140 MB/s
2000+0 records in
2000+0 records out
8388608000 bytes (8.4 GB, 7.8 GiB) copied, 56.4249 s, 149 MB/s
------------------
bs=4k recordsize=128k
2000000+0 records in
2000000+0 records out
8192000000 bytes (8.2 GB, 7.6 GiB) copied, 52.044 s, 157 MB/s
2000000+0 records in
2000000+0 records out
8192000000 bytes (8.2 GB, 7.6 GiB) copied, 1242.29 s, 6.6 MB/s
------------------
bs=4M recordsize=8k
2000+0 records in
2000+0 records out
8388608000 bytes (8.4 GB, 7.8 GiB) copied, 111.594 s, 75.2 MB/s
2000+0 records in
2000+0 records out
8388608000 bytes (8.4 GB, 7.8 GiB) copied, 60.547 s, 139 MB/s
------------------
bs=4k recordsize=8k
2000000+0 records in
2000000+0 records out
8192000000 bytes (8.2 GB, 7.6 GiB) copied, 96.3637 s, 85.0 MB/s
2000000+0 records in
2000000+0 records out
8192000000 bytes (8.2 GB, 7.6 GiB) copied, 771.967 s, 10.6 MB/s
When the block size is small (4k), the read speed is very limited (roughly 1.6 to 10.6 MB/s). The same does not happen for the write speed.
Then I ran bonnie++ on all three datasets:
root@pbs:~# bonnie++ -d /benchmarking/1M/ -u root -n 160
Using uid:0, gid:0.
Writing a byte at a time...done
Writing intelligently...done
Rewriting...done
Reading a byte at a time...done
Reading intelligently...done
start 'em...done...done...done...done...done...
Create files in sequential order...done.
Stat files in sequential order...done.
Delete files in sequential order...done.
Create files in random order...done.
Stat files in random order...done.
Delete files in random order...done.
Version 2.00 ------Sequential Output------ --Sequential Input- --Random-
-Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Name:Size etc /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP
pbs 63624M 527k 93 136m 6 56.1m 4 0k 3 2902k 3 168.4 21
Latency 12952us 27977us 3500ms 21656ms 599ms 990ms
Version 2.00 ------Sequential Create------ --------Random Create--------
pbs -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
files /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP
160 163840 13 +++++ +++ 163840 6 163840 15 +++++ +++ 163840 6
Latency 277ms 2447us 353ms 287ms 27us 377ms
1.98,2.00,pbs,1,1665552058,63624M,,8192,5,527,93,139633,6,57398,4,0,3,2902,3,168.4,21,160,,,,,9606,13,+++++,+++,1264,6,9808,15,+++++,+++,1147,6,12952us,27977us,3500ms,21656ms,599ms,990ms,277ms,2447us,353ms,287ms,27us,377ms
root@pbs:~# bonnie++ -d /benchmarking/128k/ -u root -n 160
Using uid:0, gid:0.
Writing a byte at a time...done
Writing intelligently...done
Rewriting...done
Reading a byte at a time...done
Reading intelligently...done
start 'em...done...done...done...done...done...
Create files in sequential order...done.
Stat files in sequential order...done.
Delete files in sequential order...done.
Create files in random order...done.
Stat files in random order...done.
Delete files in random order...done.
Version 2.00 ------Sequential Output------ --Sequential Input- --Random-
-Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Name:Size etc /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP
pbs 63624M 525k 93 126m 6 44.1m 6 1k 7 10.3m 7 311.3 41
Latency 13067us 17678us 2688ms 6693ms 206ms 390ms
Version 2.00 ------Sequential Create------ --------Random Create--------
pbs -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
files /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP
160 163840 13 +++++ +++ 163840 6 163840 14 +++++ +++ 163840 6
Latency 284ms 2643us 328ms 266ms 21us 356ms
1.98,2.00,pbs,1,1665335428,63624M,,8192,5,525,93,128601,6,45110,6,1,7,10548,7,311.3,41,160,,,,,8118,13,+++++,+++,1248,6,9634,14,+++++,+++,1173,6,13067us,17678us,2688ms,6693ms,206ms,390ms,284ms,2643us,328ms,266ms,21us,356ms
root@pbs:~# bonnie++ -d /benchmarking/8k/ -u root -n 160
Using uid:0, gid:0.
Writing a byte at a time...done
Writing intelligently...done
Rewriting...done
Reading a byte at a time...done
Reading intelligently...done
start 'em...done...done...done...done...done...
Create files in sequential order...done.
Stat files in sequential order...done.
Delete files in sequential order...done.
Create files in random order...done.
Stat files in random order...done.
Delete files in random order...done.
Version 2.00 ------Sequential Output------ --Sequential Input- --Random-
-Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Name:Size etc /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP
pbs 63624M 528k 97 80.2m 6 54.4m 8 1k 4 15.1m 5 264.7 37
Latency 14231us 982us 1535ms 5087ms 342ms 284ms
Version 2.00 ------Sequential Create------ --------Random Create--------
pbs -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
files /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP
160 163840 13 +++++ +++ 163840 6 163840 14 +++++ +++ 163840 6
Latency 334ms 100us 325ms 311ms 27us 353ms
1.98,2.00,pbs,1,1668749456,63624M,,8192,5,528,97,82088,6,55756,8,1,4,15510,5,264.7,37,160,,,,,9254,13,+++++,+++,1276,6,9582,14,+++++,+++,1066,6,14231us,982us,1535ms,5087ms,342ms,284ms,334ms,100us,325ms,311ms,27us,353ms
And it also returns very low read rates, just like dd (roughly 3, 10 and 15 MB/s for sequential block input).
As a final step I ran another dd benchmark, this time aligning the dd bs with the ZFS recordsize (a sketch of the script is below, followed by its output):
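dd_bench_2.sh follows the same pattern as the first script, roughly like this (reconstructed; only the aligned bs/recordsize combinations, with counts matching the byte totals in the output):
echo -e "bs=1M recordsize=1M\n"
dd if=/dev/zero of=/benchmarking/1M/ddfile bs=1M count=8000 conv=fdatasync
dd if=/benchmarking/1M/ddfile of=/dev/null bs=1M count=8000
rm /benchmarking/1M/ddfile
echo -e "------------------\n\n"
echo -e "bs=128k recordsize=128k\n"
dd if=/dev/zero of=/benchmarking/128k/ddfile bs=128k count=64000 conv=fdatasync
dd if=/benchmarking/128k/ddfile of=/dev/null bs=128k count=64000
rm /benchmarking/128k/ddfile
echo -e "------------------\n\n"
echo -e "bs=8k recordsize=8k\n"
dd if=/dev/zero of=/benchmarking/8k/ddfile bs=8k count=1000000 conv=fdatasync
dd if=/benchmarking/8k/ddfile of=/dev/null bs=8k count=1000000
rm /benchmarking/8k/ddfile
echo -e "------------------\n\n"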
root@pbs:~# /benchmarking/dd_bench_2.sh
bs=1M recordsize=1M
8000+0 records in
8000+0 records out
8388608000 bytes (8.4 GB, 7.8 GiB) copied, 62.6119 s, 134 MB/s
8000+0 records in
8000+0 records out
8388608000 bytes (8.4 GB, 7.8 GiB) copied, 65.2772 s, 129 MB/s
------------------
bs=128k recordsize=128k
64000+0 records in
64000+0 records out
8388608000 bytes (8.4 GB, 7.8 GiB) copied, 64.6437 s, 130 MB/s
64000+0 records in
64000+0 records out
8388608000 bytes (8.4 GB, 7.8 GiB) copied, 49.128 s, 171 MB/s
------------------
bs=8k recordsize=8k
1000000+0 records in
1000000+0 records out
8192000000 bytes (8.2 GB, 7.6 GiB) copied, 108.331 s, 75.6 MB/s
1000000+0 records in
1000000+0 records out
8192000000 bytes (8.2 GB, 7.6 GiB) copied, 344.981 s, 23.7 MB/s
------------------
Now there is a significant improvement, but I still expected a higher read speed for 8k.
Then I set atime to off and repeated this last test, but nothing changed much (the 1M dataset already had atime=off the whole time, sorry about that).
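The atime change was simply something along the lines of:
zfs set atime=off rpool/benchmarking/128k
zfs set atime=off rpool/benchmarking/8k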
root@pbs:~# /benchmarking/dd_bench_2.sh
bs=1M recordsize=1M
8000+0 records in
8000+0 records out
8388608000 bytes (8.4 GB, 7.8 GiB) copied, 44.505 s, 188 MB/s
8000+0 records in
8000+0 records out
8388608000 bytes (8.4 GB, 7.8 GiB) copied, 40.3689 s, 208 MB/s
------------------
bs=128k recordsize=128k
64000+0 records in
64000+0 records out
8388608000 bytes (8.4 GB, 7.8 GiB) copied, 67.7169 s, 124 MB/s
64000+0 records in
64000+0 records out
8388608000 bytes (8.4 GB, 7.8 GiB) copied, 56.0657 s, 150 MB/s
------------------
bs=8k recordsize=8k
1000000+0 records in
1000000+0 records out
8192000000 bytes (8.2 GB, 7.6 GiB) copied, 103.724 s, 79.0 MB/s
1000000+0 records in
1000000+0 records out
8192000000 bytes (8.2 GB, 7.6 GiB) copied, 343.753 s, 23.8 MB/s
So, trying to summarize:
- Why do I get such slow read rates with bonnie++ and with small-bs dd?
- Read speed is almost always equal to or lower than write speed. How can that be in a 3-way mirror, where the system can read from three devices at once but has to write 3x the data?
As extra info: the server runs enterprise-grade disks but a consumer-grade (not low end, but consumer-grade) motherboard, and the disks are connected to the motherboard's SATA controller. I know those are low-end SATA controllers, but it still seems strange to sometimes get low read rates while write speeds are always fine.
Also, I have checked that the drives are not SMR, and I have repeated these tests on a similar server with similar hardware/setup, obtaining similar results.
Finally, I attach the output of zfs get all for one of the benchmarking datasets:
root@pbs:~# zfs get all rpool/benchmarking/128k
NAME PROPERTY VALUE SOURCE
rpool/benchmarking/128k type filesystem -
rpool/benchmarking/128k creation Wed Oct 26 8:45 2022 -
rpool/benchmarking/128k used 96K -
rpool/benchmarking/128k available 12.5T -
rpool/benchmarking/128k referenced 96K -
rpool/benchmarking/128k compressratio 1.00x -
rpool/benchmarking/128k mounted yes -
rpool/benchmarking/128k quota none default
rpool/benchmarking/128k reservation none default
rpool/benchmarking/128k recordsize 128K default
rpool/benchmarking/128k mountpoint /benchmarking/128k inherited from rpool/benchmarking
rpool/benchmarking/128k sharenfs off default
rpool/benchmarking/128k checksum on default
rpool/benchmarking/128k compression off inherited from rpool/benchmarking
rpool/benchmarking/128k atime off local
rpool/benchmarking/128k devices on default
rpool/benchmarking/128k exec on default
rpool/benchmarking/128k setuid on default
rpool/benchmarking/128k readonly off default
rpool/benchmarking/128k zoned off default
rpool/benchmarking/128k snapdir hidden default
rpool/benchmarking/128k aclmode discard default
rpool/benchmarking/128k aclinherit restricted default
rpool/benchmarking/128k createtxg 255400 -
rpool/benchmarking/128k canmount on default
rpool/benchmarking/128k xattr on default
rpool/benchmarking/128k copies 1 default
rpool/benchmarking/128k version 5 -
rpool/benchmarking/128k utf8only off -
rpool/benchmarking/128k normalization none -
rpool/benchmarking/128k casesensitivity sensitive -
rpool/benchmarking/128k vscan off default
rpool/benchmarking/128k nbmand off default
rpool/benchmarking/128k sharesmb off default
rpool/benchmarking/128k refquota none default
rpool/benchmarking/128k refreservation none default
rpool/benchmarking/128k guid 13557460337392366562 -
rpool/benchmarking/128k primarycache none inherited from rpool/benchmarking
rpool/benchmarking/128k secondarycache none inherited from rpool/benchmarking
rpool/benchmarking/128k usedbysnapshots 0B -
rpool/benchmarking/128k usedbydataset 96K -
rpool/benchmarking/128k usedbychildren 0B -
rpool/benchmarking/128k usedbyrefreservation 0B -
rpool/benchmarking/128k logbias latency default
rpool/benchmarking/128k objsetid 60174 -
rpool/benchmarking/128k dedup off default
rpool/benchmarking/128k mlslabel none default
rpool/benchmarking/128k sync standard inherited from rpool
rpool/benchmarking/128k dnodesize legacy default
rpool/benchmarking/128k refcompressratio 1.00x -
rpool/benchmarking/128k written 96K -
rpool/benchmarking/128k logicalused 42K -
rpool/benchmarking/128k logicalreferenced 42K -
rpool/benchmarking/128k volmode default default
rpool/benchmarking/128k filesystem_limit none default
rpool/benchmarking/128k snapshot_limit none default
rpool/benchmarking/128k filesystem_count none default
rpool/benchmarking/128k snapshot_count none default
rpool/benchmarking/128k snapdev hidden default
rpool/benchmarking/128k acltype off default
rpool/benchmarking/128k context none default
rpool/benchmarking/128k fscontext none default
rpool/benchmarking/128k defcontext none default
rpool/benchmarking/128k rootcontext none default
rpool/benchmarking/128k relatime on inherited from rpool
rpool/benchmarking/128k redundant_metadata all default
rpool/benchmarking/128k overlay on default
rpool/benchmarking/128k encryption off default
rpool/benchmarking/128k keylocation none default
rpool/benchmarking/128k keyformat none default
rpool/benchmarking/128k pbkdf2iters 0 default
rpool/benchmarking/128k special_small_blocks 0 default
Thanks for your time!
EDIT: ashift is properly set to 12, dedup is off, and fragmentation is 0%.
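In case it helps, those were checked roughly like this (from memory; ashift via zdb since it is a per-vdev value):
zdb -C rpool | grep ashift
zpool list -o name,fragmentation,dedupratio rpool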