I have a server running Debian on a ZFS 3-way mirror of Exos X18 18TB drives (ST18000NM001J).
I'm benchmarking it, and the read rates are surprisingly low under certain conditions.
But first, the setup: I created a benchmarking dataset (rpool/benchmarking) with primarycache and secondarycache set to none, to avoid benchmarking the cache when reading, and with compression set to off, to avoid inflated rates when writing arrays of zeros. Inside it I created three child datasets named "8k", "128k" and "1M", each with the corresponding recordsize.
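For reference, a layout like the one described above could be created roughly as follows. This is a sketch reconstructed from the description and from the zfs get output further down, not the exact commands that were run. The run helper and the DRY_RUN variable are hypothetical: by default the commands are only printed.

```shell
# Sketch: recreate the benchmarking datasets described above.
# Prints the commands by default; set DRY_RUN=0 to actually run them.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

make_bench_datasets() {
    # Parent dataset: no ARC/L2ARC caching and no compression.
    run zfs create -o primarycache=none -o secondarycache=none \
        -o compression=off -o mountpoint=/benchmarking rpool/benchmarking
    # One child per recordsize; the other properties are inherited.
    # (128k is the default recordsize, so setting it is optional.)
    for rs in 8k 128k 1M; do
        run zfs create -o recordsize="$rs" "rpool/benchmarking/$rs"
    done
}

make_bench_datasets
```

With DRY_RUN left at 1 this prints four zfs create commands, which can be reviewed before running them for real.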
Then I ran the following dd script:
echo -e "bs=4M recordsize=1M\n"
dd if=/dev/zero of=/benchmarking/1M/ddfile bs=4M count=2000 conv=fdatasync
dd if=/benchmarking/1M/ddfile of=/dev/null bs=4M count=2000 #conv=fdatasync
rm /benchmarking/1M/ddfile
echo -e "------------------\n\n"
echo -e "bs=4k recordsize=1M\n"
dd if=/dev/zero of=/benchmarking/1M/ddfile bs=4k count=2000000 conv=fdatasync
dd if=/benchmarking/1M/ddfile of=/dev/null bs=4k count=2000000 #conv=fdatasync
rm /benchmarking/1M/ddfile
echo -e "------------------\n\n"
echo -e "bs=4M recordsize=128k\n"
dd if=/dev/zero of=/benchmarking/128k/ddfile bs=4M count=2000 conv=fdatasync
dd if=/benchmarking/128k/ddfile of=/dev/null bs=4M count=2000 #conv=fdatasync
rm /benchmarking/128k/ddfile
echo -e "------------------\n\n"
echo -e "bs=4k recordsize=128k\n"
dd if=/dev/zero of=/benchmarking/128k/ddfile bs=4k count=2000000 conv=fdatasync
dd if=/benchmarking/128k/ddfile of=/dev/null bs=4k count=2000000 #conv=fdatasync
rm /benchmarking/128k/ddfile
echo -e "------------------\n\n"
echo -e "bs=4M recordsize=8k\n"
dd if=/dev/zero of=/benchmarking/8k/ddfile bs=4M count=2000 conv=fdatasync
dd if=/benchmarking/8k/ddfile of=/dev/null bs=4M count=2000 #conv=fdatasync
rm /benchmarking/8k/ddfile
echo -e "------------------\n\n"
echo -e "bs=4k recordsize=8k\n"
dd if=/dev/zero of=/benchmarking/8k/ddfile bs=4k count=2000000 conv=fdatasync
dd if=/benchmarking/8k/ddfile of=/dev/null bs=4k count=2000000 #conv=fdatasync
rm /benchmarking/8k/ddfile
echo -e "------------------\n\n"
I got the following:
root@pbs:/benchmarking# ./dd_bench.sh
bs=4M recordsize=1M
2000+0 records in
2000+0 records out
8388608000 bytes (8.4 GB, 7.8 GiB) copied, 43.3219 s, 194 MB/s
2000+0 records in
2000+0 records out
8388608000 bytes (8.4 GB, 7.8 GiB) copied, 43.7647 s, 192 MB/s
------------------
bs=4k recordsize=1M
2000000+0 records in
2000000+0 records out
8192000000 bytes (8.2 GB, 7.6 GiB) copied, 38.7432 s, 211 MB/s
2000000+0 records in
2000000+0 records out
8192000000 bytes (8.2 GB, 7.6 GiB) copied, 5100.27 s, 1.6 MB/s
------------------
bs=4M recordsize=128k
2000+0 records in
2000+0 records out
8388608000 bytes (8.4 GB, 7.8 GiB) copied, 60.1265 s, 140 MB/s
2000+0 records in
2000+0 records out
8388608000 bytes (8.4 GB, 7.8 GiB) copied, 56.4249 s, 149 MB/s
------------------
bs=4k recordsize=128k
2000000+0 records in
2000000+0 records out
8192000000 bytes (8.2 GB, 7.6 GiB) copied, 52.044 s, 157 MB/s
2000000+0 records in
2000000+0 records out
8192000000 bytes (8.2 GB, 7.6 GiB) copied, 1242.29 s, 6.6 MB/s
------------------
bs=4M recordsize=8k
2000+0 records in
2000+0 records out
8388608000 bytes (8.4 GB, 7.8 GiB) copied, 111.594 s, 75.2 MB/s
2000+0 records in
2000+0 records out
8388608000 bytes (8.4 GB, 7.8 GiB) copied, 60.547 s, 139 MB/s
------------------
bs=4k recordsize=8k
2000000+0 records in
2000000+0 records out
8192000000 bytes (8.2 GB, 7.6 GiB) copied, 96.3637 s, 85.0 MB/s
2000000+0 records in
2000000+0 records out
8192000000 bytes (8.2 GB, 7.6 GiB) copied, 771.967 s, 10.6 MB/s
When the dd block size is small (4k), the read speed is very low (between 1.6 and 10.6 MB/s). The write speed is not affected in the same way.
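One way to look at these numbers, under an assumption that is not verified here: if, with primarycache=none, every 4k read has to fetch the whole record from disk again, then the disks move recordsize/bs times more data than dd reports. The sketch below only does that arithmetic on the 4k read rates above; the function name is made up for illustration.

```shell
# amplified_rate RECORDSIZE_KIB BS_KIB DD_RATE_MBS
# Prints the amplification factor and the implied on-disk read rate,
# ASSUMING each read smaller than a record re-fetches one full record.
amplified_rate() {
    awk -v rs="$1" -v bs="$2" -v rate="$3" 'BEGIN {
        factor = (rs + 0 > bs + 0) ? rs / bs : 1
        printf "%dx %.1f MB/s\n", factor, rate * factor
    }'
}

amplified_rate 1024 4 1.6    # recordsize=1M,   bs=4k
amplified_rate 128  4 6.6    # recordsize=128k, bs=4k
amplified_rate 8    4 10.6   # recordsize=8k,   bs=4k
```

Under that assumption the 1M and 128k cases would correspond to a few hundred MB/s at the disk level, while the 8k case (a factor of only 2) would not be explained by this effect alone.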
Then I have run bonnie++ for all three datasets:
root@pbs:~# bonnie++ -d /benchmarking/1M/ -u root -n 160
Using uid:0, gid:0.
Writing a byte at a time...done
Writing intelligently...done
Rewriting...done
Reading a byte at a time...done
Reading intelligently...done
start 'em...done...done...done...done...done...
Create files in sequential order...done.
Stat files in sequential order...done.
Delete files in sequential order...done.
Create files in random order...done.
Stat files in random order...done.
Delete files in random order...done.
Version  2.00       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Name:Size etc        /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
pbs          63624M  527k  93  136m   6 56.1m   4    0k   3 2902k   3 168.4  21
Latency             12952us   27977us    3500ms   21656ms     599ms     990ms
Version  2.00       ------Sequential Create------ --------Random Create--------
pbs                 -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                160 163840  13 +++++ +++ 163840   6 163840  15 +++++ +++ 163840   6
Latency               277ms    2447us     353ms     287ms      27us     377ms
1.98,2.00,pbs,1,1665552058,63624M,,8192,5,527,93,139633,6,57398,4,0,3,2902,3,168.4,21,160,,,,,9606,13,+++++,+++,1264,6,9808,15,+++++,+++,1147,6,12952us,27977us,3500ms,21656ms,599ms,990ms,277ms,2447us,353ms,287ms,27us,377ms
root@pbs:~# bonnie++ -d /benchmarking/128k/ -u root -n 160
Using uid:0, gid:0.
Writing a byte at a time...done
Writing intelligently...done
Rewriting...done
Reading a byte at a time...done
Reading intelligently...done
start 'em...done...done...done...done...done...
Create files in sequential order...done.
Stat files in sequential order...done.
Delete files in sequential order...done.
Create files in random order...done.
Stat files in random order...done.
Delete files in random order...done.
Version  2.00       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Name:Size etc        /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
pbs          63624M  525k  93  126m   6 44.1m   6    1k   7 10.3m   7 311.3  41
Latency             13067us   17678us    2688ms    6693ms     206ms     390ms
Version  2.00       ------Sequential Create------ --------Random Create--------
pbs                 -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                160 163840  13 +++++ +++ 163840   6 163840  14 +++++ +++ 163840   6
Latency               284ms    2643us     328ms     266ms      21us     356ms
1.98,2.00,pbs,1,1665335428,63624M,,8192,5,525,93,128601,6,45110,6,1,7,10548,7,311.3,41,160,,,,,8118,13,+++++,+++,1248,6,9634,14,+++++,+++,1173,6,13067us,17678us,2688ms,6693ms,206ms,390ms,284ms,2643us,328ms,266ms,21us,356ms
root@pbs:~# bonnie++ -d /benchmarking/8k/ -u root -n 160
Using uid:0, gid:0.
Writing a byte at a time...done
Writing intelligently...done
Rewriting...done
Reading a byte at a time...done
Reading intelligently...done
start 'em...done...done...done...done...done...
Create files in sequential order...done.
Stat files in sequential order...done.
Delete files in sequential order...done.
Create files in random order...done.
Stat files in random order...done.
Delete files in random order...done.
Version  2.00       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Name:Size etc        /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
pbs          63624M  528k  97 80.2m   6 54.4m   8    1k   4 15.1m   5 264.7  37
Latency             14231us     982us    1535ms    5087ms     342ms     284ms
Version  2.00       ------Sequential Create------ --------Random Create--------
pbs                 -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                160 163840  13 +++++ +++ 163840   6 163840  14 +++++ +++ 163840   6
Latency               334ms     100us     325ms     311ms      27us     353ms
1.98,2.00,pbs,1,1668749456,63624M,,8192,5,528,97,82088,6,55756,8,1,4,15510,5,264.7,37,160,,,,,9254,13,+++++,+++,1276,6,9582,14,+++++,+++,1066,6,14231us,982us,1535ms,5087ms,342ms,284ms,334ms,100us,325ms,311ms,27us,353ms
bonnie++ also reports very low block read rates, just like dd: about 3, 10 and 15 MB/s for the 1M, 128k and 8k datasets respectively.
As a final step, I ran another dd benchmark, this time matching the dd bs to the ZFS recordsize:
root@pbs:~# /benchmarking/dd_bench_2.sh
bs=1M recordsize=1M
8000+0 records in
8000+0 records out
8388608000 bytes (8.4 GB, 7.8 GiB) copied, 62.6119 s, 134 MB/s
8000+0 records in
8000+0 records out
8388608000 bytes (8.4 GB, 7.8 GiB) copied, 65.2772 s, 129 MB/s
------------------
bs=128k recordsize=128k
64000+0 records in
64000+0 records out
8388608000 bytes (8.4 GB, 7.8 GiB) copied, 64.6437 s, 130 MB/s
64000+0 records in
64000+0 records out
8388608000 bytes (8.4 GB, 7.8 GiB) copied, 49.128 s, 171 MB/s
------------------
bs=8k recordsize=8k
1000000+0 records in
1000000+0 records out
8192000000 bytes (8.2 GB, 7.6 GiB) copied, 108.331 s, 75.6 MB/s
1000000+0 records in
1000000+0 records out
8192000000 bytes (8.2 GB, 7.6 GiB) copied, 344.981 s, 23.7 MB/s
------------------
This is a significant improvement, but I still expected a higher read speed for 8k.
Then I set atime to off and repeated this last test, but nothing changed much. (The 1M dataset already had atime=off the whole time, sorry about that.)
root@pbs:~# /benchmarking/dd_bench_2.sh
bs=1M recordsize=1M
8000+0 records in
8000+0 records out
8388608000 bytes (8.4 GB, 7.8 GiB) copied, 44.505 s, 188 MB/s
8000+0 records in
8000+0 records out
8388608000 bytes (8.4 GB, 7.8 GiB) copied, 40.3689 s, 208 MB/s
------------------
bs=128k recordsize=128k
64000+0 records in
64000+0 records out
8388608000 bytes (8.4 GB, 7.8 GiB) copied, 67.7169 s, 124 MB/s
64000+0 records in
64000+0 records out
8388608000 bytes (8.4 GB, 7.8 GiB) copied, 56.0657 s, 150 MB/s
------------------
bs=8k recordsize=8k
1000000+0 records in
1000000+0 records out
8192000000 bytes (8.2 GB, 7.6 GiB) copied, 103.724 s, 79.0 MB/s
1000000+0 records in
1000000+0 records out
8192000000 bytes (8.2 GB, 7.6 GiB) copied, 343.753 s, 23.8 MB/s
So, to summarize:
- Why do I get such slow read rates with bonnie++ and with small-bs dd?
- Read speed is almost always equal to or lower than write speed. How can that be on a 3-way mirror, where the system can read from three devices at once but has to write the data three times?
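To compare read and write rates across runs without reading them off by hand, the dd summary lines can be parsed. A minimal sketch, assuming the GNU dd summary format shown in the outputs above and that each test prints its write line before its read line; the function name dd_ratios is illustrative.

```shell
# dd_ratios: read dd summary lines on stdin (write line first, then read
# line, for each test) and print write MB/s, read MB/s and read/write ratio.
dd_ratios() {
    awk '/ bytes .* copied, / {
        # Field 1 is the byte count; the seconds value precedes "s,".
        for (i = 1; i <= NF; i++) if ($i == "s,") secs = $(i - 1)
        rate = $1 / secs / 1e6            # decimal MB/s, as dd reports
        if (++n % 2) { w = rate }
        else { printf "write %.1f MB/s, read %.1f MB/s, ratio %.2f\n", w, rate, rate / w }
    }'
}

# Example with the bs=4k recordsize=1M pair from the first run.
dd_ratios <<'EOF'
8192000000 bytes (8.2 GB, 7.6 GiB) copied, 38.7432 s, 211 MB/s
8192000000 bytes (8.2 GB, 7.6 GiB) copied, 5100.27 s, 1.6 MB/s
EOF
```

Since dd writes its summary to stderr, a whole run could be fed in with something like ./dd_bench.sh 2>&1 | dd_ratios. A ratio below 1 means the read was slower than the write for that test.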
As extra info, the server uses enterprise-grade disks but a consumer-grade motherboard (not low end, but consumer grade), and the disks are connected to the motherboard's SATA controller. I know these are low-end SATA controllers, but it's still strange to get low read rates in some cases while write speeds are always fine.
Also, I have checked that the drives are not SMR, and I repeated these tests on a similar server with similar hardware and setup, with similar results.
Finally, here is the zfs get all output for one of the benchmarking datasets:
root@pbs:~# zfs get all rpool/benchmarking/128k
NAME                     PROPERTY              VALUE                  SOURCE
rpool/benchmarking/128k  type                  filesystem             -
rpool/benchmarking/128k  creation              Wed Oct 26  8:45 2022  -
rpool/benchmarking/128k  used                  96K                    -
rpool/benchmarking/128k  available             12.5T                  -
rpool/benchmarking/128k  referenced            96K                    -
rpool/benchmarking/128k  compressratio         1.00x                  -
rpool/benchmarking/128k  mounted               yes                    -
rpool/benchmarking/128k  quota                 none                   default
rpool/benchmarking/128k  reservation           none                   default
rpool/benchmarking/128k  recordsize            128K                   default
rpool/benchmarking/128k  mountpoint            /benchmarking/128k     inherited from rpool/benchmarking
rpool/benchmarking/128k  sharenfs              off                    default
rpool/benchmarking/128k  checksum              on                     default
rpool/benchmarking/128k  compression           off                    inherited from rpool/benchmarking
rpool/benchmarking/128k  atime                 off                    local
rpool/benchmarking/128k  devices               on                     default
rpool/benchmarking/128k  exec                  on                     default
rpool/benchmarking/128k  setuid                on                     default
rpool/benchmarking/128k  readonly              off                    default
rpool/benchmarking/128k  zoned                 off                    default
rpool/benchmarking/128k  snapdir               hidden                 default
rpool/benchmarking/128k  aclmode               discard                default
rpool/benchmarking/128k  aclinherit            restricted             default
rpool/benchmarking/128k  createtxg             255400                 -
rpool/benchmarking/128k  canmount              on                     default
rpool/benchmarking/128k  xattr                 on                     default
rpool/benchmarking/128k  copies                1                      default
rpool/benchmarking/128k  version               5                      -
rpool/benchmarking/128k  utf8only              off                    -
rpool/benchmarking/128k  normalization         none                   -
rpool/benchmarking/128k  casesensitivity       sensitive              -
rpool/benchmarking/128k  vscan                 off                    default
rpool/benchmarking/128k  nbmand                off                    default
rpool/benchmarking/128k  sharesmb              off                    default
rpool/benchmarking/128k  refquota              none                   default
rpool/benchmarking/128k  refreservation        none                   default
rpool/benchmarking/128k  guid                  13557460337392366562   -
rpool/benchmarking/128k  primarycache          none                   inherited from rpool/benchmarking
rpool/benchmarking/128k  secondarycache        none                   inherited from rpool/benchmarking
rpool/benchmarking/128k  usedbysnapshots       0B                     -
rpool/benchmarking/128k  usedbydataset         96K                    -
rpool/benchmarking/128k  usedbychildren        0B                     -
rpool/benchmarking/128k  usedbyrefreservation  0B                     -
rpool/benchmarking/128k  logbias               latency                default
rpool/benchmarking/128k  objsetid              60174                  -
rpool/benchmarking/128k  dedup                 off                    default
rpool/benchmarking/128k  mlslabel              none                   default
rpool/benchmarking/128k  sync                  standard               inherited from rpool
rpool/benchmarking/128k  dnodesize             legacy                 default
rpool/benchmarking/128k  refcompressratio      1.00x                  -
rpool/benchmarking/128k  written               96K                    -
rpool/benchmarking/128k  logicalused           42K                    -
rpool/benchmarking/128k  logicalreferenced     42K                    -
rpool/benchmarking/128k  volmode               default                default
rpool/benchmarking/128k  filesystem_limit      none                   default
rpool/benchmarking/128k  snapshot_limit        none                   default
rpool/benchmarking/128k  filesystem_count      none                   default
rpool/benchmarking/128k  snapshot_count        none                   default
rpool/benchmarking/128k  snapdev               hidden                 default
rpool/benchmarking/128k  acltype               off                    default
rpool/benchmarking/128k  context               none                   default
rpool/benchmarking/128k  fscontext             none                   default
rpool/benchmarking/128k  defcontext            none                   default
rpool/benchmarking/128k  rootcontext           none                   default
rpool/benchmarking/128k  relatime              on                     inherited from rpool
rpool/benchmarking/128k  redundant_metadata    all                    default
rpool/benchmarking/128k  overlay               on                     default
rpool/benchmarking/128k  encryption            off                    default
rpool/benchmarking/128k  keylocation           none                   default
rpool/benchmarking/128k  keyformat             none                   default
rpool/benchmarking/128k  pbkdf2iters           0                      default
rpool/benchmarking/128k  special_small_blocks  0                      default
Thanks for your time!
EDIT: ashift is properly set to 12, dedup is off and fragmentation is 0%.