Score:3

Slow SSD performance when using KVM to launch VMs


On my host hardware I get around 1 GB/s:

(screenshot of the host disk benchmark)

On a VM I create using KVM it drops to around 20 MB/s.

My host is running Ubuntu 22.04 LTS.

(screenshot of the VM disk benchmark)

How can I optimise this?

I am using file-based VMs. I created disks of both type raw and qcow2; the only difference I saw was in how the disk file was created when specifying the type.
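For reference, this is roughly how the two disk files were created (the paths, size and preallocation option below are placeholders for illustration, not my actual images):

$ qemu-img create -f raw /var/lib/libvirt/images/test.img 50G
$ qemu-img create -f qcow2 -o preallocation=metadata /var/lib/libvirt/images/test.qcow2 50G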

I tried setting the cache mode to none on the disk via virt-manager.

This is the device information: (screenshot of the disk settings in virt-manager)

I also tried cache modes none and writeback; neither made any difference to the speed.
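To double-check what the guest actually uses, the disk driver settings can be read from the domain XML (the domain name below is just a placeholder):

$ virsh dumpxml mydomain | grep -A2 '<disk'

The `<driver>` element shows the effective `cache=` and `io=` attributes for each disk.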

Here are some further tests I have performed:

Single 4KiB random write process: worst possible test to perform

Host hardware

$ fio --name=random-write --ioengine=posixaio --rw=randwrite --bs=4k --size=4g --numjobs=1 --iodepth=1 --runtime=60 --time_based --end_fsync=1
random-write: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=posixaio, iodepth=1
fio-3.28
Starting 1 process
random-write: Laying out IO file (1 file / 4096MiB)
Jobs: 1 (f=1): [w(1)][100.0%][eta 00m:00s]                          
random-write: (groupid=0, jobs=1): err= 0: pid=114493: Tue Jan 24 12:42:44 2023
  write: IOPS=10.1k, BW=39.5MiB/s (41.5MB/s)(4096MiB/103604msec); 0 zone resets
    slat (nsec): min=1920, max=587633, avg=3761.73, stdev=3026.96
    clat (usec): min=11, max=2551.6k, avg=26.49, stdev=2593.73
     lat (usec): min=13, max=2551.7k, avg=30.25, stdev=2593.74
    clat percentiles (usec):
     |  1.00th=[   20],  5.00th=[   22], 10.00th=[   22], 20.00th=[   22],
     | 30.00th=[   22], 40.00th=[   23], 50.00th=[   23], 60.00th=[   23],
     | 70.00th=[   23], 80.00th=[   24], 90.00th=[   25], 95.00th=[   26],
     | 99.00th=[   32], 99.50th=[   34], 99.90th=[   44], 99.95th=[  165],
     | 99.99th=[  545]
   bw (  KiB/s): min=24864, max=152592, per=100.00%, avg=135295.44, stdev=25421.57, samples=62
   iops        : min= 6216, max=38148, avg=33823.85, stdev=6355.39, samples=62
  lat (usec)   : 20=1.13%, 50=98.80%, 100=0.01%, 250=0.05%, 500=0.01%
  lat (usec)   : 750=0.02%
  lat (msec)   : 2=0.01%, 500=0.01%, 750=0.01%, >=2000=0.01%
  cpu          : usr=5.71%, sys=7.64%, ctx=1063940, majf=0, minf=366
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,1048577,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
  WRITE: bw=39.5MiB/s (41.5MB/s), 39.5MiB/s-39.5MiB/s (41.5MB/s-41.5MB/s), io=4096MiB (4295MB), run=103604-103604msec

Disk stats (read/write):
    dm-0: ios=0/240696, merge=0/0, ticks=0/16578288, in_queue=16578288, util=85.10%, aggrios=0/242596, aggrmerge=0/3006, aggrticks=0/20300771, aggrin_queue=20300770, aggrutil=89.20%
  sda: ios=0/242596, merge=0/3006, ticks=0/20300771, in_queue=20300770, util=89.20%


$ fio --name=random-write --ioengine=posixaio --rw=randwrite --bs=4k --size=4g --numjobs=1 --iodepth=1 --runtime=60 --time_based --end_fsync=1
random-write: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=posixaio, iodepth=1
fio-3.28
Starting 1 process
Jobs: 1 (f=1): [w(1)][100.0%][eta 00m:00s]                          
random-write: (groupid=0, jobs=1): err= 0: pid=114600: Tue Jan 24 12:45:29 2023
  write: IOPS=11.2k, BW=43.7MiB/s (45.8MB/s)(4096MiB/93810msec); 0 zone resets
    slat (nsec): min=1800, max=637861, avg=3705.65, stdev=2443.65
    clat (usec): min=10, max=582234, avg=22.74, stdev=706.46
     lat (usec): min=12, max=582238, avg=26.45, stdev=706.47
    clat percentiles (usec):
     |  1.00th=[   17],  5.00th=[   20], 10.00th=[   21], 20.00th=[   21],
     | 30.00th=[   21], 40.00th=[   21], 50.00th=[   22], 60.00th=[   22],
     | 70.00th=[   22], 80.00th=[   22], 90.00th=[   24], 95.00th=[   25],
     | 99.00th=[   31], 99.50th=[   33], 99.90th=[   44], 99.95th=[  151],
     | 99.99th=[  537]
   bw (  KiB/s): min=44784, max=185360, per=100.00%, avg=147168.42, stdev=18660.88, samples=57
   iops        : min=11196, max=46340, avg=36792.07, stdev=4665.22, samples=57
  lat (usec)   : 20=6.13%, 50=93.79%, 100=0.01%, 250=0.05%, 500=0.01%
  lat (usec)   : 750=0.02%
  lat (msec)   : 2=0.01%, 250=0.01%, 500=0.01%, 750=0.01%
  cpu          : usr=6.33%, sys=7.47%, ctx=1079749, majf=0, minf=327
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,1048577,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
  WRITE: bw=43.7MiB/s (45.8MB/s), 43.7MiB/s-43.7MiB/s (45.8MB/s-45.8MB/s), io=4096MiB (4295MB), run=93810-93810msec

Disk stats (read/write):
    dm-0: ios=0/257987, merge=0/0, ticks=0/14471372, in_queue=14471372, util=80.94%, aggrios=0/259380, aggrmerge=0/3269, aggrticks=0/20576252, aggrin_queue=20576252, aggrutil=88.06%
  sda: ios=0/259380, merge=0/3269, ticks=0/20576252, in_queue=20576252, util=88.06%


$ fio --name=random-write --ioengine=posixaio --rw=randwrite --bs=4k --size=4g --numjobs=1 --iodepth=1 --runtime=60 --time_based --end_fsync=1
random-write: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=posixaio, iodepth=1
fio-3.28
Starting 1 process
Jobs: 1 (f=1): [w(1)][100.0%][eta 00m:00s]                          
random-write: (groupid=0, jobs=1): err= 0: pid=114700: Tue Jan 24 12:48:03 2023
  write: IOPS=10.5k, BW=41.0MiB/s (43.0MB/s)(4096MiB/99783msec); 0 zone resets
    slat (nsec): min=1931, max=543062, avg=3706.35, stdev=3369.72
    clat (usec): min=11, max=659263, avg=22.63, stdev=643.97
     lat (usec): min=14, max=659267, avg=26.33, stdev=643.98
    clat percentiles (usec):
     |  1.00th=[   19],  5.00th=[   21], 10.00th=[   21], 20.00th=[   21],
     | 30.00th=[   22], 40.00th=[   22], 50.00th=[   22], 60.00th=[   22],
     | 70.00th=[   22], 80.00th=[   23], 90.00th=[   24], 95.00th=[   25],
     | 99.00th=[   29], 99.50th=[   33], 99.90th=[   43], 99.95th=[  139],
     | 99.99th=[  537]
   bw (  KiB/s): min= 5648, max=166179, per=100.00%, avg=144625.43, stdev=22760.25, samples=58
   iops        : min= 1412, max=41544, avg=36156.28, stdev=5690.11, samples=58
  lat (usec)   : 20=3.87%, 50=96.05%, 100=0.01%, 250=0.05%, 500=0.01%
  lat (usec)   : 750=0.02%, 1000=0.01%
  lat (msec)   : 20=0.01%, 750=0.01%
  cpu          : usr=5.86%, sys=7.61%, ctx=1080511, majf=0, minf=359
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,1048577,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
  WRITE: bw=41.0MiB/s (43.0MB/s), 41.0MiB/s-41.0MiB/s (43.0MB/s-43.0MB/s), io=4096MiB (4295MB), run=99783-99783msec

Disk stats (read/write):
    dm-0: ios=0/245070, merge=0/0, ticks=0/17235960, in_queue=17235960, util=83.79%, aggrios=0/246419, aggrmerge=0/3660, aggrticks=0/22057670, aggrin_queue=22057670, aggrutil=88.55%
  sda: ios=0/246419, merge=0/3660, ticks=0/22057670, in_queue=22057670, util=88.55%

This test was run on the VM hosting OpenStack (controller2), one of 3 bare VMs for OpenStack running on KVM with no apps running.

$ fio --name=random-write --ioengine=posixaio --rw=randwrite --bs=4k --size=4g --numjobs=1 --iodepth=1 --runtime=60 --time_based --end_fsync=1
random-write: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=posixaio, iodepth=1
fio-3.28
Starting 1 process
random-write: Laying out IO file (1 file / 4096MiB)
Jobs: 1 (f=1): [F(1)][100.0%][eta 00m:00s]                         
random-write: (groupid=0, jobs=1): err= 0: pid=451129: Tue Jan 24 13:04:09 2023
  write: IOPS=250, BW=1001KiB/s (1026kB/s)(826MiB/844616msec); 0 zone resets
    slat (nsec): min=604, max=487941, avg=3069.50, stdev=3227.61
    clat (usec): min=2, max=116745k, avg=576.78, stdev=253872.83
     lat (usec): min=9, max=116745k, avg=579.85, stdev=253872.85
    clat percentiles (usec):
     |  1.00th=[   11],  5.00th=[   13], 10.00th=[   14], 20.00th=[   15],
     | 30.00th=[   15], 40.00th=[   19], 50.00th=[   22], 60.00th=[   24],
     | 70.00th=[   26], 80.00th=[   31], 90.00th=[   40], 95.00th=[   49],
     | 99.00th=[   76], 99.50th=[   91], 99.90th=[  359], 99.95th=[  685],
     | 99.99th=[  873]
   bw (  KiB/s): min=13680, max=195824, per=100.00%, avg=130092.46, stdev=52846.56, samples=13
   iops        : min= 3420, max=48956, avg=32523.08, stdev=13211.60, samples=13
  lat (usec)   : 4=0.01%, 10=0.96%, 20=46.60%, 50=48.11%, 100=3.99%
  lat (usec)   : 250=0.23%, 500=0.03%, 750=0.06%, 1000=0.02%
  lat (msec)   : 2=0.01%, 4=0.01%, 10=0.01%, >=2000=0.01%
  cpu          : usr=0.10%, sys=0.13%, ctx=264372, majf=0, minf=29
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,211466,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
  WRITE: bw=1001KiB/s (1026kB/s), 1001KiB/s-1001KiB/s (1026kB/s-1026kB/s), io=826MiB (866MB), run=844616-844616msec

Disk stats (read/write):
    dm-0: ios=232/163901, merge=0/0, ticks=144/7660152, in_queue=7660296, util=17.91%, aggrios=221/160213, aggrmerge=11/3722, aggrticks=159/1113901, aggrin_queue=1983749, aggrutil=43.00%
  vda: ios=221/160213, merge=11/3722, ticks=159/1113901, in_queue=1983749, util=43.00%


$ fio --name=random-write --ioengine=posixaio --rw=randwrite --bs=4k --size=4g --numjobs=1 --iodepth=1 --runtime=60 --time_based --end_fsync=1
random-write: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=posixaio, iodepth=1
fio-3.28
Starting 1 process
Jobs: 1 (f=1): [F(1)][100.0%][eta 00m:00s]                          
random-write: (groupid=0, jobs=1): err= 0: pid=452551: Tue Jan 24 13:25:06 2023
  write: IOPS=286, BW=1145KiB/s (1172kB/s)(973MiB/869962msec); 0 zone resets
    slat (nsec): min=1014, max=520262, avg=3532.80, stdev=4003.56
    clat (nsec): min=910, max=57218M, avg=259432.63, stdev=114674189.43
     lat (usec): min=13, max=57218k, avg=262.97, stdev=114674.22
    clat percentiles (usec):
     |  1.00th=[   14],  5.00th=[   16], 10.00th=[   18], 20.00th=[   19],
     | 30.00th=[   21], 40.00th=[   22], 50.00th=[   23], 60.00th=[   24],
     | 70.00th=[   27], 80.00th=[   29], 90.00th=[   34], 95.00th=[   42],
     | 99.00th=[   70], 99.50th=[   77], 99.90th=[  172], 99.95th=[  502],
     | 99.99th=[22676]
   bw (  KiB/s): min= 5336, max=161784, per=100.00%, avg=110630.83, stdev=54549.81, samples=18
   iops        : min= 1334, max=40446, avg=27657.67, stdev=13637.43, samples=18
  lat (nsec)   : 1000=0.01%
  lat (usec)   : 2=0.01%, 4=0.01%, 20=28.78%, 50=68.68%, 100=2.30%
  lat (usec)   : 250=0.17%, 500=0.02%, 750=0.02%, 1000=0.01%
  lat (msec)   : 2=0.01%, 4=0.01%, 20=0.01%, 50=0.01%, >=2000=0.01%
  cpu          : usr=0.13%, sys=0.17%, ctx=260439, majf=0, minf=30
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,248968,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
  WRITE: bw=1145KiB/s (1172kB/s), 1145KiB/s-1145KiB/s (1172kB/s-1172kB/s), io=973MiB (1020MB), run=869962-869962msec

Disk stats (read/write):
    dm-0: ios=124/189939, merge=0/0, ticks=64/6847936, in_queue=6848000, util=72.81%, aggrios=79/179513, aggrmerge=45/10455, aggrticks=26/1126630, aggrin_queue=2028077, aggrutil=90.71%
  vda: ios=79/179513, merge=45/10455, ticks=26/1126630, in_queue=2028077, util=90.71%

You can see from this that throughput drops from ~43 MB/s to ~1 MB/s. This is a huge problem.

This test is on the OpenStack VM controller2, but the virtualisation software is ESXi:

$ fio --name=random-write --ioengine=posixaio --rw=randwrite --bs=4k --size=4g --numjobs=1 --iodepth=1 --runtime=60 --time_based --end_fsync=1
random-write: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=posixaio, iodepth=1
fio-3.28
Starting 1 process
random-write: Laying out IO file (1 file / 4096MiB)
Jobs: 1 (f=1): [F(1)][100.0%][eta 00m:00s]                          
random-write: (groupid=0, jobs=1): err= 0: pid=530128: Tue Jan 24 13:18:47 2023
  write: IOPS=3149, BW=12.3MiB/s (12.9MB/s)(1722MiB/139918msec); 0 zone resets
    slat (nsec): min=1385, max=749909, avg=11219.59, stdev=9674.52
    clat (nsec): min=610, max=149012k, avg=122940.18, stdev=866525.51
     lat (usec): min=35, max=149020, avg=134.16, stdev=866.28
    clat percentiles (usec):
     |  1.00th=[   35],  5.00th=[   35], 10.00th=[   46], 20.00th=[   51],
     | 30.00th=[   60], 40.00th=[   63], 50.00th=[   64], 60.00th=[   68],
     | 70.00th=[   70], 80.00th=[   72], 90.00th=[   79], 95.00th=[   89],
     | 99.00th=[  221], 99.50th=[ 1467], 99.90th=[13829], 99.95th=[16188],
     | 99.99th=[19530]
   bw (  KiB/s): min= 9672, max=99544, per=100.00%, avg=29553.08, stdev=21110.49, samples=119
   iops        : min= 2418, max=24886, avg=7388.23, stdev=5277.64, samples=119
  lat (nsec)   : 750=0.01%, 1000=0.01%
  lat (usec)   : 2=0.01%, 4=0.01%, 10=0.01%, 50=17.22%, 100=79.51%
  lat (usec)   : 250=2.37%, 500=0.14%, 750=0.08%, 1000=0.06%
  lat (msec)   : 2=0.12%, 4=0.01%, 10=0.30%, 20=0.18%, 50=0.01%
  lat (msec)   : 100=0.01%, 250=0.01%
  cpu          : usr=3.14%, sys=6.60%, ctx=564104, majf=0, minf=30
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,440722,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
  WRITE: bw=12.3MiB/s (12.9MB/s), 12.3MiB/s-12.3MiB/s (12.9MB/s-12.9MB/s), io=1722MiB (1805MB), run=139918-139918msec

Disk stats (read/write):
    dm-0: ios=0/240336, merge=0/0, ticks=0/3124100, in_queue=3124100, util=91.31%, aggrios=0/235436, aggrmerge=0/5071, aggrticks=0/2887407, aggrin_queue=2887407, aggrutil=92.02%
  sda: ios=0/235436, merge=0/5071, ticks=0/2887407, in_queue=2887407, util=92.02%



$ fio --name=random-write --ioengine=posixaio --rw=randwrite --bs=4k --size=4g --numjobs=1 --iodepth=1 --runtime=60 --time_based --end_fsync=1
random-write: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=posixaio, iodepth=1
fio-3.28
Starting 1 process
Jobs: 1 (f=1): [F(1)][100.0%][eta 00m:00s]                          
random-write: (groupid=0, jobs=1): err= 0: pid=530294: Tue Jan 24 13:21:08 2023
  write: IOPS=6080, BW=23.8MiB/s (24.9MB/s)(2393MiB/100740msec); 0 zone resets
    slat (nsec): min=1367, max=1029.8k, avg=11761.38, stdev=10525.79
    clat (nsec): min=915, max=62359k, avg=82333.89, stdev=390799.49
     lat (usec): min=35, max=62382, avg=94.10, stdev=391.00
    clat percentiles (usec):
     |  1.00th=[   36],  5.00th=[   37], 10.00th=[   47], 20.00th=[   59],
     | 30.00th=[   65], 40.00th=[   67], 50.00th=[   69], 60.00th=[   71],
     | 70.00th=[   72], 80.00th=[   74], 90.00th=[   82], 95.00th=[   98],
     | 99.00th=[  192], 99.50th=[  253], 99.90th=[ 8356], 99.95th=[ 9372],
     | 99.99th=[16057]
   bw (  KiB/s): min=23136, max=95208, per=100.00%, avg=41702.67, stdev=13481.11, samples=117
   iops        : min= 5784, max=23802, avg=10425.62, stdev=3370.29, samples=117
  lat (nsec)   : 1000=0.01%
  lat (usec)   : 2=0.01%, 4=0.01%, 50=13.24%, 100=82.03%, 250=4.21%
  lat (usec)   : 500=0.22%, 750=0.10%, 1000=0.02%
  lat (msec)   : 2=0.06%, 4=0.01%, 10=0.06%, 20=0.05%, 50=0.01%
  lat (msec)   : 100=0.01%
  cpu          : usr=6.24%, sys=13.79%, ctx=755651, majf=0, minf=29
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,612557,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
  WRITE: bw=23.8MiB/s (24.9MB/s), 23.8MiB/s-23.8MiB/s (24.9MB/s-24.9MB/s), io=2393MiB (2509MB), run=100740-100740msec

Disk stats (read/write):
    dm-0: ios=0/353311, merge=0/0, ticks=0/2510080, in_queue=2510080, util=93.10%, aggrios=0/325545, aggrmerge=0/28769, aggrticks=0/2168746, aggrin_queue=2168746, aggrutil=93.35%
  sda: ios=0/325545, merge=0/28769, ticks=0/2168746, in_queue=2168746, util=93.35%



$ fio --name=random-write --ioengine=posixaio --rw=randwrite --bs=4k --size=4g --numjobs=1 --iodepth=1 --runtime=60 --time_based --end_fsync=1
random-write: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=posixaio, iodepth=1
fio-3.28
Starting 1 process
Jobs: 1 (f=1): [F(1)][100.0%][eta 00m:00s]                          
random-write: (groupid=0, jobs=1): err= 0: pid=530405: Tue Jan 24 13:23:08 2023
  write: IOPS=5930, BW=23.2MiB/s (24.3MB/s)(2308MiB/99631msec); 0 zone resets
    slat (nsec): min=1378, max=1395.4k, avg=12724.69, stdev=10859.25
    clat (nsec): min=797, max=22413k, avg=83620.52, stdev=356081.74
     lat (usec): min=35, max=22415, avg=96.35, stdev=356.19
    clat percentiles (usec):
     |  1.00th=[   36],  5.00th=[   48], 10.00th=[   57], 20.00th=[   65],
     | 30.00th=[   69], 40.00th=[   71], 50.00th=[   71], 60.00th=[   72],
     | 70.00th=[   73], 80.00th=[   76], 90.00th=[   81], 95.00th=[   93],
     | 99.00th=[  184], 99.50th=[  219], 99.90th=[ 8291], 99.95th=[10290],
     | 99.99th=[14091]
   bw (  KiB/s): min=26568, max=100256, per=100.00%, avg=40559.51, stdev=9507.31, samples=116
   iops        : min= 6642, max=25064, avg=10139.87, stdev=2376.85, samples=116
  lat (nsec)   : 1000=0.01%
  lat (usec)   : 2=0.01%, 4=0.01%, 50=6.37%, 100=89.89%, 250=3.36%
  lat (usec)   : 500=0.15%, 750=0.09%, 1000=0.02%
  lat (msec)   : 2=0.01%, 4=0.01%, 10=0.04%, 20=0.06%, 50=0.01%
  cpu          : usr=6.64%, sys=14.57%, ctx=711625, majf=0, minf=28
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,590890,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
  WRITE: bw=23.2MiB/s (24.3MB/s), 23.2MiB/s-23.2MiB/s (24.3MB/s-24.3MB/s), io=2308MiB (2420MB), run=99631-99631msec

Disk stats (read/write):
    dm-0: ios=0/302542, merge=0/0, ticks=0/2060836, in_queue=2060836, util=83.71%, aggrios=0/302903, aggrmerge=0/388, aggrticks=0/1961686, aggrin_queue=1961686, aggrutil=83.91%
  sda: ios=0/302903, merge=0/388, ticks=0/1961686, in_queue=1961686, util=83.91%

I have two Samsung SSD 870 QVO 2TB drives (4 TB total) running in RAID 0.

Here is my KVM domain XML:

https://pastebin.com/NNGqMRtV

Jaromanda X: When you created the storage device for the VM, what device type and bus type did you use?
shorif2000: I used VirtIO, see the picture.
diya: Please use copy-paste and avoid posting screenshots of text when posting console output / settings. Format that text as "`code`" using [Markdown](http://serverfault.com/editing-help) and/or the formatting options in the edit menu to properly type-set your posts. That improves readability, attracts better answers and allows indexing by search engines, which may help people with similar questions.
Is that a server-grade SSD, or the usual end-user crap that has good read speed, bad write speed, and uses a buffer to hide that for small operations, which you do not do? The numbers look awfully low, all of them, for a proper SSD-based storage array.
shorif2000: I have a Samsung SSD 870 QVO 2TB.
Tom Yan: I wonder if it has something to do with the type of filesystem you use on the host side to store the VM disk images.
shorif2000: Everything is ext4.
shodanshok: Are you using RAID on the host? If yes, which level / controller / etc.?
shodanshok: Your `dd` test shows 1 GB/s speed, while your SATA disk can only provide ~500 MB/s max data transfer, so something seems wrong. Anyway, try adding `oflag=direct` to both your host and guest tests and report back.
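A direct-I/O variant of such a dd test would look roughly like this (file name and sizes are placeholders, not the original command); `oflag=direct` bypasses the page cache, so the result reflects the device rather than RAM:

$ dd if=/dev/zero of=/tmp/ddtest bs=1M count=1024 oflag=direct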
Nikita Kipriyanov: Isn't this just another manifestation of the [problem that I encountered](https://serverfault.com/questions/1002138/bad-linux-storage-performance-in-comparison-with-windows-on-the-same-machine)?
Score:1

I believe the Samsung 870 is a consumer-level drive which will degrade and has a high likelihood of correlated failure, especially in a multi-node cluster where you are most probably running Ceph.

The 2 TB version of the following model would be a better option (the part number below is the 7.6 TB version): SAMSUNG MZ7LH7T6HMLA-00005

Pay careful attention to sector alignment; most operating systems create partitions on even 1 MiB boundaries and start the first partition at sector 2048 (assuming an emulated 512-byte sector size).

In the example below I switch the display units to sectors. The printout also reveals that the emulated (logical) sector size is 512 bytes, whilst the drive lays out data in 4 KiB pages (physical sector size). Parted also has a built-in command to check the sector alignment of partitions:

[root@kvm1a ~]# parted /dev/sda
GNU Parted 3.4
Using /dev/sda
Welcome to GNU Parted! Type 'help' to view a list of commands.
(parted) unit s
(parted) p
Model: ATA SAMSUNG MZ7LH7T6 (scsi)
Disk /dev/sda: 15002931888s
Sector size (logical/physical): 512B/4096B
Partition Table: gpt
Disk Flags:

Number  Start      End           Size          File system  Name        Flags
 1      2048s      4095s         2048s                      bbp         bios_grub
 2      4096s      62918655s     62914560s                  non-fs      raid
 3      62918656s  65015807s     2097152s                   non-fs      raid
 4      65015808s  65220607s     204800s       xfs          ceph data
 5      65220608s  15002929151s  14937708544s               ceph block

(parted) align-check optimal 1
1 aligned
(parted) align-check optimal 2
2 aligned
(parted) align-check optimal 3
3 aligned
(parted) align-check optimal 4
4 aligned
(parted) align-check optimal 5
5 aligned

Partitions should essentially start at sector 2048; this yields a clean 1 MiB starting boundary, as 2048 x 512 (sector size) = 1048576 bytes (1 MiB). Many people presume that this wastes space and try to create partitions starting at sector 1. This, however, causes problems, as the first addressable sector is actually 0, not 1, and sector 0 is reserved for the MBR/GPT partition table and boot jump code.
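As a sketch (the device path is a placeholder, and `mklabel` wipes any existing partition table), creating a partition on a clean 1 MiB boundary and verifying its alignment looks like this:

$ parted --script /dev/sdX mklabel gpt
$ parted --script /dev/sdX mkpart primary 1MiB 100%
$ parted --script /dev/sdX align-check optimal 1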

In case anyone finds this useful, herewith a script which validates the starting sector of partitions on all Ceph RBD mapped images on a compute node:

  rbd showmapped | grep /dev/rbd | awk '{print $3" "$5}' | while read disk dev; do
    parted --script $dev 'unit s p'| grep -P '^\s+\d' | while read partition start info; do
      num=${start::-1};
      if [ $num != $((num/2048*2048)) ]; then
        [ `echo $info | grep -c 'Microsoft reserved partition'` -lt 1 ] && \
        [ `grep -Pc "\s131072\s+${dev#/dev/}$" /proc/partitions` -lt 1 ] && \
        echo "$disk mounted as $dev has problem with partition $partition";
      fi
    done
  done
  # 2048 comes from 1024*1024/512 = 2048
  # excludes spacer partitions created by Windows
  # excludes MikroTik CHR disks of 128 MiB
Score:0

With KVM we found that hosts running multiple VMs provide the best performance when QEMU is configured to use virtio-scsi (vioscsi) with writeback caching.

Some benchmarks show higher read performance with caching disabled, because the system does not copy the data into the host cache, but in practice the cache greatly benefits reads and reduces strain on Ceph/iSCSI storage.

PS: writeback mode is flush-aware, so it works like any other well-behaved hardware RAID controller and is transactionally safe.
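As a rough sketch (image path, format and target name are placeholders, not a tested configuration), the corresponding libvirt disk definition with a virtio-scsi controller and writeback caching would look something like this:

<controller type='scsi' model='virtio-scsi'/>
<disk type='file' device='disk'>
  <driver name='qemu' type='qcow2' cache='writeback' discard='unmap'/>
  <source file='/var/lib/libvirt/images/guest.qcow2'/>
  <target dev='sda' bus='scsi'/>
</disk>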
