Linux mdraid resync speed on 36 drive array

I have a performance issue with mdraid. I have one 18x10TB software RAID6 array that is resyncing at ~70MB/s:

kernel 5.8.13-1.el8
/dev/md0:
           Version : 1.2
     Creation Time : Mon Oct  5 15:11:15 2020
        Raid Level : raid6
        Array Size : 155136221184 (144.48 TiB 158.86 TB)
     Used Dev Size : 9696013824 (9.03 TiB 9.93 TB)
      Raid Devices : 18
     Total Devices : 18
       Persistence : Superblock is persistent

     Intent Bitmap : Internal

       Update Time : Wed Aug 18 18:35:42 2021
             State : clean, degraded, resyncing
    Active Devices : 17
   Working Devices : 18
    Failed Devices : 0
     Spare Devices : 1

            Layout : left-symmetric
        Chunk Size : 512K

Consistency Policy : bitmap

     Resync Status : 25% complete

              Name : large2:0  (local to host large2)
              UUID : bdb63778:b3765982:b257478b:70121351
            Events : 500678

    Number   Major   Minor   RaidDevice State
      18       8        5        0      active sync   /dev/sda5
       1       8       21        1      active sync   /dev/sdb5
       2       8       34        2      active sync   /dev/sdc2
       3       8       50        3      active sync   /dev/sdd2
       4       8       66        4      active sync   /dev/sde2
       5       8       82        5      active sync   /dev/sdf2
       -       0        0        6      removed
       7       8      114        7      active sync   /dev/sdh2
       8       8      130        8      active sync   /dev/sdi2
       9       8      146        9      active sync   /dev/sdj2
      10       8      162       10      active sync   /dev/sdk2
      11       8      178       11      active sync   /dev/sdl2
      12       8      194       12      active sync   /dev/sdm2
      13       8      210       13      active sync   /dev/sdn2
      14       8      226       14      active sync   /dev/sdo2
      15       8      242       15      active sync   /dev/sdp2
      16      65        2       16      active sync   /dev/sdq2
      17      65       18       17      active sync   /dev/sdr2

       6       8       98        -      spare   /dev/sdg2
md0 : active raid6 sdg2[6](S) sda5[18] sdr2[17] sdq2[16] sdp2[15] sdo2[14] sdn2[13] sdm2[12] sdl2[11] sdk2[10] sdj2[9] sdi2[8] sdh2[7] sdf2[5] sde2[4] sdd2[3] sdc2[2] sdb5[1]
      155136221184 blocks super 1.2 level 6, 512k chunk, algorithm 2 [18/17] [UUUUUU_UUUUUUUUUUU]
      [=====>...............]  resync = 25.0% (2431109212/9696013824) finish=1562.2min speed=77503K/sec
      bitmap: 9/73 pages [36KB], 65536KB chunk

Device            r/s     w/s     rkB/s     wkB/s   rrqm/s   wrqm/s  %rrqm  %wrqm r_await w_await aqu-sz rareq-sz wareq-sz  svctm  %util
sdg              0.00    0.00      0.00      0.00     0.00     0.00   0.00   0.00    0.00    0.00   0.00     0.00     0.00   0.00   0.00
sdb            322.00    2.80  77282.40      6.60 16471.00     0.80  98.08  22.22  412.53  126.79 133.40   240.01     2.36   2.53  82.10
sdf            339.40    0.00  77758.40      0.00 16454.40     0.00  97.98   0.00  379.81    0.00 128.91   229.11     0.00   2.39  81.28
sdr            325.00    0.00  77414.40      0.00 16465.20     0.00  98.06   0.00  405.64    0.00 131.83   238.20     0.00   2.40  77.90
sdm            329.00    0.00  78477.60      0.00 16465.40     0.00  98.04   0.00  398.25    0.00 131.02   238.53     0.00   2.38  78.36
sdi            328.60    0.00  77084.00      0.00 16460.40     0.00  98.04   0.00  391.20    0.00 128.55   234.58     0.00   2.48  81.64
sdh            335.40    0.00  77753.60      0.00 16456.20     0.00  98.00   0.00  389.88    0.00 130.77   231.82     0.00   2.42  81.14
sdj            326.40    0.00  77700.80      0.00 16464.80     0.00  98.06   0.00  408.07    0.00 133.19   238.05     0.00   2.48  80.90
sde            328.60    0.00  77700.80      0.00 16462.60     0.00  98.04   0.00  398.74    0.00 131.03   236.46     0.00   2.46  80.92
sdn            332.00    0.00  77050.40      0.00 16456.60     0.00  98.02   0.00  382.56    0.00 127.01   232.08     0.00   2.35  78.12
sdl            324.80    0.00  76341.60      0.00 16461.20     0.00  98.07   0.00  385.14    0.00 125.09   235.04     0.00   2.40  78.00
sdp            326.60    0.00  76789.60      0.00 16461.00     0.00  98.05   0.00  393.01    0.00 128.36   235.12     0.00   2.38  77.76
sdq            325.00    0.00  77281.60      0.00 16464.60     0.00  98.06   0.00  404.60    0.00 131.49   237.79     0.00   2.40  77.94
sdk            331.80    0.00  77685.60      0.00 16459.40     0.00  98.02   0.00  386.56    0.00 128.26   234.13     0.00   2.34  77.48
sda            324.20    2.80  77067.20      6.60 16464.40     0.80  98.07  22.22  426.02  135.93 138.74   237.71     2.36   2.59  84.62
sdd            327.60    0.00  77276.00      0.00 16461.80     0.00  98.05   0.00  401.83    0.00 131.64   235.89     0.00   2.47  81.02
sdc            330.80    0.00  77605.60      0.00 16460.00     0.00  98.03   0.00  396.68    0.00 131.22   234.60     0.00   2.46  81.24
sdo            326.40    0.00  77927.20      0.00 16465.60     0.00  98.06   0.00  402.73    0.00 131.45   238.75     0.00   2.40  78.38
md0              0.00    0.00      0.00      0.00     0.00     0.00   0.00   0.00    0.00    0.00   0.00     0.00     0.00   0.00   0.00

and a second 36x14TB software RAID6 array that is doing its initial resync at ~40MB/s:

kernel 5.13.11-1.el8
           Version : 1.2
     Creation Time : Tue Aug 17 09:37:39 2021
        Raid Level : raid6
        Array Size : 464838634496 (432.91 TiB 475.99 TB)
     Used Dev Size : 13671724544 (12.73 TiB 14.00 TB)
      Raid Devices : 36
     Total Devices : 36
       Persistence : Superblock is persistent

     Intent Bitmap : Internal

       Update Time : Wed Aug 18 16:39:11 2021
             State : active, resyncing
    Active Devices : 36
   Working Devices : 36
    Failed Devices : 0
     Spare Devices : 0

            Layout : left-symmetric
        Chunk Size : 512K

Consistency Policy : bitmap

     Resync Status : 32% complete

              Name : large1:0  (local to host large1)
              UUID : b7cace22:832e570f:eba39768:bb1a1ed6
            Events : 20709

    Number   Major   Minor   RaidDevice State
       0       8       33        0      active sync   /dev/sdc1
       1       8       49        1      active sync   /dev/sdd1
       2       8       65        2      active sync   /dev/sde1
       3       8       81        3      active sync   /dev/sdf1
       4       8       97        4      active sync   /dev/sdg1
       5       8      113        5      active sync   /dev/sdh1
       6       8      129        6      active sync   /dev/sdi1
       7       8      145        7      active sync   /dev/sdj1
       8       8      161        8      active sync   /dev/sdk1
       9       8      209        9      active sync   /dev/sdn1
      10       8      177       10      active sync   /dev/sdl1
      11       8      225       11      active sync   /dev/sdo1
      12       8      241       12      active sync   /dev/sdp1
      13      65        1       13      active sync   /dev/sdq1
      14      65       17       14      active sync   /dev/sdr1
      15       8      193       15      active sync   /dev/sdm1
      16      65      145       16      active sync   /dev/sdz1
      17      65      161       17      active sync   /dev/sdaa1
      18      65       33       18      active sync   /dev/sds1
      19      65       49       19      active sync   /dev/sdt1
      20      65       65       20      active sync   /dev/sdu1
      21      65       81       21      active sync   /dev/sdv1
      22      65       97       22      active sync   /dev/sdw1
      23      65      113       23      active sync   /dev/sdx1
      24      65      129       24      active sync   /dev/sdy1
      25      65      177       25      active sync   /dev/sdab1
      26      65      193       26      active sync   /dev/sdac1
      27      65      209       27      active sync   /dev/sdad1
      28      65      225       28      active sync   /dev/sdae1
      29      65      241       29      active sync   /dev/sdaf1
      30      66        1       30      active sync   /dev/sdag1
      31      66       17       31      active sync   /dev/sdah1
      32      66       33       32      active sync   /dev/sdai1
      33      66       49       33      active sync   /dev/sdaj1
      34      66       65       34      active sync   /dev/sdak1
      35      66       81       35      active sync   /dev/sdal1
md0 : active raid6 sdal1[35] sdak1[34] sdaj1[33] sdah1[31] sdai1[32] sdag1[30] sdaf1[29] sdac1[26] sdae1[28] sdab1[25] sdad1[27] sds1[18] sdq1[13] sdz1[16] sdo1[11] sdp1[12] sdx1[23] sdr1[14] sdw1[22] sdn1[9] sdaa1[17] sdv1[21] sdu1[20] sdy1[24] sdt1[19] sdk1[8] sdm1[15] sdl1[10] sdh1[5] sdj1[7] sdf1[3] sdi1[6] sdc1[0] sdg1[4] sde1[2] sdd1[1]
      464838634496 blocks super 1.2 level 6, 512k chunk, algorithm 2 [36/36] [UUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUU]
      [======>..............]  resync = 32.4% (4433869056/13671724544) finish=3954.9min speed=38929K/sec
      bitmap: 70/102 pages [280KB], 65536KB chunk
Device            r/s     w/s     rkB/s     wkB/s   rrqm/s   wrqm/s  %rrqm  %wrqm r_await w_await aqu-sz rareq-sz wareq-sz  svctm  %util
sdc           9738.60    1.40  38956.00      5.80     0.40     0.40   0.00  22.22    0.20    9.29   1.93     4.00     4.14   0.07  71.82
sdd           9738.20    1.00  38952.80      2.60     0.00     0.00   0.00   0.00    0.89    5.80   8.68     4.00     2.60   0.07  71.60
sde           9738.60    1.40  38956.00      5.80     0.40     0.40   0.00  22.22    0.31    3.71   3.02     4.00     4.14   0.07  70.60
sdf           9738.40    1.00  38953.60      2.60     0.00     0.00   0.00   0.00    0.17    3.20   1.69     4.00     2.60   0.07  70.56
sdg           9738.40    1.00  38953.60      2.60     0.00     0.00   0.00   0.00    0.85    4.20   8.31     4.00     2.60   0.07  70.72
sdh           9738.40    1.00  38953.60      2.60     0.00     0.00   0.00   0.00    0.20    4.00   1.93     4.00     2.60   0.07  70.64
sdi           9738.60    1.00  38954.40      2.60     0.00     0.00   0.00   0.00    0.17    8.20   1.70     4.00     2.60   0.07  70.98
sdj           9714.60    1.00  38954.40      2.60    24.00     0.00   0.25   0.00    0.58    4.00   5.61     4.01     2.60   0.07  70.66
sdk           9677.00    1.00  38953.60      2.60    61.40     0.00   0.63   0.00    1.23    4.40  11.94     4.03     2.60   0.07  70.76
sdl           9738.40    1.00  38953.60      2.60     0.00     0.00   0.00   0.00    0.15    5.80   1.44     4.00     2.60   0.07  70.76
sdm           9738.40    1.00  38953.60      2.60     0.00     0.00   0.00   0.00    0.38    2.80   3.73     4.00     2.60   0.07  70.96
sdo           9705.60    1.00  38953.60      2.60    32.80     0.00   0.34   0.00    0.83    5.80   8.07     4.01     2.60   0.07  70.80
sdp           9738.40    1.00  38953.60      2.60     0.00     0.00   0.00   0.00    0.30    4.20   2.91     4.00     2.60   0.07  70.60
sdn           9738.40    1.00  38953.60      2.60     0.00     0.00   0.00   0.00    0.34    5.60   3.30     4.00     2.60   0.07  70.76
sdt           9659.80    1.00  38954.40      2.60    78.80     0.00   0.81   0.00    1.00    4.00   9.71     4.03     2.60   0.07  70.44
sds           9640.40    1.00  38954.40      2.60    98.20     0.00   1.01   0.00    1.29    5.60  12.42     4.04     2.60   0.07  70.60
sdq           9738.40    1.00  38953.60      2.60     0.00     0.00   0.00   0.00    0.30    4.40   2.92     4.00     2.60   0.07  70.68
sdu           9738.60    1.00  38954.40      2.60     0.00     0.00   0.00   0.00    0.13    4.40   1.31     4.00     2.60   0.07  70.66
sdv           9696.20    1.00  38954.40      2.60    42.40     0.00   0.44   0.00    1.30    4.20  12.57     4.02     2.60   0.07  70.76
sdw           9738.40    1.00  38953.60      2.60     0.00     0.00   0.00   0.00    0.94    4.20   9.13     4.00     2.60   0.07  70.70
sdy           9738.40    1.00  38953.60      2.60     0.00     0.00   0.00   0.00    0.11    4.40   1.05     4.00     2.60   0.07  70.62
sdr           9730.80    1.00  38953.60      2.60     7.60     0.00   0.08   0.00    1.22    4.20  11.87     4.00     2.60   0.07  70.68
sdx           9718.00    1.00  38954.40      2.60    20.60     0.00   0.21   0.00    0.88    4.20   8.57     4.01     2.60   0.07  70.70
sdaa          9738.40    1.00  38953.60      2.60     0.00     0.00   0.00   0.00    0.24    4.20   2.38     4.00     2.60   0.07  70.60
sdz           9738.40    1.00  38953.60      2.60     0.00     0.00   0.00   0.00    0.20    4.20   1.91     4.00     2.60   0.07  70.60
sdab          9633.60    1.00  38953.60      2.60   104.80     0.00   1.08   0.00    1.38    4.20  13.33     4.04     2.60   0.07  70.52
sdac          9639.20    1.00  38954.40      2.60    99.40     0.00   1.02   0.00    1.08    5.60  10.45     4.04     2.60   0.07  70.56
sdad          9536.20    1.00  38954.40      2.60   202.40     0.00   2.08   0.00    2.73    4.00  26.04     4.08     2.60   0.07  70.36
sdaf          9738.60    1.00  38954.40      2.60     0.00     0.00   0.00   0.00    0.37    4.00   3.63     4.00     2.60   0.07  70.64
sdae          9738.60    1.00  38954.40      2.60     0.00     0.00   0.00   0.00    0.16    5.40   1.61     4.00     2.60   0.07  70.72
sdag          9735.20    1.00  38940.80      2.60     0.00     0.00   0.00   0.00    0.46    5.80   4.48     4.00     2.60   0.07  70.76
sdai          9738.60    1.00  38954.40      2.60     0.00     0.00   0.00   0.00    0.31    4.00   3.01     4.00     2.60   0.07  70.60
sdah          9661.60    1.00  38955.20      2.60    77.00     0.00   0.79   0.00    1.51    4.20  14.57     4.03     2.60   0.07  70.70
sdal          9739.20    1.40  38958.40      5.80     0.40     0.40   0.00  22.22    0.27    4.86   2.65     4.00     4.14   0.07  70.80
sdaj          9738.60    1.00  38954.40      2.60     0.00     0.00   0.00   0.00    0.17    4.40   1.68     4.00     2.60   0.07  70.64
sdak          9738.80    1.00  38955.20      2.60     0.00     0.00   0.00   0.00    0.53    5.40   5.21     4.00     2.60   0.07  70.80

Both arrays are running on systems with 32+ cores and 64GB+ of RAM, with no other load.

Both arrays have stripe_cache_size = 32768.
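
This is the raid5/6 stripe cache exposed in sysfs; a minimal sketch of checking and setting it (md0 as above, value as quoted):

# check and raise the raid5/6 stripe cache (entries per member device)
cat /sys/block/md0/md/stripe_cache_size
echo 32768 > /sys/block/md0/md/stripe_cache_size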

The md0_raid6 process is using 50-75% CPU on both servers.
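
A sketch of how that single-thread CPU usage can be watched, assuming pidstat from the sysstat package is available (md0_raid6 is the kernel thread name quoted above):

# report CPU usage of the md0_raid6 kernel thread every 5 seconds
pidstat -p "$(pgrep -x md0_raid6)" 5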

Each drive in both arrays achieves >100MB/s sequential read when tested with fio.
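
The exact fio job is not shown here, but a non-destructive sequential read test of that kind would look something like the following (device name, runtime and queue depth are placeholders):

# sequential read test of a single member drive, read-only
fio --name=seqread --filename=/dev/sdX --readonly --rw=read --bs=1M \
    --direct=1 --ioengine=libaio --iodepth=8 --runtime=60 --time_based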

The 10TB HDDs in the 18-drive array are TOSHIBA MG06ACA10TE:

# blockdev --report
RO    RA   SSZ   BSZ   StartSec            Size   Device
rw  8192   512  4096          0  10000831348736   /dev/sdg
rw  8192   512  4096       2048         1048576   /dev/sdg1
rw  8192   512   512       4096  10000829234688   /dev/sdg2
rw  8192   512  4096          0  10000831348736   /dev/sdb
rw  8192   512  4096       2048         1048576   /dev/sdb1
rw  8192   512  4096       4096     17179869184   /dev/sdb2
rw  8192   512  4096   33558528      1074790400   /dev/sdb3
rw  8192   512  4096   35657728     53720645632   /dev/sdb4
rw  8192   512   512  140580864   9928853929472   /dev/sdb5
rw  8192   512  4096          0  10000831348736   /dev/sdf
rw  8192   512  4096       2048         1048576   /dev/sdf1
rw  8192   512   512       4096  10000829234688   /dev/sdf2
rw  8192   512  4096          0  10000831348736   /dev/sdr
rw  8192   512  4096       2048         1048576   /dev/sdr1
rw  8192   512   512       4096  10000829234688   /dev/sdr2
rw  8192   512  4096          0  10000831348736   /dev/sdm
rw  8192   512  4096       2048         1048576   /dev/sdm1
rw  8192   512   512       4096  10000829234688   /dev/sdm2
rw  8192   512  4096          0  10000831348736   /dev/sdi
rw  8192   512  4096       2048         1048576   /dev/sdi1
rw  8192   512   512       4096  10000829234688   /dev/sdi2
rw  8192   512  4096          0  10000831348736   /dev/sdh
rw  8192   512  4096       2048         1048576   /dev/sdh1
rw  8192   512   512       4096  10000829234688   /dev/sdh2
rw  8192   512  4096          0  10000831348736   /dev/sdj
rw  8192   512  4096       2048         1048576   /dev/sdj1
rw  8192   512   512       4096  10000829234688   /dev/sdj2
rw  8192   512  4096          0  10000831348736   /dev/sde
rw  8192   512  4096       2048         1048576   /dev/sde1
rw  8192   512   512       4096  10000829234688   /dev/sde2
rw  8192   512  4096          0  10000831348736   /dev/sdn
rw  8192   512  4096       2048         1048576   /dev/sdn1
rw  8192   512   512       4096  10000829234688   /dev/sdn2
rw  8192   512  4096          0  10000831348736   /dev/sdl
rw  8192   512  4096       2048         1048576   /dev/sdl1
rw  8192   512   512       4096  10000829234688   /dev/sdl2
rw  8192   512  4096          0  10000831348736   /dev/sdp
rw  8192   512  4096       2048         1048576   /dev/sdp1
rw  8192   512   512       4096  10000829234688   /dev/sdp2
rw  8192   512  4096          0  10000831348736   /dev/sdq
rw  8192   512  4096       2048         1048576   /dev/sdq1
rw  8192   512   512       4096  10000829234688   /dev/sdq2
rw  8192   512  4096          0  10000831348736   /dev/sdk
rw  8192   512  4096       2048         1048576   /dev/sdk1
rw  8192   512   512       4096  10000829234688   /dev/sdk2
rw  8192   512  4096          0  10000831348736   /dev/sda
rw  8192   512  4096       2048         1048576   /dev/sda1
rw  8192   512  4096       4096     17179869184   /dev/sda2
rw  8192   512  4096   33558528      1074790400   /dev/sda3
rw  8192   512  4096   35657728     53720645632   /dev/sda4
rw  8192   512   512  140580864   9928853929472   /dev/sda5
rw  8192   512  4096          0  10000831348736   /dev/sdd
rw  8192   512  4096       2048         1048576   /dev/sdd1
rw  8192   512   512       4096  10000829234688   /dev/sdd2
rw  8192   512  4096          0  10000831348736   /dev/sdc
rw  8192   512  4096       2048         1048576   /dev/sdc1
rw  8192   512   512       4096  10000829234688   /dev/sdc2
rw  8192   512  4096          0  10000831348736   /dev/sdo
rw  8192   512  4096       2048         1048576   /dev/sdo1
rw  8192   512   512       4096  10000829234688   /dev/sdo2
rw  8192   512  4096          0      1072693248   /dev/md127
rw  8192   512  4096          0     53686042624   /dev/md126
rw 32768   512  4096          0 158859490492416   /dev/md0

The 14TB HDDs in the 36-drive array are WDC WUH721414AL5201:

# blockdev --report
RO    RA   SSZ   BSZ   StartSec            Size   Device
rw  8192   512  4096          0    480103981056   /dev/sda
rw  8192   512  4096       2048       535822336   /dev/sda1
rw  8192   512  4096    1048576       536870912   /dev/sda2
rw  8192   512  4096    2097152    447569985536   /dev/sda3
rw  8192   512  4096  876257280     31457280000   /dev/sda4
rw  8192   512  4096          0    480103981056   /dev/sdb
rw  8192   512   512       2048       535822336   /dev/sdb1
rw  8192   512  4096    1048576       536870912   /dev/sdb2
rw  8192   512  4096    2097152    447569985536   /dev/sdb3
rw  8192   512  4096  876257280     31457280000   /dev/sdb4
rw  8192   512   512  937698992         2080256   /dev/sdb5
rw  8192   512  4096          0  14000519643136   /dev/sdc
rw  8192   512   512       2048  13999981706752   /dev/sdc1
rw  8192   512  4096          0  14000519643136   /dev/sdd
rw  8192   512   512       2048  13999981706752   /dev/sdd1
rw  8192   512  4096          0  14000519643136   /dev/sde
rw  8192   512   512       2048  13999981706752   /dev/sde1
rw  8192   512  4096          0  14000519643136   /dev/sdf
rw  8192   512   512       2048  13999981706752   /dev/sdf1
rw  8192   512  4096          0  14000519643136   /dev/sdg
rw  8192   512   512       2048  13999981706752   /dev/sdg1
rw  8192   512  4096          0  14000519643136   /dev/sdh
rw  8192   512   512       2048  13999981706752   /dev/sdh1
rw  8192   512  4096          0  14000519643136   /dev/sdi
rw  8192   512   512       2048  13999981706752   /dev/sdi1
rw  8192   512  4096          0  14000519643136   /dev/sdj
rw  8192   512   512       2048  13999981706752   /dev/sdj1
rw  8192   512  4096          0       536281088   /dev/md2
rw  8192   512  4096          0  14000519643136   /dev/sdk
rw  8192   512   512       2048  13999981706752   /dev/sdk1
rw  8192   512  4096          0  14000519643136   /dev/sdl
rw  8192   512   512       2048  13999981706752   /dev/sdl1
rw  8192   512  4096          0    447435767808   /dev/md3
rw  8192   512  4096          0  14000519643136   /dev/sdm
rw  8192   512   512       2048  13999981706752   /dev/sdm1
rw 69632   512  4096          0 475994761723904   /dev/md0
rw  8192   512  4096          0  14000519643136   /dev/sdo
rw  8192   512   512       2048  13999981706752   /dev/sdo1
rw  8192   512  4096          0  14000519643136   /dev/sdp
rw  8192   512   512       2048  13999981706752   /dev/sdp1
rw  8192   512  4096          0  14000519643136   /dev/sdn
rw  8192   512   512       2048  13999981706752   /dev/sdn1
rw  8192   512  4096          0  14000519643136   /dev/sdt
rw  8192   512   512       2048  13999981706752   /dev/sdt1
rw  8192   512  4096          0  14000519643136   /dev/sds
rw  8192   512   512       2048  13999981706752   /dev/sds1
rw  8192   512  4096          0  14000519643136   /dev/sdq
rw  8192   512   512       2048  13999981706752   /dev/sdq1
rw  8192   512  4096          0  14000519643136   /dev/sdu
rw  8192   512   512       2048  13999981706752   /dev/sdu1
rw  8192   512  4096          0  14000519643136   /dev/sdv
rw  8192   512   512       2048  13999981706752   /dev/sdv1
rw  8192   512  4096          0  14000519643136   /dev/sdw
rw  8192   512   512       2048  13999981706752   /dev/sdw1
rw  8192   512  4096          0  14000519643136   /dev/sdy
rw  8192   512   512       2048  13999981706752   /dev/sdy1
rw  8192   512  4096          0  14000519643136   /dev/sdr
rw  8192   512   512       2048  13999981706752   /dev/sdr1
rw  8192   512  4096          0  14000519643136   /dev/sdx
rw  8192   512   512       2048  13999981706752   /dev/sdx1
rw  8192   512  4096          0  14000519643136   /dev/sdaa
rw  8192   512   512       2048  13999981706752   /dev/sdaa1
rw  8192   512  4096          0  14000519643136   /dev/sdz
rw  8192   512   512       2048  13999981706752   /dev/sdz1
rw  8192   512  4096          0  14000519643136   /dev/sdab
rw  8192   512   512       2048  13999981706752   /dev/sdab1
rw  8192   512  4096          0  14000519643136   /dev/sdac
rw  8192   512   512       2048  13999981706752   /dev/sdac1
rw  8192   512  4096          0  14000519643136   /dev/sdad
rw  8192   512   512       2048  13999981706752   /dev/sdad1
rw  8192   512  4096          0  14000519643136   /dev/sdaf
rw  8192   512   512       2048  13999981706752   /dev/sdaf1
rw  8192   512  4096          0  14000519643136   /dev/sdae
rw  8192   512   512       2048  13999981706752   /dev/sdae1
rw  8192   512  4096          0  14000519643136   /dev/sdag
rw  8192   512   512       2048  13999981706752   /dev/sdag1
rw  8192   512  4096          0  14000519643136   /dev/sdai
rw  8192   512   512       2048  13999981706752   /dev/sdai1
rw  8192   512  4096          0  14000519643136   /dev/sdah
rw  8192   512   512       2048  13999981706752   /dev/sdah1
rw  8192   512  4096          0  14000519643136   /dev/sdal
rw  8192   512   512       2048  13999981706752   /dev/sdal1
rw  8192   512  4096          0  14000519643136   /dev/sdaj
rw  8192   512   512       2048  13999981706752   /dev/sdaj1
rw  8192   512  4096          0  14000519643136   /dev/sdak
rw  8192   512   512       2048  13999981706752   /dev/sdak1

On both arrays sync_speed_min/sync_speed_max is set to 200000.
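
These limits live in per-array sysfs files (the system-wide equivalents are the dev.raid.speed_limit_min/max sysctls); a minimal sketch of how they were raised:

# raise the per-array resync speed limits to 200000 KB/s
echo 200000 > /sys/block/md0/md/sync_speed_min
echo 200000 > /sys/block/md0/md/sync_speed_max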

The 18-drive array is connected as a JBOD via an LSI SAS3008 PCI-Express Fusion-MPT SAS-3 controller.

The 36-drive array is connected as a JBOD via two LSI SAS3008 PCI-Express Fusion-MPT SAS-3 controllers.

All controllers are in PCI-E 3.0 x8 slots: LnkSta: Speed 8GT/s (ok), Width x8 (ok)
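
The link status quoted above can be read from lspci; a sketch, assuming vendor ID 1000 (LSI/Broadcom) selects the HBAs:

# confirm negotiated PCIe speed/width of the SAS3008 HBAs
lspci -vv -d 1000: | grep -iE 'SAS3008|LnkSta:'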

My questions are:

  1. Why does the array with 36 drives have an almost two times slower resync rate?
  2. iostat for the 18-drive array shows 322.00 r/s, 77282.40 rkB/s and 16454.40 rrqm/s, but iostat for the 36-drive array shows 9738.60 r/s, 38956.00 rkB/s and 0 rrqm/s. Why is the second array not merging IO requests?
  3. Is there anything I can try to speed up the resync on the second array?

UPDATE

I was able to speed up the 18-drive array from 70MB/s to 180MB/s by increasing the number of threads in mdraid:

echo 8 > /sys/block/md0/md/group_thread_cnt

What is even more interesting: doing the same on the 36-drive array decreased performance from 40MB/s to 30MB/s.
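
For reference, a sketch of checking the value and reverting to the default (0 means stripe handling stays on the single mdX_raid6 thread):

# inspect the current number of raid5/6 worker thread groups and revert to the default
cat /sys/block/md0/md/group_thread_cnt
echo 0 > /sys/block/md0/md/group_thread_cnt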

UPDATE 2

I just noticed that rareq-sz from iostat on the 36-drive array is only 4KB. It looks like all IO sent to the disks is always only 4KB. This is really strange. Why is md raid doing the resync in 4KB requests for this array?
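
As a sanity check, rareq-sz is just rkB/s divided by r/s: 38956.00 / 9738.60 ≈ 4.0KB per read here, versus 77282.40 / 322.00 ≈ 240KB per read on the 18-drive array.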

UPDATE 3

I have done a bit more research on a server with 24 NVMe drives and found that the resync speed bottleneck affects RAID6 arrays with more than 16 drives:

# mdadm --create --verbose /dev/md0 --level=6 --raid-devices=16 \
    /dev/nvme1n1 /dev/nvme2n1 /dev/nvme3n1 /dev/nvme4n1 /dev/nvme5n1 \
    /dev/nvme6n1 /dev/nvme7n1 /dev/nvme8n1 /dev/nvme9n1 /dev/nvme10n1 \
    /dev/nvme11n1 /dev/nvme12n1 /dev/nvme13n1 /dev/nvme14n1 /dev/nvme15n1 \
    /dev/nvme16n1
# iostat -dx 5
Device            r/s     w/s     rkB/s     wkB/s   rrqm/s   wrqm/s  %rrqm  %wrqm r_await w_await aqu-sz rareq-sz wareq-sz  svctm  %util
nvme0n1          0.00    0.00      0.00      0.00     0.00     0.00   0.00   0.00    0.00    0.00   0.00     0.00     0.00   0.00   0.00
nvme1n1        342.60    0.40 161311.20      0.90 39996.60     0.00  99.15   0.00    2.88    0.00   0.99   470.84     2.25   2.51  86.04
nvme4n1        342.60    0.40 161311.20      0.90 39996.60     0.00  99.15   0.00    2.89    0.00   0.99   470.84     2.25   2.51  86.06
nvme5n1        342.60    0.40 161311.20      0.90 39996.60     0.00  99.15   0.00    2.89    0.00   0.99   470.84     2.25   2.51  86.14
nvme10n1       342.60    0.40 161311.20      0.90 39996.60     0.00  99.15   0.00    2.90    0.00   0.99   470.84     2.25   2.51  86.20

As you can see, there are 342 IOPS per drive with a rareq-sz of ~470KB, but when I create a RAID6 array with 17 drives or more:

# mdadm --create --verbose /dev/md0 --level=6 --raid-devices=17 \
    /dev/nvme1n1 /dev/nvme2n1 /dev/nvme3n1 /dev/nvme4n1 /dev/nvme5n1 \
    /dev/nvme6n1 /dev/nvme7n1 /dev/nvme8n1 /dev/nvme9n1 /dev/nvme10n1 \
    /dev/nvme11n1 /dev/nvme12n1 /dev/nvme13n1 /dev/nvme14n1 /dev/nvme15n1 \
    /dev/nvme16n1 /dev/nvme17n1
# iostat -dx 5
Device            r/s     w/s     rkB/s     wkB/s   rrqm/s   wrqm/s  %rrqm  %wrqm r_await w_await aqu-sz rareq-sz wareq-sz  svctm  %util
nvme0n1          0.00    0.00      0.00      0.00     0.00     0.00   0.00   0.00    0.00    0.00   0.00     0.00     0.00   0.00   0.00
nvme1n1      21484.20    0.40  85936.80      0.90     0.00     0.00   0.00   0.00    0.04    0.00   0.82     4.00     2.25   0.05  99.16
nvme4n1      21484.00    0.40  85936.00      0.90     0.00     0.00   0.00   0.00    0.03    0.00   0.74     4.00     2.25   0.05  99.16
nvme5n1      21484.00    0.40  85936.00      0.90     0.00     0.00   0.00   0.00    0.04    0.00   0.84     4.00     2.25   0.05  99.16

rareq-sz drops to 4KB, IOPS per drive increase to over 21,000, and the resync speed drops to 85MB/s.

Why is it like that?

Could someone let me know which part of the mdraid kernel code is responsible for this limitation?

djdomi: Hi, verify `sysctl dev.raid.speed_limit_min` and `sysctl dev.raid.speed_limit_max`, and, only for the rebuild, `mdadm --grow --bitmap=internal /dev/md0`, with `mdadm --grow --bitmap=none /dev/md0` to revert.
forke: @djdomi sync_speed_min/sync_speed_max is set to 200000. Both arrays have an internal bitmap active.
Michael Hampton: What are the hard drives?
forke: @MichaelHampton The 10TB HDDs in the 18-drive array are TOSHIBA MG06ACA10TE; the 14TB HDDs in the 36-drive array are WDC WUH721414AL5201.
djdomi: Hm, can you add `blockdev --report`, please?
forke: @djdomi Added. I can see that the block size for the device is 4096, but the block size for the RAID partition is 512. Can this affect performance? Both configurations have the same values. Also, StartSec differs: the first has 4096 and the second has 2048.
djdomi: Please show the full RAID; I suggest running the report without specifying any drive.
Michael Hampton: Yeah, the drives should write faster than that. You may just be hitting a CPU thread limit, as you said: "md0_raid6 process is using 50-75% cpu in both servers." I'm not entirely sure how you would deal with that.
forke: @MichaelHampton It is 50-75% CPU on a single core. Total system CPU usage is about 2%.
forke: @djdomi Added the full blockdev report.
Michael Hampton: That's correct. The problem is that I don't think it will use more than one thread.
djdomi: Well, it's possible to tune the RAID for more readahead with `blockdev --setra 65536 /dev/sdX` and revert it when done. But keep in mind, it can take a lot of RAM.
forke: @djdomi Setting the readahead on all disks to 65536 does not change the resync speed in any way.
forke: @MichaelHampton I was able to change the number of threads by setting group_thread_cnt to 8. It sped up the 18-drive array from 70MB/s to 180MB/s, but the same change on the 36-drive array resulted in a performance drop from 40MB/s to 30MB/s.
djdomi: @forke Yeah, it's just generic thinking; it might improve things, but it can also do the reverse. Maybe increasing `--bitmap-chunk` to 128MB or even a bit more might also help.

ANSWER

I found the answer with the help of the Linux mdraid kernel developers. It is a bug in the Linux kernel.

Song Liu has confirmed that the issue is caused by the blk_plug logic and has provided a patch for it. More details here:

https://lore.kernel.org/linux-raid/CAPhsuW7R=8XyU5wU1-NT-Eo=Hir7gV_b7+_MB+bUFx3bEecbDw@mail.gmail.com/

https://lore.kernel.org/linux-raid/[email protected]/T/#t
