Score:0

Weekly RAID check affecting my system - any way to mitigate?

ml flag

I recently got my webhost (Hetzner) to add a pair of 16TB SATA drives to my webserver. Currently using 2.5TB of them. They're RAID 1 mirrored.

I also have two 4TB NVMe drives with 700GB currently on them, also RAID 1 mirrored.

Every week CentOS kicks off a cronjob to run a "check" on both of my md arrays. They happen concurrently, with the NVMe one finishing after 5 hours. The SATA one takes a painful 18 hours, at 200MB/sec the whole time.

# Run system wide raid-check once a week on Sunday at 1am by default
0 1 * * Sun root /usr/sbin/raid-check

My server is plenty powerful, with a 32-core EPYC and 128GB of RAM, but I do notice an I/O slowdown when this check is running.

  1. Is it necessary to run these weekly?

  2. 200MB/sec * 18 hours means it's doing the whole 16TB, not just the occupied space. Can this be made smarter/lazier in any way, so it only runs on the occupied space?

  3. Could this job be niced or similar? I appreciate it would take longer, but that might be preferable. See the edit below.

  4. Would scripting pauses into this be a bad idea? So instead of 18 hours in one hit I could do (say) 3 hours per night?

  5. Is this a problem everyone suffers, or have I made some poor decisions? Would getting a hardware RAID card installed make me much happier, for example?

Edit

I have now discovered `/etc/sysconfig/raid-check` and changed `NICE=low` to `NICE=idle`. I guess I won't know what difference that makes until next week.
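In the meantime, one way I can think of to verify the effect next week, assuming the raid-check script applies the priority to the per-array resync kernel thread via ionice (I haven't confirmed that; the PID below is a placeholder):

# While the check is running, find the kernel thread doing the work for each array
ps -eo pid,comm | grep '_resync'

# Then look up that thread's I/O scheduling class; hopefully "idle" rather than "best-effort"
ionice -p <pid>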

paladin avatar
id flag
Use btrfs-raid1 (by using btrfs filesystem) instead of stupid mdadm raid1.
Codemonkey avatar
ml flag
Can you tell me more @paladin - why would that be better? And I assume I can't convert it in-place, I'd need to move the data to other drives first, then move back? I'm a full stack dev running my own business/server/site, I'm happy to admit this ain't my field of expertise. Hell, I don't have a field of expertise these days!
paladin avatar
id flag
The btrfs filesystem supports RAID at the filesystem level, while mdadm does RAID at the block level. btrfs also creates checksums of all metadata and all data, while mdadm doesn't; mdadm is just stupid. btrfs compares all metadata and all data against their checksums and is also able to compare them with a copy (raid1 or dup copy). Should something be corrupt, only the corrupt file will be repaired; there is no need for an entire block-level check of the disk. But please read about btrfs first, as some functions of this filesystem are different from your usual ext4 and co.
paladin avatar
id flag
You should really read more about it [here](https://btrfs.wiki.kernel.org/index.php/Main_Page). btrfs is production ready and stable to use when you use it in the right way. I'll write a small summary later. PS: you should really not use btrfs-raid5 or btrfs-raid6 mode, as those modes are experimental and highly dangerous (more dangerous than raid0). A btrfs filesystem should also always be mounted with the `noatime` mount option.
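For example, the rough shape of a two-disk btrfs raid1 setup and a scrub (device and mount point names below are only placeholders, not your actual layout):

# Mirror both data and metadata across two drives, mount with noatime
mkfs.btrfs -m raid1 -d raid1 /dev/sdX /dev/sdY
mount -o noatime /dev/sdX /mnt/data

# A scrub reads only the allocated blocks, verifies them against their checksums,
# and repairs from the mirror copy if a mismatch is found
btrfs scrub start /mnt/data
btrfs scrub status /mnt/data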
Score:2
za flag

No, MD RAID can't be smarter than this. If you want to only check used areas, use ZFS, or perhaps BTRFS.

A weekly check is too often. Do this on a monthly basis, or even every other month.
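For example, the stock cron entry you quoted could be moved to the first of each month; the script stays the same, only the schedule changes:

# /etc/cron.d/raid-check -- run monthly at 1am on the 1st instead of weekly on Sunday
0 1 1 * * root /usr/sbin/raid-check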

I don't know what this NICE setting really does. If it sets the I/O niceness of the [mdX_resync] kernel process, that's good; use idle. What you can limit is the bandwidth of the check: it's set in the /sys/block/mdX/md/sync_speed_max file, in kB/s. This is a virtual file, i.e. it will be reset after a system restart.

By the way, it's limited to 200 MB/s by default and you seem to hit that limit. You may increase the speed for the SSDs (set 5000000 and see how long they take to be checked). And instead of "pausing" it for the HDDs, I'd play with the limits (e.g. during periods of high load I'd set a lower limit, and during idle time I'd set 600000, the SATA 6 Gb/s interface's maximum bandwidth).
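For example (the mdX names here are placeholders for your two arrays, the values are in kB/s, and as noted above none of this survives a reboot):

# Let the NVMe mirror check run at up to ~5 GB/s
echo 5000000 > /sys/block/md1/md/sync_speed_max

# During quiet hours, let the SATA mirror use roughly the full SATA 6 Gb/s link
echo 600000 > /sys/block/md0/md/sync_speed_max

# During busy periods, throttle the SATA check right down (example value)
echo 50000 > /sys/block/md0/md/sync_speed_max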

I doubt a HW RAID card will make things much better.

Nikita Kipriyanov avatar
za flag
Since I've returned to the thread, let me explain why a HW RAID card won't make things better and is just a waste of money. The bottleneck here is not the bus; the bottleneck is the performance of spinning rust, which is several orders of magnitude lower than what the bus and CPU could provide. HDDs can only deliver so much I/O per second and additionally are not very good at serving parallel threads (they spend more time seeking back and forth). During the check you share those IOPS with the useful load. It doesn't matter whether the card or the CPU is doing it; the useful load will get fewer IOPS during the check.
jm flag
A hardware RAID card will make things much better. The md checkarray command scans every sector of every disk for consistency and bit rot. This is done by the host reading every block, so it is I/O intensive and somewhat CPU intensive. With a hardware RAID, these functions run from within the card, so there is no I/O on the bus and the CPU is not involved.
Codemonkey avatar
ml flag
Interesting, thank you. I certainly thought it odd that the NVMe check took so long; the 200MB/s limit makes sense. Although I would LIKE to run the job less often, I believe that Debian opts for monthly and RHEL for weekly. Who's to say which is correct... can you flesh out why you believe weekly to be "too often"?
Codemonkey avatar
ml flag
Additionally, do you know at what point raid-check will re-load the conf file? Or how to make it do so? I've tried idling the checks (`echo idle > /sys/devices/virtual/block/mdX/md/sync_action`) and then starting again but that doesn't seem to do it. (I've set `MAX_CONCURRENT=1` and it's happily doing both at the same time right now)
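For reference, the stop/start sequence I've been experimenting with looks like this (mdX is a placeholder for md0/md1; I haven't yet confirmed whether a re-started check resumes where it left off or starts over):

# Pause a running check on one array
echo idle > /sys/block/mdX/md/sync_action

# Start (or re-start) the check later
echo check > /sys/block/mdX/md/sync_action

# Watch progress and the current speed
cat /proc/mdstat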