Should I allow a hardware RAID5 array rebuild to complete after swapping out a drive PRIOR to running an xfs_repair on the volume?
Currently xfs_repair keeps failing in Phase 7 at the same spot:
Phase 7 - verify and correct link counts...
Metadata corruption detected at 0x45bf78, xfs_dir3_block block 0x6a945ef98/0x1000
libxfs_bwrite: write verifier failed on xfs_dir3_block bno 0x6a945ef98/0x8
xfs_repair: Releasing dirty buffer to free list!
xfs_repair: Refusing to write a corrupt buffer to the data device!
xfs_repair: Lost a write to the data device!
Basically: Am I being stupid? Is this an instance of:
- "Of COURSE I'm seeing data weirdness while the array is being rebuilt" and should just leave it TF alone and let it finish... probably sometime tomorrow or the day after... before I then get started on the xfs_repair.
e.g.: "We'll wait until you're ready."
-OR-
2) I can continue to work on the XFS filesystem WHILE the RAID5 hardware array rebuild is also going on. Once the XFS repair is complete, I mount the volume, rsync the backed-up data over to the primary partition, and get on with my life while the hardware RAID rebuild continues to potato-chug its way along. IDGAF because the system is back up, and it can take however long it wants so long as it eventually completes.
I Hate Waiting!
Background info:
I have a 16TB XFS filesystem running on a hardware RAID5 in an HP DL380 with a Smart Array P410i controller. I've been consistently running into corruption issues on this filesystem. I've swapped out the controller, and swapped out one of the six 4TB drives which I suspect might have been the culprit.
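For reference, this is roughly how I've been watching the rebuild from the CLI. Just a sketch: depending on vintage the tool is hpacucli, hpssacli, or ssacli, and the slot number below is only a guess for illustration, so check "ctrl all show" for yours.

# assumption: the P410i is in slot 0 (verify with: ssacli ctrl all show)
ssacli ctrl all show status            # controller / cache / battery health
ssacli ctrl slot=0 ld all show         # logical drive state; shows "Recovering, NN% complete" mid-rebuild
ssacli ctrl slot=0 pd all show status  # physical drive states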
The array rebuild is taking quite a long time, and I'd like to get this system back up and running fairly quickly. The hardware recovery/rebuild is at 56% now, after running for a day. That isn't hugely abnormal for this controller and an array this size. The issue I'm facing, however, is that I want to get the xfs_repair started. When I attempted the first xfs_repair, it said the log was dirty and needed to be replayed by mounting the filesystem first. Fine. Mount / umount was done. No drama there. Kicked off the xfs_repair and it started working its way through the inodes and metadata corruption.
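For the record, the repair sequence so far was basically this. Sketch only: /dev/sdb1 and /mnt/scratch are placeholder names for whatever the logical drive and mount point actually are.

# placeholders: /dev/sdb1 = the RAID5 logical drive, /mnt/scratch = a temporary mount point
mount /dev/sdb1 /mnt/scratch    # mount once so XFS replays the dirty log
umount /mnt/scratch             # unmount again; xfs_repair wants the filesystem offline
xfs_repair -n /dev/sdb1         # optional no-modify pass, just reports what it finds
xfs_repair /dev/sdb1            # the real repair, which is what keeps dying in Phase 7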
It then occurred to me that maybe running the xfs_repair while the rebuild was still ongoing might be a... let's say not a great idea. I have all the data backed up on a large drive-shelf backup, so I'm not hugely concerned about files being shuffled off to the no-man's-land of lost+found. I'll rsync the (slower, 63TB) drive-shelf backup array over to the (faster, 16TB) "production" array once the array rebuild completes. In theory that would still be faster than nuking the 16TB array and doing a full restore from backup... but:
Should I step back, stuff my hands in my pockets, glower at the % complete as it S_L_O_W_L_Y claws its way to 100%, and let the hardware RAID5 rebuild finish PRIOR to running the xfs_repair -OR- can I keep working on the xfs_repair while the array rebuild slowly, painfully, continues to chug along?
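For completeness, the restore I have in mind afterwards is just a straight rsync once the repaired filesystem mounts cleanly, something along these lines (the mount points are placeholders for my 63TB shelf and the 16TB array):

# placeholders: /mnt/shelf = 63TB backup array, /mnt/prod = repaired 16TB XFS volume
rsync -aAXH --progress /mnt/shelf/ /mnt/prod/    # -a perms/times/owners, -A/-X ACLs+xattrs, -H hardlinks; trailing slashes copy contents into contents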