Score:0

P420i controller / 2 disks failed in RAID5 / RAW FS after parity initialization


We woke up this morning to two failed disks in a RAID 5 array configured with a single hot spare.

The hot spare didn't replace either failed disk, maybe because both disks failed at the same time.

However, I added two new disks and parity is initializing now, but the partition's filesystem has changed to RAW. Should I wait for the initialization to finish, or have I lost all the data on the logical volume? Do you recommend using commercial recovery software to restore the VHDX files from the RAW filesystem? Please advise.

Michael Hampton
You erased the remains of your array and created a new one. You're beyond recovery and should go now to your backups.
Score:3

Do you recommend using commercial recovery software to restore (VHDX files) from RAW FS? please advise

RAID 5 can't recover from two disks failing, and it's dangerously bad these days anyway, so please don't use it again. You can try to recover the files, but it'll be expensive, take a while and is unlikely to help. Best to just restore from backup, which is much quicker, and onto RAID 1/10 or RAID 6/60 please :)
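
To illustrate why a second failed disk is fatal for single parity, here is a minimal Python sketch of one RAID 5 style stripe. It is a toy model and not how the P420i lays out data internally; the block contents and the helper name xor_blocks are made up for the example.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, column by column."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# One stripe: three data blocks plus one parity block (hypothetical contents).
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# Single failure: disk 1 is lost. XOR of the survivors and parity restores it.
rebuilt = xor_blocks([data[0], data[2], parity])
assert rebuilt == data[1]

# Double failure: disks 1 and 2 are lost. The survivors only yield the XOR
# of the two missing blocks, and many different block pairs produce that value.
combined = xor_blocks([data[0], parity])
assert combined == xor_blocks([data[1], data[2]])
assert combined == xor_blocks([b"CCCC", b"BBBB"])  # the swapped pair fits equally well
```

With one parity block per stripe there is one equation per stripe, so only one unknown block can be solved for.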

Score:1

It's not uncommon to have one drive fail in a RAID5 and then have a second drive fail on rebuild - if you haven't taken care of the array.

The core of the problem is that some unused data blocks may slowly degrade (bit rot). The damage isn't detected (and automatically repaired/remapped by the drive) because the blocks haven't been read back. However, on a rebuild all data needs to be read, and if it can't be, the rebuild fails. Bummer.
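
As a rough back-of-the-envelope sketch of why reading everything on a rebuild is risky: the Python below estimates the chance of hitting at least one unreadable bit while reading all surviving disks. Everything here is an assumption for illustration: the function name, the disk count and size, the error rate of 1 per 10^14 bits (a figure often seen on drive datasheets), and the idea that errors are independent. Real latent errors tend to cluster, so treat the result as an order-of-magnitude guide only.

```python
import math

def rebuild_failure_probability(surviving_disks, bytes_per_disk, error_rate_per_bit):
    """Chance of at least one unreadable bit when every surviving disk is read in full.

    Assumes independent errors: P = 1 - (1 - rate) ** bits_read.
    """
    bits_read = surviving_disks * bytes_per_disk * 8
    # log1p/expm1 keep precision when the per-bit rate is tiny.
    return -math.expm1(bits_read * math.log1p(-error_rate_per_bit))

# Hypothetical array: 3 surviving disks of 4 TB each, 1e-14 errors per bit read.
p = rebuild_failure_probability(3, 4e12, 1e-14)
print(f"estimated rebuild failure probability: {p:.2f}")
```

The estimate grows with the amount of data that has to be read, which is why larger disks and wider arrays make single-parity rebuilds more fragile.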

Using RAID levels with multiple redundancy, like 6 or 60, is a good way to avoid this kind of problem - in short: RAID 6 is practically immune to bit rot and a much better choice than RAID 5 + hot spare.

RAID levels 1 and 10 can also exhibit the bit-rot problem, but the probability is lower than with RAID 5.

Sometimes, you cannot run anything but RAID levels 5 or 50. In that case it's essential (and a good idea for the other RAID levels as well) that you run a regular media scan (aka disk scrubbing, media patrol, patrol read or surface scan). That ensures that all soft errors are fixed before they become hard errors. Strangely, scrubbing is not active by default on most controllers.
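
Conceptually, a scrub pass does something like the following Python sketch: read every block of every stripe and rebuild any block that fails to read while the rest of the stripe is still intact. This is a simplified in-memory model, not any controller's actual algorithm; None stands in for an unreadable sector, and the names scrub and xor_blocks are made up for the example.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, column by column."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def scrub(stripes):
    """Read every block in every stripe and repair single unreadable blocks.

    Each stripe is a list of data blocks plus one parity block.
    Returns (stripe_index, block_index) for every block that was repaired.
    """
    repaired = []
    for s, stripe in enumerate(stripes):
        bad = [i for i, block in enumerate(stripe) if block is None]
        if len(bad) > 1:
            raise RuntimeError(f"stripe {s}: {len(bad)} unreadable blocks, beyond single parity")
        if bad:
            # Data and parity XOR to zero, so the missing block is the XOR of the rest.
            stripe[bad[0]] = xor_blocks([b for b in stripe if b is not None])
            repaired.append((s, bad[0]))
    return repaired

stripes = [
    [b"AAAA", b"BBBB", xor_blocks([b"AAAA", b"BBBB"])],
    [b"CCCC", None, xor_blocks([b"CCCC", b"DDDD"])],  # one latent bad block
]
repaired = scrub(stripes)
print(repaired)  # [(1, 1)]
```

The point of running this regularly is timing: the bad block in the second stripe is fixed while the other disks are healthy, instead of being discovered in the middle of a rebuild.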

In your case, the data has either been corrupted or zeroed anyway. Simply recreate the partitions, format and restore from backup. Of course, a regular backup is even more essential than disk scrubbing. Even a well-groomed RAID is no replacement for a good backup strategy.
