Score:6

RAID10 with 2 failed drives

I have 4 disks in a RAID10 array in a Dell server. Two drives have failed. The failed drives are not in the same mirror group (the groups are 1+2 and 3+4); the failed drives are 2 and 3. The server is still running, but the array is degraded. I'm waiting for the new drives to arrive.

What is the best way to replace the drives? Can I replace both disks at the same time, or is it better to rebuild the first group first and then replace the other?
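
For readers who want to check the layout described above, here is a minimal sketch of why the array is still up with drives 2 and 3 failed. It assumes plain two-way mirrors striped together; the function names `array_survives` and `exposed_drives` are made up for illustration and do not correspond to any controller tool.

```python
# Mirror groups as described in the question: drives 1+2 and 3+4.
MIRROR_GROUPS = [{1, 2}, {3, 4}]

def array_survives(failed, groups=MIRROR_GROUPS):
    """True if every mirror group still has at least one working drive."""
    failed = set(failed)
    return all(group - failed for group in groups)

def exposed_drives(failed, groups=MIRROR_GROUPS):
    """Drives that are the sole survivor of a degraded mirror group."""
    failed = set(failed)
    exposed = set()
    for group in groups:
        alive = group - failed
        if len(alive) == 1 and len(alive) < len(group):
            exposed |= alive
    return sorted(exposed)

print(array_survives({2, 3}))  # True: each group keeps one drive
print(array_survives({1, 2}))  # False: group 1+2 is gone entirely
print(exposed_drives({2, 3}))  # [1, 4]: losing either one loses the array
```

In this state drives 1 and 4 each hold the only copy of their half of the stripe, which is why the array counts as degraded even though it is still serving data.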

Russell McMahon
You may know why the drives failed, but if not, I'd suggest a very careful analysis of the reasons. In my experience with standard commercial/retail drives, real-world lifetimes are typically 5 to 10 years, with some unfortunate earlier failures. Two drive failures within a short period suggest a common cause, such as power supply issues, temperature, or some other environmental factor.
Score:12
  1. Make sure you have an up-to-date backup. If, for whatever reason, you don’t, start evacuating your data immediately! Don’t do it in parallel with the RAID rebuild: that puts more pressure on the disks, and instead of a streamlined sequential process you’ll get tons of random I/O killing performance. You really need to get the data somewhere safe ASAP.

  2. Replace the faulty disks one by one. A parallel rebuild is actually going to be slower in this case, and that’s not what you want.

Hope this helped.

P.S. You’ve been exceptionally lucky already, as the disks didn’t fail within the same RAID1 group. Don’t push your luck!
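
To put the P.S. in rough numbers, the sketch below enumerates every possible two-drive failure in a 4-disk RAID10 with groups 1+2 and 3+4. It treats all drives as equally likely to fail and independent of each other, which is a simplifying assumption (a shared cause such as power or heat would break it).

```python
from itertools import combinations

MIRROR_GROUPS = [{1, 2}, {3, 4}]
DRIVES = sorted(set().union(*MIRROR_GROUPS))

def survives(failed):
    """True if every mirror group keeps at least one working drive."""
    failed = set(failed)
    return all(group - failed for group in MIRROR_GROUPS)

pairs = list(combinations(DRIVES, 2))
surviving = [p for p in pairs if survives(p)]
fatal = [p for p in pairs if not survives(p)]

print(f"{len(surviving)} of {len(pairs)} two-drive failures are survivable")
print("fatal pairs:", fatal)  # [(1, 2), (3, 4)]

# With drives 2 and 3 already failed, check each remaining drive.
already_failed = {2, 3}
next_fatal = [d for d in DRIVES
              if d not in already_failed and not survives(already_failed | {d})]
print("a third failure is fatal on drives:", next_fatal)  # [1, 4]
```

Under those assumptions four of the six possible pairs leave the array running, but from the current state any further failure is fatal, which supports rebuilding (and backing up) without delay.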

G1R0UARD
Yes, I am very lucky. That is what I had in mind too. Thanks!
Konrad Gajewski
Why on Earth is it going to be slower? Speed-wise, maybe, but you will be doing it in parallel, right? The desired result of a non-degraded array should be quicker when doing both at the same time.
Peter Cordes
Why would parallel rebuild be slower? RAID10 has two separate pairs of mirrors, so it's just two independent drive copies. Or are you talking about doing this in the presence of significant I/O load from the still-running system, so reads during that time can be satisfied from the one disk that isn't part of the rebuild, reducing seeks on the pair that's rebuilding? That would make sense.
ilkkachu
@PeterCordes, wouldn't the disk that's not part of the rebuild have different data than the one sourcing the new drive? Otherwise it would be part of the rebuild process. So it can't help with reads targeting the rebuilding pair. (But would allow some reads to complete uncontested, not that I'm sure that helps in practice.) I also don't get why a parallel rebuild would be _slower_; at worst it should be the same speed, e.g. if the limiting factor isn't the disks but the controller. Doing it one pair at a time might let the first pair finish faster, but I'm not sure if that's too useful either.
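
The disagreement in these comments can be made concrete with a toy model. This is only a sketch under stated assumptions: each rebuild is a straight sequential copy, the controller has a fixed total bandwidth shared evenly between concurrent rebuilds, and there is no other I/O load. The function `rebuild_hours` and all the numbers are hypothetical, not measurements from a PERC or any other controller.

```python
def rebuild_hours(disk_tb, disk_mb_s, controller_mb_s, parallel):
    """Toy wall-clock estimate for rebuilding two mirror pairs.

    disk_tb         -- capacity of each replaced drive in TB (decimal)
    disk_mb_s       -- sustained sequential throughput of one drive
    controller_mb_s -- total rebuild bandwidth the controller can move
    parallel        -- True: both pairs at once; False: one after the other
    """
    size_mb = disk_tb * 1_000_000
    if parallel:
        # Two copies share the controller, each limited by its own disk.
        per_rebuild = min(disk_mb_s, controller_mb_s / 2)
        seconds = size_mb / per_rebuild
    else:
        # One copy at a time gets the whole controller.
        per_rebuild = min(disk_mb_s, controller_mb_s)
        seconds = 2 * size_mb / per_rebuild
    return seconds / 3600

# Disk-bound case: the controller has bandwidth to spare.
fast_par = rebuild_hours(4, 150, 1000, parallel=True)
fast_seq = rebuild_hours(4, 150, 1000, parallel=False)
print(f"disk-bound:       parallel {fast_par:.1f} h, sequential {fast_seq:.1f} h")

# Controller-bound case: the controller moves no more than one disk's worth.
slow_par = rebuild_hours(4, 150, 150, parallel=True)
slow_seq = rebuild_hours(4, 150, 150, parallel=False)
print(f"controller-bound: parallel {slow_par:.1f} h, sequential {slow_seq:.1f} h")
```

In this model a parallel rebuild is never slower in total and is only equal when the controller is the bottleneck. The model leaves out production I/O competing with the rebuild and any controller-specific scheduling, so it cannot rule out the slowdown described in the answer.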
Score:2

Not sure about the PERC but I'd replace one drive at a time. Most likely the controller schedules two rebuilds and only runs one of them at a time anyway.

Of course, you should have a reliable and tested backup at all times.
