Score:1

Rebuild RAID5 with uncorrectable sectors on multiple disks

cn flag

My software RAID5 (mdadm) array consists of five disks. Recently, I have been getting I/O errors when reading certain files. Most other files are still readable.

At first, I was planning to find out which disk is broken (using smartctl) and quickly replace the failed disk to rebuild the array before other disks fail as well. However, smartctl shows that three disks have uncorrectable errors.
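
For reference, the kind of per-disk check I ran looks roughly like this (the device name is a placeholder):

```
# Overall health plus the SMART attributes that matter here:
# Current_Pending_Sector, Offline_Uncorrectable, Reported_Uncorrect
smartctl -H -A /dev/sdX

# A long self-test also reports the LBA of the first read failure:
smartctl -t long /dev/sdX
smartctl -l selftest /dev/sdX
```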

I'd think that mdadm should still be able to rebuild as long as the bad sectors of these three disks do not intersect, allowing me the option to swap and rebuild one by one.

Or does the fact that I have an I/O error already indicate that parity is lost and the same sector on multiple disks is unreadable? Is there some way to find out whether or not any failing sectors intersect, and thus information is irreversibly lost?

Michael Hampton avatar
cz flag
You don't appear to have a backup. That should be your top priority.
Nikita Kipriyanov avatar
za flag
@CIA The latter idea of dd is tempting but dubious. When a block cannot be read, MD will see this and recover the correct data from the other disks. `dd`, on the other hand, will fill that space with zeros; the block will then read back cleanly from the new device, and the RAID layer will never know it should have recovered anything, which leads to data corruption. // Also, you don't need to invest in software "to help identify how blocks are set up". That software is called `mdadm`; it will tell you if you ask it properly, and the Linux kernel documentation and source explain its answers in detail.
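
A minimal sketch of "asking it properly" (device names are placeholders):

```
# Array-wide view: level, chunk size, layout, state of each member:
mdadm --detail /dev/md0

# Per-member superblock: role, data offset, event count:
mdadm --examine /dev/sdb1
```
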
Score:3
ru flag

The standard procedures are:

  1. Always have a good, up-to-date backup (at least two independent copies, ideally in different places, at minimum on different media)
  2. Continuously monitor your RAID for problems. A RAID is worthless when errors are allowed to accumulate.
  3. Scrub the array at least monthly. This keeps errors from accumulating and from surfacing only during a rebuild (see the example below this list).
  4. Consider RAID 6 with two redundant disks.
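
A minimal sketch of point 3, assuming an array named md0 and the usual sysfs interface:

```
# Start a check ("scrub") of the array:
echo check > /sys/block/md0/md/sync_action

# Watch progress, then look at the result:
cat /proc/mdstat
cat /sys/block/md0/md/mismatch_cnt   # non-zero means inconsistencies were found
```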

You don't seem to have taken this seriously. Try to recover what's still there now. Trying to rebuild that nearly-failed array might cause more damage than you expect.

If the data is valuable enough, find a trustworthy and capable data recovery service. Put aside a four to five digit amount of cash. Otherwise, rinse & repeat - replace disks, reformat, reinstall and take the standard procedures more seriously.

Score:2
za flag
  1. You are correct that if the unreadable sectors "don't intersect", i.e. lie in different stripes, MD RAID may be able to recover the data using parity. But it may kick a drive out of the array during recovery, and then the chances decline significantly.

  2. There is a general rule of data recovery: always begin with a raw dump. This guarantees you unlimited attempts: if you mess something up, you can start over from the dump. So in general, you clone all dying disks to working ones, reading through the errors, and then assemble the RAID out of the new disks.

  3. You may start by cloning each drive sector-by-sector to a replacement with ddrescue (i.e. not by using the MD RAID recovery procedure). In addition to copying through errors, it creates what it calls a log file (a map file in current versions), which is effectively a bad-sector map. When you have cloned all three drives, you can compare those maps and find out whether there are any intersections (see the ddrescue sketch after this list). Don't throw them away; these maps will help you during the recovery.

  4. However, RAID5 is a very nasty beast when it comes to such dumps. What could go wrong? If a drive's sector doesn't read at all and throws an I/O error, the RAID layer will recover that data from the other disks; that would be the case with the old disks. But if a sector reads without errors yet returns wrong data, RAID won't try to recover it from parity and will return that wrong data instead. ddrescue fills unreadable sectors with zeros, which read back cleanly if you later assemble the array with the cloned device, so this translates into reading zeros (corrupted data) where it was potentially possible to recover the original data (see the fill-mode sketch after this list for a way to at least mark those areas). RAID does not guarantee data integrity. This is a real problem for all variants except RAID6, which has two parity syndromes, or RAID1 with more than two mirrors. And, as you may have guessed, it manifests itself in the most disruptive way with RAID5. (There is an additional consideration against RAID5, concerning modern disk sizes and their bit error rates.)

  5. During any cloning operation a disk may die completely. Then you are stuck. There is a possibility of recovery beyond that point, but it will cost you a lot. There are services with clean rooms that can, for example, replace the heads inside a hard disk and retry reading it; this is slow, error-prone, and they will likely charge you quite a lot. Consider it if your data is very valuable.

  6. Therefore, it is wise to clone the original disks, but then put the clones away, assemble the array from the original disks and try to clone from the array itself (/dev/mdX); see the sketch after this list. If something goes wrong (a disk dies), replace it with its clone and manually repair the broken stripes (see p. 4) afterwards, consulting the map files (p. 3). This is quite hard work. Notice also that you need twice the original space to perform the recovery. Or don't do any of this yourself and outsource the whole job to specialists. This is the price you pay for improper maintenance of the array and the data.

  7. And, now, you have this precious experience. Don't blame arrays, blame yourself, learn the lesson and manage them correctly:

  • Think three times before using RAID5. Then say "no" and go for another RAID level.
  • Scrub the array regularly. This means MD RAID will read and compare the data on the drives and raise an alarm if something is wrong (a mismatch, an unreadable block). Then you can replace a misbehaving drive at the first symptoms. Good distros ship this configured out of the box (Debian at least).
  • Monitor the disks and the array so you don't miss important signs of trouble (see the monitoring sketch after this list).
  • Finally, welcome to the club of administrators who regularly back up their data.
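
A minimal sketch of p. 3, assuming the three failing members are sdb, sdc and sdd and the replacements are sdf, sdg and sdh (all names are placeholders); gawk is assumed for the offset arithmetic:

```
# Clone each failing member, reading through errors and keeping the map:
ddrescue -d -r3 /dev/sdb /dev/sdf sdb.map
ddrescue -d -r3 /dev/sdc /dev/sdg sdc.map
ddrescue -d -r3 /dev/sdd /dev/sdh sdd.map

# List the unreadable byte ranges ('-' entries) recorded in each map:
for m in sdb.map sdc.map sdd.map; do
  echo "== $m =="
  gawk '!/^#/ { n++ } n > 1 && $3 == "-" { printf "%14d %14d\n", strtonum($1), strtonum($1) + strtonum($2) }' "$m"
done
```

The offsets are raw positions on each member; before comparing them across disks, subtract each member's data offset (shown by `mdadm --examine`). Bad ranges that land at the same adjusted offset on two members hit the same stripe. `ddrescuelog` can also do arithmetic on map files if you prefer.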
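
For the p. 4 problem (zero-filled holes on a clone reading back as "good" data), ddrescue's fill mode can stamp the unreadable areas of a clone with a recognizable marker instead, so you can later find which files were actually hit; a sketch under the same placeholder names:

```
# Write a marker over every area recorded as bad ('-') in the map,
# on the CLONE only, never on the original:
printf 'DDRESCUE-BAD-SECTOR ' > marker.bin
ddrescue --fill-mode=- --force marker.bin /dev/sdf sdb.map

# After assembling from the clones and restoring files, search the
# restored data for 'DDRESCUE-BAD-SECTOR' to see what was damaged.
```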
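
And a sketch of p. 6: the clones go on the shelf, the array is assembled from the originals and imaged as a whole (device names and paths are placeholders; `--force` is only needed if the members' event counts disagree):

```
# Assemble from the original members:
mdadm --assemble --force /dev/md0 /dev/sd[bcdef]

# Copy the assembled array through errors into an image; this needs free
# space equal to the array size, on top of the space used by the clones:
ddrescue -r1 /dev/md0 /mnt/space/md0.img /mnt/space/md0.map
```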
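
For the monitoring bullet, a minimal check that alerting actually works (the config path is the Debian one and may differ elsewhere):

```
# The monitor needs somewhere to send alerts:
grep -E '^(MAILADDR|PROGRAM)' /etc/mdadm/mdadm.conf

# Send a test alert for every array, end to end:
mdadm --monitor --scan --oneshot --test
```
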
user9517 avatar
cn flag
Your last point should be your first.
Nikita Kipriyanov avatar
za flag
The question was not "how to manage the array properly", it was "how to recover the array". So technically the last point is *offtopic*. But it is useful, which is why I included it.
user9517 avatar
cn flag
It doesn't hurt to reinforce good practice early and often.
Nikita Kipriyanov avatar
za flag
O.k., so it is the *last* point, the best place to reinforce it. I'm trying to help, not to finish off the questioner, who is probably already tearing his hair out for not making backups.
cn flag
@NikitaKipriyanov About p.1: Why would MD RAID 'kick out' a drive during recovery? What does 'kick out' mean in this sense? I was expecting this to be much easier to solve, t.b.h., at least as long as the bad sectors do not intersect. I thought that I could simply 1) swap the first failed disk, 2) rebuild the RAID, and 3) repeat this process for all three disks. Suppose I replace disk #1: is the chance really that high that disk #2 gets 'kicked out' while the RAID rebuilds?
Nikita Kipriyanov avatar
za flag
1. MD RAID has a "failed" state for a disk: the disk is still counted as part of the array but no longer participates in I/O. MD can transfer a disk into this state if it behaves "too badly", for example if it is slow (misses deadlines), throws many I/O errors and so on. 2. Even in theory, by doing it this "easy" way you effectively lose the opportunity to recover the bad sectors of all the other disks, because that requires the data that sat in the still-readable sectors of the first old disk, which you swapped out. This alone would be unacceptable to me. But this is not the only caveat.
Nikita Kipriyanov avatar
za flag
(2. cont) The main problem is the behaviour of the MD RAID resync process when it encounters an I/O error. I have never been through this with RAID5, but in a RAID1 resync I've seen the following: when it hits an unreadable block on the source (the only drive that has a complete copy at that point), it restarts the resync from the beginning. Then it runs until that first bad block and restarts again; it never got past the first bad block. We once resolved this by force-remapping that bad sector (`hdparm -w` if I remember, ouch). This is why your "easy" scenario seems completely wrong to me.
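
For completeness, the forced remap looks roughly like this with a current hdparm (it overwrites that sector's contents; sector number and device are placeholders):

```
hdparm --read-sector 123456789 /dev/sdX    # confirm the sector really fails
hdparm --yes-i-know-what-i-am-doing --write-sector 123456789 /dev/sdX
```
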
cn flag
I see your point: when removing failing disk #1, I am effectively also removing the information needed to reconstruct the unreadable sectors of, say, failing disk #2. However, suppose I _add_ a new disk to my array: shouldn't it be theoretically possible to fill that disk with redundant information so that I can then swap and rebuild the failing disks one by one? Is it 'only' a matter of the tools not being available to do so, or am I completely missing the point of why this is fundamentally impossible?
Nikita Kipriyanov avatar
za flag
Yes, there is no automatic resolution of multiple disk failures and no intrinsic consistency maintenance, as I describe in p. 4. That's RAID5; you should have known this when you deployed it. The procedure I described in p. 6 (cloning the data from the assembled array) is the way to resolve it, based on the assumption in p. 1. P. 3 is needed to guard against the worst cases, because p. 6 can in general be a stressful operation.