Score:0

Any way to recover data from a corrupted RAID 6 array?


I have a pretty old RAID 6 array on a MegaRAID 9361-8i under Windows (16x 8TB HDDs). Recently, I found that 2 HDDs were dead according to MegaRAID Storage Manager, and replaced one of them with a new drive. There was a little hiccup during the process. I can't recall precisely what I did, but it was more or less like this:

  1. mark one of the dead drives as unconfigured good
  2. mark the drive offline
  3. swap the drive with a new one (without turning off the system)
  4. mark the drive online
  5. rebuild and consistency check (both were automatically triggered)

The hiccup happened at step 3. Right after the swap, 3 more drives suddenly went offline (maybe a faulty cable? the backplane? I don't know what caused this). I lost all hope at that point, so I just marked those offline drives as unconfigured good and then brought them online, one by one. After that, I brought the swapped drive back online. To my surprise, the array came out fine without any corruption after the rebuild and consistency check.

The result encouraged me to perform the exact same procedure for the last remaining failed drive, so I did just that. I hit the same hiccup during the process, and this time 4 drives went offline. I marked them as unconfigured good and then brought them online one by one. However, after the drive swap, two rebuild jobs were automatically triggered (instead of 1 as in the previous swap), and after all the rebuilds and the consistency check, the array came out corrupted.

At this point, I have the following symptoms (none of them were present after the first swap).

  1. Every media file in the array either doesn't play at all, stops playing after a few seconds, constantly skips a second or two, or shows badly corrupted images.
  2. Every big compressed file fails during decompression (I can still decompress some small compressed files).
  3. Explorer says the array has roughly two drives' worth of empty space (16TB), but it should be less than 1TB.
  4. No log is shown in MegaRAID Storage Manager.

Back in the day, I had similar problems on this array after swapping faulty drives (the symptoms were similar to 1 and 2, but back then only some files were affected, not every file), and I was able to recover most of the corrupted files by running chkdsk. But chkdsk /f didn't do anything this time. Before I try chkdsk /r, I'm running a DMDE full scan. At this moment (5% progress), it shows 1 NTFS Main Result (NTFS 0) and 3 NTFS additional results (NTFS 1 to 3, each around half the size of NTFS 0). I'm not exactly sure what that means, as this is my first time running DMDE.

What would be the best course of action to recover the corrupted files? I don't have any backup of this array and I'm not willing to spend a fortune to recover the files...

EDIT: The full scan has completed, and this is what I see in DMDE (Min. Size, progress bar, Start LBA):

  • NTFS - Main Results
    • NTFS 0 - the full size of the array, two progress bars stacked onto each other (top bar 70% green, 15% red, bottom 60% green, 25% red), 264k
  • NTFS - Additional Results
    • NTFS 5 - a few MBs, some red colored indicator on progress bar (40% green, 15% red), around 15G
    • NTFS 4 - about the same as single drive, some red color on progress bar (5% green, 5% red), negative value (around -1G)
    • NTFS 6 - slightly bigger than the half size of the array, some red color on progress bar (5% green, 5% red), 264k
  • NTFS - Rest Results
    • NTFS 7 - almost the same as NTFS 6
    • NTFS 2 - slightly smaller than the half size, some red color on progress bar (5% green, 5% red), around 2G
    • NTFS 3 - almost the same as NTFS 2
    • NTFS 1 - similar to NTFS 6 but slightly smaller, some red color on progress bar (5% green, 5% red), around 1G
    • NTFS 9 - almost the full size of the array, some red color on progress bar (5% green, 5% red), 321k
Score:1

Since RAID 6 stripes data over multiple devices, this looks like one of the devices is missing from the array, and the RAID controller doesn't realize it is missing a device. This means every continuous stream of data has missing data in it.
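To illustrate the point, here is a toy simulation of striping. It is only a sketch under loud assumptions: plain round-robin striping with no parity (real RAID 6 also rotates two parity blocks per stripe), a made-up 4-disk layout, and a 64-byte chunk size. `stripe` and `assemble` are hypothetical helper names, not anything from MegaRAID.

```python
def stripe(data: bytes, n_disks: int, chunk: int) -> list[bytes]:
    """Distribute data round-robin over n_disks in chunk-sized pieces (no parity)."""
    disks = [bytearray() for _ in range(n_disks)]
    for offset in range(0, len(data), chunk):
        disks[(offset // chunk) % n_disks] += data[offset:offset + chunk]
    return [bytes(d) for d in disks]


def assemble(disks: list[bytes], chunk: int, total_len: int) -> bytes:
    """Read the chunks back in the order the (simulated) controller expects."""
    n_disks = len(disks)
    n_chunks = (total_len + chunk - 1) // chunk
    out = bytearray()
    for idx in range(n_chunks):
        disk = idx % n_disks
        pos = (idx // n_disks) * chunk
        out += disks[disk][pos:pos + chunk]
    return bytes(out[:total_len])


# Assumed toy layout: 4 data disks, 64-byte chunks, 16 KiB of recognizable data.
N_DISKS, CHUNK = 4, 64
data = bytes(range(256)) * 64
disks = stripe(data, N_DISKS, CHUNK)

# Simulate one member whose contents are wrong without the controller noticing.
bad = list(disks)
bad[2] = bytes(len(disks[2]))  # disk 2 now returns zeros instead of real data
damaged = assemble(bad, CHUNK, len(data))

wrong_bytes = sum(a != b for a, b in zip(data, damaged))
print(f"{wrong_bytes} of {len(data)} bytes differ")
print("first 64 bytes intact:", damaged[:CHUNK] == data[:CHUNK])
print("first 1024 bytes intact:", damaged[:1024] == data[:1024])
```

In this toy layout a quarter of all bytes come back wrong, spread evenly through the volume. A "file" that fits in one chunk on a healthy disk survives, while anything spanning several chunks is damaged, which is at least consistent with small archives still decompressing while large ones fail.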

Running chkdsk can only make things worse, since the issues are below the actual filesystem.

You would somehow need to fix the array configuration so that all devices are properly in the array. I don't know how to actually do that within the MegaRAID system.

Before anything else, I would recommend imaging every hard drive to a separate medium, so that no new writes can corrupt the system further.
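As a rough idea of what imaging means, here is a minimal sketch that copies a source to an image file block by block and zero-fills any block that cannot be read. The function name `image_device`, the 1 MiB block size, and the example paths are my own assumptions. Reading raw disks also needs administrator rights and sector-aligned reads, and the disks would have to be visible individually rather than hidden behind the RAID controller. In practice a dedicated tool such as GNU ddrescue is the usual choice.

```python
def image_device(src: str, dst: str, block_size: int = 1024 * 1024) -> list[int]:
    """Copy src to dst in blocks; return the offsets of blocks that failed to read.

    Unreadable blocks are written as zeros so that offsets in the image
    stay aligned with offsets on the source.
    """
    bad_offsets = []
    offset = 0
    with open(src, "rb", buffering=0) as fin, open(dst, "wb") as fout:
        while True:
            try:
                block = fin.read(block_size)
            except OSError:
                # Read error: remember where, zero-fill, and skip past the block.
                # (If this happens in the last block, the image may end up
                # slightly longer than the source.)
                bad_offsets.append(offset)
                block = bytes(block_size)
                fin.seek(offset + block_size)
            if not block:
                break  # end of source
            fout.write(block)
            offset += len(block)
    return bad_offsets


# Hypothetical usage; both paths are placeholders, not real device names:
# bad = image_device("/dev/sdX", "/mnt/backup/disk00.img")
```

The source is only ever read, never written, which is the whole point: any later recovery attempt can then be run against the copies instead of the original drives.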

That said, it is also possible that the data is already corrupted beyond recovery.

e1630m
Sigh, I suspected either that, or that I somehow misconfigured the drives and those misconfigured drives corrupted the array during the rebuild (when the 4 drives went offline, there were popups that made me choose where each disk belongs in the array). Maybe it's beyond recovery.