Everyone in the comments was telling me to use a professional recovery service, but if the data was important enough to spend that kind of money on recovery, I would have made a backup during its lifetime.
So instead this was a learning opportunity! Here's what I learned:
First of all, I was wrong in my assumption: you can call mdadm --create multiple times non-destructively (at least with --assume-clean, which skips the initial resync). I started by calling:
sudo mdadm --create /dev/md1 --assume-clean -l5 -n4 -c512 /dev/sd[abc]1 missing
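(If you're following along: you can sanity-check a guess like this before mounting anything. These are the checks I'd suggest, using my device names; yours will differ.)
cat /proc/mdstat              # should show md1 active and degraded, e.g. [4/3] [UUU_]
sudo mdadm --detail /dev/md1  # confirms level, chunk size, and which device landed in which slot
sudo blkid /dev/md1           # only finds a filesystem signature if the disk holding the start of the FS is in the right slot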
And then I tried to mount it. That failed because mount couldn't figure out the filesystem type, which told me that I'd gotten the order wrong. My "status" lines had gone from "UUUU" when the array was healthy to "_UUU" to "_UU_", so I figured maybe the broken drive should be first, and I ran the same command as above but with missing /dev/sd[abc]1 instead. Turns out that if I'm rewriting the metadata anyway, it doesn't matter that I'd already rewritten it! But it also told me that there's no magic in --assume-clean that auto-detects the order of disks: the command is order-sensitive, I just don't have to get it right on the first try. I mounted the array read-only so that no spurious writes could corrupt the already screwed-up ordering.
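For what it's worth, the order does live somewhere: each member's superblock records its slot, and mdadm --examine will show it if you get to a broken array before clobbering the metadata with --create (after a --create it just reflects whatever you last wrote). A rough idea of what to look for, with illustrative output in the comments:
sudo mdadm --examine /dev/sda1 | grep -E 'Device Role|Array State'
#   Device Role : Active device 0        <- the slot number, i.e. the order --create needs
#   Array State : AAAA ('A' == active, '.' == missing)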
This ordering allowed me to mount it, but I had obvious corruption. Some of the directories seemed intact, but others gave me IO errors just listing them. That didn't seem good... I tried spot-checking a few files and they appeared to have bands of intact data and bands of garbage. It seemed to go in and out with some regularity, which made me suspicious. I know that the way RAID5 works is that it stripes the data, putting a chunk on each disk plus a parity chunk that rotates between disks. So if I got the order wrong but got the first disk right, the magic block (the filesystem superblock at the start of the array) would still allow the FS type to be read and the array to be mounted, but data further in would be corrupted. It would also explain the bands: a good chunk from a disk in the right slot, then a chunk from a disk in the wrong slot, and then, when the missing chunk had to be recomputed from parity, pure nonsense, because the parity was being combined with the wrong data.
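That banding is also easy to eyeball. With a 512K chunk and four disks, each stripe holds 1.5MB of data, so sampling a big file at 512K steps should show the pattern repeating. A rough spot check I'd suggest (the file path is made up, and file extents aren't necessarily chunk-aligned, so treat it as a heuristic):
for off in 0 512 1024 1536 2048 2560; do
  echo "=== ${off}K ==="
  dd if=/mnt/raid/path/to/big-file bs=1K skip=$off count=1 2>/dev/null | hexdump -C | head -n 3
done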
So I tried a few orderings, since I had nothing to lose, and found that:
sudo mdadm --create /dev/md1 --assume-clean -l5 -n4 -c512 missing /dev/sd{b,a,c}1 && sudo mount -o ro /dev/md1 /mnt/raid
Worked for me. It could list all the files, and the few files I spot-checked appeared to be intact as far as I could tell. I assume there's some corruption somewhere, but for my purposes I'd rather have most of the data than hold out for a perfect recovery, so this was good enough for me. (I switched from the [abc] glob to {} brace expansion so I could control the ordering, since braces expand in the order you write them; and by this point I'd done a restart, so the old sdd1 was now sdc1.) This order was chosen because, again, the "_UU_" status implied to me that the missing disk was the first one and the failed sdd1 (now sdc1) was last, so I only had to try the two orders, "ab" and "ba", for the middle drives. Turns out that ordering is the right one, but I don't know where that layout came from...
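If you end up doing the same dance, the trial-and-error loop is roughly this (a sketch using my device names, and assuming the missing disk goes first and the failed one last; mdadm will notice the devices look like they already belong to an array and ask for confirmation on each create):
for order in "/dev/sda1 /dev/sdb1" "/dev/sdb1 /dev/sda1"; do
  sudo umount /mnt/raid 2>/dev/null
  sudo mdadm --stop /dev/md1 2>/dev/null
  sudo mdadm --create /dev/md1 --assume-clean -l5 -n4 -c512 missing $order /dev/sdc1
  sudo mount -o ro /dev/md1 /mnt/raid && echo "missing $order /dev/sdc1 mounts -- go spot-check files" && break
done
Remember that mounting only proves the disk holding the start of the filesystem is in the right slot; spot-check files deep in the tree before trusting an ordering.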
So, anyway, this let me copy 1.3TB of my 2.7TB before the array crashed a second time, and spot-checking a few random files from the copied data, it seems to have roughly worked! I used rsync, which obviously can't catch on-disk corruption, but which did gracefully handle the read errors when the disk failed partway through the transfer.
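The copy itself was nothing fancy; something along these lines (the destination path is made up). rsync complains about files it can't read and keeps going, which is exactly what you want off a half-dead array, so expect exit code 23 (partial transfer) rather than a clean 0:
rsync -av --progress /mnt/raid/ /mnt/rescue/ 2>rsync-errors.log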
So anyway, for future readers: if the data is at all important, absolutely pay someone. But if you're like me and you have nothing to lose and want to dick around with some mdadm, this is what I've learned!