RAID1 Recovery after Degradation


Below is the output from lsblk, mdadm, and /proc/mdstat for my two-disk RAID1 arrays:

anand@ironman:~$ lsblk 
NAME                      MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT
sda                         8:0    0 465.8G  0 disk  
|-sda1                      8:1    0   976M  0 part  
| `-md0                     9:0    0 975.4M  0 raid1 
|   `-vg_boot-boot (dm-6) 253:6    0   972M  0 lvm   /boot
`-sda2                      8:2    0 464.8G  0 part  
sdb                         8:16   0 465.8G  0 disk  
|-sdb1                      8:17   0   976M  0 part  
`-sdb2                      8:18   0 464.8G  0 part  
  `-md1                     9:1    0 464.7G  0 raid1 
    |-vg00-root (dm-0)    253:0    0  93.1G  0 lvm   /
    |-vg00-home (dm-1)    253:1    0  96.6G  0 lvm   /home
    |-vg00-var (dm-2)     253:2    0  46.6G  0 lvm   /var
    |-vg00-usr (dm-3)     253:3    0  46.6G  0 lvm   /usr
    |-vg00-swap1 (dm-4)   253:4    0   7.5G  0 lvm   [SWAP]
    `-vg00-tmp (dm-5)     253:5    0   952M  0 lvm   /tmp

anand@ironman:~$ cat /proc/mdstat
Personalities : [raid1] 
md1 : active raid1 sdb2[1]
      487253824 blocks super 1.2 [2/1] [_U]
      
md0 : active raid1 sda1[0]
      998848 blocks super 1.2 [2/1] [U_]
      
unused devices: <none>

anand@ironman:~$ sudo mdadm -D /dev/md0 /dev/md1
/dev/md0:
        Version : 1.2
  Creation Time : Wed May 22 21:00:35 2013
     Raid Level : raid1
     Array Size : 998848 (975.60 MiB 1022.82 MB)
  Used Dev Size : 998848 (975.60 MiB 1022.82 MB)
   Raid Devices : 2
  Total Devices : 1
    Persistence : Superblock is persistent

    Update Time : Thu Oct 21 14:35:36 2021
          State : clean, degraded 
 Active Devices : 1
Working Devices : 1
 Failed Devices : 0
  Spare Devices : 0

           Name : ironman:0  (local to host ironman)
           UUID : cbcb9fb6:f7727516:9328d30a:0a970c9b
         Events : 4415

    Number   Major   Minor   RaidDevice State
       0       8        1        0      active sync   /dev/sda1
       1       0        0        1      removed
/dev/md1:
        Version : 1.2
  Creation Time : Wed May 22 21:00:47 2013
     Raid Level : raid1
     Array Size : 487253824 (464.68 GiB 498.95 GB)
  Used Dev Size : 487253824 (464.68 GiB 498.95 GB)
   Raid Devices : 2
  Total Devices : 1
    Persistence : Superblock is persistent

    Update Time : Thu Oct 21 14:35:45 2021
          State : clean, degraded 
 Active Devices : 1
Working Devices : 1
 Failed Devices : 0
  Spare Devices : 0

           Name : ironman:1  (local to host ironman)
           UUID : 3f64c0ce:fcb9ff92:d5fd68d7:844b7e12
         Events : 63025777

    Number   Major   Minor   RaidDevice State
       0       0        0        0      removed
       1       8       18        1      active sync   /dev/sdb2

What commands should I use to recover from this RAID1 failure?

Do I have to get a new hard drive to safely rebuild the RAID1 setup?

Update 1:

anand@ironman:~$ sudo smartctl -H /dev/sda
smartctl 5.41 2011-06-09 r3365 [x86_64-linux-3.2.0-4-amd64] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
Please note the following marginal Attributes:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
190 Airflow_Temperature_Cel 0x0022   054   040   045    Old_age   Always   In_the_past 46 (0 174 46 28)

anand@ironman:~$ sudo smartctl -H /dev/sdb
smartctl 5.41 2011-06-09 r3365 [x86_64-linux-3.2.0-4-amd64] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

anand@ironman:~$ 

S.M.A.R.T. info:

Output from `smartctl -a -d ata /dev/sda`
Output from `smartctl -a -d ata /dev/sdb`

Update 2:

anand@ironman:~$ sudo blkid -o list
device                                  fs_type        label           mount point                                 UUID
---------------------------------------------
/dev/sda1                               linux_raid_member ironman:0    (in use)                                    cbcb9fb6-f772-7516-9328-d30a0a970c9b
/dev/sda2                               linux_raid_member ironman:1    (not mounted)                               3f64c0ce-fcb9-ff92-d5fd-68d7844b7e12
/dev/sdb1                               linux_raid_member ironman:0    (not mounted)                               cbcb9fb6-f772-7516-9328-d30a0a970c9b
/dev/sdb2                               linux_raid_member ironman:1    (in use)                                    3f64c0ce-fcb9-ff92-d5fd-68d7844b7e12
/dev/md0                                LVM2_member                    (in use)                                    JKI3Lr-VdDK-Ogsk-KOQk-jSKJ-udAV-Vt4ckP
/dev/md1                                LVM2_member                    (in use)                                    CAqW3D-WJ7g-2lbw-G3cn-nidp-2jdQ-evFe7r
/dev/mapper/vg00-root                   ext4           root            /                                           82334ff8-3eff-4fc7-9b86-b11eeda314ae
/dev/mapper/vg00-home                   ext4           home            /home                                       8e9f74dd-08e4-45a3-a492-d4eaf22a1d68
/dev/mapper/vg00-var                    ext4           var             /var                                        0e798199-3219-458d-81b8-b94a5736f1be
/dev/mapper/vg00-usr                    ext4           usr             /usr                                        d8a335fc-72e6-4b98-985e-65cff08c4e22
/dev/mapper/vg00-swap1                  swap                           <swap>                                      b95ee4ca-fcca-487f-b6ff-d6c0d49426d8
/dev/mapper/vg00-tmp                    ext4           tmp             /tmp                                        c879fae8-bd25-431d-be3e-6120d0381cb8
/dev/mapper/vg_boot-boot                ext4           boot            /boot                                       12684df6-6c4a-450f-8ed1-d3149609a149

-- End Update 2

Update 3 - After following Nikita's suggestions:

/dev/md0:
        Version : 1.2
  Creation Time : Wed May 22 21:00:35 2013
     Raid Level : raid1
     Array Size : 998848 (975.60 MiB 1022.82 MB)
  Used Dev Size : 998848 (975.60 MiB 1022.82 MB)
   Raid Devices : 2
  Total Devices : 2
    Persistence : Superblock is persistent

    Update Time : Fri Oct 22 21:20:09 2021
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0

           Name : ironman:0  (local to host ironman)
           UUID : cbcb9fb6:f7727516:9328d30a:0a970c9b
         Events : 4478

    Number   Major   Minor   RaidDevice State
       0       8        1        0      active sync   /dev/sda1
       2       8       17        1      active sync   /dev/sdb1

anand@ironman:~/.scripts/automatem/bkp$ sudo mdadm -D /dev/md1
/dev/md1:
        Version : 1.2
  Creation Time : Wed May 22 21:00:47 2013
     Raid Level : raid1
     Array Size : 487253824 (464.68 GiB 498.95 GB)
  Used Dev Size : 487253824 (464.68 GiB 498.95 GB)
   Raid Devices : 2
  Total Devices : 2
    Persistence : Superblock is persistent

    Update Time : Fri Oct 22 21:21:37 2021
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0

           Name : ironman:1  (local to host ironman)
           UUID : 3f64c0ce:fcb9ff92:d5fd68d7:844b7e12
         Events : 63038935

    Number   Major   Minor   RaidDevice State
       2       8       18        0      active sync   /dev/sdb2
       1       8       34        1      active sync   /dev/sdc2

Thank you all!

Anand

Nikita Kipriyanov: How did you get into this state? See `dmesg`. Also check the S.M.A.R.T. data of both devices. Yes, action must be taken, but I'm not sure yet which.
Anand: I have updated the S.M.A.R.T. info.
Nikita Kipriyanov: I would never trust the disk's own verdict. Please post `smartctl -A`. Also `blkid`, to check whether it sees any structure on `/dev/sda2` and `/dev/sdb1` (currently unused, but they look like they should be the second legs of the RAID1s).
Anand: Added the `blkid -o list` output. Sorry, I misunderstood the instruction; I will update soon.
Answer by Nikita Kipriyanov (score: 1):

It seems both of your disks are dying:

/dev/sda:
  4 Start_Stop_Count        0x0032   096   096   020    Old_age   Always       -       5039
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       240
187 Reported_Uncorrect      0x0032   079   079   000    Old_age   Always       -       21
195 Hardware_ECC_Recovered  0x001a   044   015   000    Old_age   Always       -       26908616

/dev/sdb:
  4 Start_Stop_Count        0x0012   099   099   000    Old_age   Always       -       4911
  5 Reallocated_Sector_Ct   0x0033   088   088   005    Pre-fail  Always       -       90
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       114
197 Current_Pending_Sector  0x0022   001   001   000    Old_age   Always       -       9640

So, again, never trust what a disk says about itself; it lies!

You need to connect a third disk, partition it, and add it to your arrays. Wait until the rebuild finishes, then install the bootloader on it. After that, remove the two failing disks, connect a fourth one, and replicate again to restore redundancy.
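A minimal sketch of that procedure, assuming the new disk shows up as /dev/sdc and the partition layout is copied from /dev/sdb (device names are illustrative; confirm them with lsblk before running anything):

# Copy the partition table from the surviving disk to the new one
sudo sfdisk -d /dev/sdb | sudo sfdisk /dev/sdc

# Add the new partitions to the degraded arrays
sudo mdadm /dev/md0 --add /dev/sdc1
sudo mdadm /dev/md1 --add /dev/sdc2

# Watch the rebuild until both arrays show [UU]
cat /proc/mdstat

# Once the rebuild is done, put the bootloader on the new disk
sudo grub-install /dev/sdc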

And set up periodic checks and monitoring, to avoid such a dangerous situation in the future.
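For example, a sketch with a placeholder e-mail address: mdadm can send alert mails when MAILADDR is set in /etc/mdadm/mdadm.conf, the monitor daemon can be run with --scan, and a consistency check can be triggered through sysfs (Debian-based systems normally schedule such a check monthly via the mdadm cron job):

# Tell mdadm where to send alerts (address is a placeholder)
echo 'MAILADDR admin@example.com' | sudo tee -a /etc/mdadm/mdadm.conf

# Run the monitor daemon (usually started automatically by the mdadm package)
sudo mdadm --monitor --scan --daemonise

# Trigger a manual consistency check and inspect the result
echo check | sudo tee /sys/block/md0/md/sync_action
cat /sys/block/md0/md/mismatch_cnt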


It is surprising to see a separate boot RAID array with LVM on it; very unusual. The original purpose of a separate boot partition was to keep it out of LVM so it could be accessed more easily (early bootloaders knew nothing about LVM, so that was a requirement).

Anand: OK, thank you for your input. I'll update once I've worked through your suggested steps.
Anand: I put in a new drive, copied the partition table from one of the other drives using sfdisk, then added the newly created partitions to both md0 and md1, and the recovery completed. Now waiting for the next new drive to finish the remaining steps. Thank you!
Anand: Out of curiosity, can I try to add the removed partitions /dev/sda2 and /dev/sdc1 (formerly /dev/sdb1) to the RAID array? What would happen?
Nikita Kipriyanov: You can play with the old hard disks as much as you wish. The only thing I advise is not to experiment on the "production" array just out of curiosity. Better to build a new array from the old disks and stress it to see how it performs under load. Even if it shows errors or dies, your data will be safe on the new disks.
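A sketch of such a test array, assuming the old disks reappear as /dev/sdd and /dev/sde once reconnected (hypothetical names; verify with lsblk so the live arrays are not touched):

# Wipe the old RAID superblocks so the disks cannot be auto-assembled into the live arrays
sudo mdadm --zero-superblock /dev/sdd2 /dev/sde2

# Build a throwaway RAID1 from the old partitions
sudo mdadm --create /dev/md2 --level=1 --raid-devices=2 /dev/sdd2 /dev/sde2

# Read-stress it and watch the kernel log for I/O errors
sudo badblocks -sv /dev/md2
dmesg | tail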
Anand: Thank you for guiding me.