
Ubuntu 22.04: Rebuilding a failed RAID 1 OS Boot drive

I'm very new to the Linux world and I find it very appealing, so I decided to buy a rack server to learn on (it's a very old Dell R710 that I found very cheap, but it's perfect for my needs rather than ending up in a junkyard).

Before actually starting to use it, I'm doing a RAID 1 test in VirtualBox. My plan is to have two SSDs in the server set up as RAID 1, with Ubuntu installed on them.

I found various guides on how to set up RAID 1 on two drives that are also OS/boot drives, but I didn't have much luck finding what to do if one of them fails, i.e. how to rebuild a failed RAID 1 drive that's also an OS/boot drive.

In VirtualBox, I added two 10 GB drives and, in the Ubuntu Server installer, I selected:

  • Custom storage layout
  • Drive 1 > Use as boot device
  • Drive 1 free space > add GPT partition > format [leave unformatted] & mount [/]
  • Drive 2 > Add as another boot device
  • Drive 2 free space > add GPT partition > format [leave unformatted] & mount [/]
  • Create software raid > RAID level 1 / added both partitions

After installation, I can see this:

root@vm:~# cat /proc/mdstat
Personalities : [raid1] [linear] [multipath] [raid0] [raid6] [raid5] [raid4] [raid10]
md0 : active raid1 sdb2[0] sda2[1]
      10473472 blocks super 1.2 [2/2] [UU]

unused devices: <none>


root@vm:~# lsblk
NAME    MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINTS
loop0     7:0    0 111.9M  1 loop  /snap/lxd/24322
loop1     7:1    0  63.3M  1 loop  /snap/core20/1822
loop2     7:2    0  49.8M  1 loop  /snap/snapd/18357
sda       8:0    0    10G  0 disk
├─sda1    8:1    0     1M  0 part
└─sda2    8:2    0    10G  0 part
  └─md0   9:0    0    10G  0 raid1 /
sdb       8:16   0    10G  0 disk
├─sdb1    8:17   0     1M  0 part
└─sdb2    8:18   0    10G  0 part
  └─md0   9:0    0    10G  0 raid1 /
sr0      11:0    1  1024M  0 rom
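
Besides lsblk and /proc/mdstat, mdadm itself can confirm the array is healthy; a quick read-only check (only the md0 name is assumed):

# should report "State : clean" and list both sda2 and sdb2 as "active sync"
mdadm --detail /dev/md0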

Now I shut down the VM, removed one disk's VDI and created a new one, again 10 GB. This simulates one of the SSDs failing and being replaced with a new, empty one.
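
(Instead of swapping the VDI file, the same degraded state can also be produced purely in software; this is just a sketch, assuming /dev/sdb2 is the member to be "failed":)

# mark the member as faulty, then remove it from the array
mdadm --manage /dev/md0 --fail /dev/sdb2
mdadm --manage /dev/md0 --remove /dev/sdb2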

After booting again, I got this:

root@vm:~# cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md0 : active raid1 sda2[1]
      10473472 blocks super 1.2 [2/1] [_U]

unused devices: <none>
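
On the real server, before pulling a drive, it's worth confirming which physical SSD is the missing member. These read-only checks should be enough:

# shows which member slot is "removed" and which device is still in sync
mdadm --detail /dev/md0
# maps the /dev/sdX names to drive model and serial number
ls -l /dev/disk/by-id/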

So what I did next was:

root@vm:~# sfdisk -d /dev/sda | sfdisk /dev/sdb

/dev/sdb1: Created a new partition 1 of type 'BIOS boot' and of size 1 MiB.
/dev/sdb2: Created a new partition 2 of type 'Linux filesystem' and of size 10 GiB.
root@vm:~# fdisk -l

Disk /dev/sda: 10 GiB, 10737418240 bytes, 20971520 sectors
Disk model: VBOX HARDDISK
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: 087509D7-06F6-4346-BA97-7A0E0D303D9D

Device     Start      End  Sectors Size Type
/dev/sda1   2048     4095     2048   1M BIOS boot
/dev/sda2   4096 20969471 20965376  10G Linux filesystem


Disk /dev/sdb: 10 GiB, 10737418240 bytes, 20971520 sectors
Disk model: VBOX HARDDISK
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: 087509D7-06F6-4346-BA97-7A0E0D303D9D

Device     Start      End  Sectors Size Type
/dev/sdb1   2048     4095     2048   1M BIOS boot
/dev/sdb2   4096 20969471 20965376  10G Linux filesystem
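
One thing visible in the fdisk output is that sfdisk's dump/restore also copied the GPT disk identifier, so both disks now share the same GUID. It didn't seem to cause any problems in my test, but if you want the new disk to get its own GUIDs, sgdisk (from the gdisk package) can randomize them; a sketch assuming the new disk is /dev/sdb:

# give /dev/sdb a fresh random disk GUID and fresh partition GUIDs
sgdisk -G /dev/sdb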

Either way, the partition layouts now look identical, so next I did:

root@vm:~# mdadm --manage /dev/md0 --add /dev/sdb2
root@vm:~# cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md0 : active raid1 sdb2[2] sda2[1]
      10473472 blocks super 1.2 [2/1] [_U]
      [================>....]  recovery = 81.2% (8508864/10473472) finish=0.1min speed=202005K/sec

unused devices: <none>

Now lsblk looks exactly like it did when I started.
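
While the resync is still running, it can be followed to completion with nothing more than the commands already used above, for example:

# refresh the rebuild progress every 5 seconds (Ctrl+C to stop)
watch -n 5 cat /proc/mdstat
# or ask mdadm for the state and the rebuild percentage directly
mdadm --detail /dev/md0 | grep -E 'State|Rebuild'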

Next, I tried rebooting and going into the VirtualBox boot menu (F12). Here, choosing disk 1 boots just fine, but choosing disk 2 complains about there being no bootloader. So I booted from disk 1 and did:

root@vm:~# sudo grub-install /dev/sdb
root@vm:~# sudo update-grub /dev/sdb

And now it seems to boot off either drive. I also created a file before replacing the drive, and this file now shows up regardless of which drive I boot from.
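
From what I can tell, update-grub doesn't actually take a device argument (it just regenerates /boot/grub/grub.cfg), so what made disk 2 bootable was the grub-install. On a BIOS/GPT layout like this one, GRUB's boot code has to be installed onto each disk separately; a sketch assuming the two disks are /dev/sda and /dev/sdb:

# install GRUB's boot code to each disk (MBR boot code + BIOS boot partition)
grub-install /dev/sda
grub-install /dev/sdb
# regenerate the GRUB config once (no device argument needed)
update-grub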

So it kinda works, but is this the correct way to do it?

Am I missing something that could go wrong once I set up the actual SSDs in the server?

Thank you very much, and sorry for the long post, but maybe someone else will find it helpful.

F. Hauri
1. Why did you `fdisk` your sdb? How did they look before? 2. You have to `install-mbr` on `sdb` to be able to boot from sdb.
paladin
Might be off-topic, but I recommend using BTRFS as the filesystem and using its built-in RAID function. BTRFS can do an online RAID rebuild, even for the root partition. Either way, you should always install the bootloader on both disks, or use a separate boot device such as a USB thumb drive.