Score:0

How can I make my LVM config tolerant of device reordering at boot with a USB drive attached?

gi flag

Setup: DELL PowerEdge R520, oVirt Node 4.4.1 x86_64

# pvs
  PV         VG          Fmt  Attr PSize    PFree   
  /dev/sda2  onn_ovirt01 lvm2 a--   105.26g   20.77g
  /dev/sda3  VG_he_nfs   lvm2 a--  <100.00g  <10.00g
  /dev/sda4  VG_data_nfs lvm2 a--    <1.50t <206.00g

# lsblk
...

sdb                                                            8:16   0   1.4T  0 disk 
└─sdb1                                                         8:17   0   1.4T  0 part /exports/nfs/backups

Problem: When the system reboots, the 1.4T backup drive connected via SATA-to-USB becomes sda, and LVM doesn't find the partitions it needs for its physical volumes. The system then drops into rescue mode, where I have to log in on an attached monitor/keyboard, unmount and eject the SATA-to-USB drive, comment its entry out of the fstab, unplug it, and reboot. Then, once the system has booted properly with the correct device as sda, I have to undo everything I did in rescue mode with the SATA-to-USB device.

Everything in the fstab is already defined to mount by UUID or /dev/mapper/.
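
For reference, the entries look roughly like this (illustrative only; the LV name, UUID and first mount point are placeholders, not the real values):

/dev/mapper/VG_data_nfs-lv_data            /exports/nfs/data     xfs   defaults  0 0
UUID=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx  /exports/nfs/backups  ext4  defaults  0 0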

The question: Is it possible to change the LVM configuration so it finds the right physical volumes for the system regardless of which device becomes sda? Can it be done without recreating and migrating anything (the system data is on a RAID 1 mirror with a hot spare, so there is no room left in the chassis for a replacement drive arrangement)? I'm open to any solution that doesn't require deleting data or building a new RAID arrangement. If that's not possible, then I'm open to anything, really - or I'll just keep sorting it out in rescue mode every time the system unexpectedly reboots.

Score:0
za flag
  1. LVM doesn't store device paths. Component UUIDs are stored in the LVM superblocks, and these UUIDs are used exclusively to identify components (PVs, VGs, LVs). LVM simply scans all available block devices (which ones it is permitted to scan is configured in /etc/lvm/lvm.conf), detects physical volumes and assembles volume groups from them. It doesn't care what device path a physical volume happens to have this time, which makes it very robust against device reindexing. It will find your data even if you move your volumes to /dev/cciss/cXdYpZ (the old HP/Compaq SmartArray block driver creates such devices), /dev/hdXY, /dev/sdXY, /dev/mapper/... (anything built on DM places device nodes there: crypto, multipath, etc.), /dev/md/... (Linux MD RAID) and so on. Your concern is misplaced and your problem is elsewhere.

  2. The cause of your problem might be the slowness of USB. It has large latencies, and external hard drives are slow to start (deliberately, to limit their power spike during spin-up). USB is not about performance, it is about robustness in the hands of an inexperienced user, so it is slow to initialize. You need to configure your init scripts (most likely the initramfs init script) to allow large enough delays/timeouts so USB devices have time to spin up and settle; see the first sketch after this list.

  3. Another typical cause is a bad bootloader configuration: for example, it may expect to find its data on the "first partition of the first hard drive", and if the "first hard drive" happens to be the wrong device, it has no configuration and no kernel image to boot and throws you into the bootloader rescue shell. Alternatively, the kernel command line or something inside the initramfs may be tied to a concrete device path, so swapping devices leaves it unable to find / and you land in the initramfs rescue shell instead. Note that these are different rescue shells, and understanding which one you are seeing is important; see the second sketch after this list.

  4. RAID 0 with a spare is an oxymoron. RAID 0 has no redundancy, no defined degraded state and no way to recover the array back to the optimal state after a device failure, so a spare could not possibly help it. Any component device problem generally moves the whole array directly into the failed state. Every other RAID level has redundancy and will transition into a degraded state first when a component fails, so it benefits from spares, but RAID 0 does not.
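
If point 2 turns out to be the issue, the knobs to look at are the kernel/initramfs device timeouts. A minimal sketch for a dracut-based initramfs such as oVirt Node's (the values are arbitrary examples, not tested numbers, and whether you need rootdelay at all depends on your setup):

# /etc/default/grub -- append to the existing GRUB_CMDLINE_LINUX line so slow
# USB devices get time to spin up before the root device is assembled
GRUB_CMDLINE_LINUX="<existing parameters> rootdelay=30 rd.timeout=120"

# then regenerate the grub config (the output path differs on EFI systems):
grub2-mkconfig -o /boot/grub2/grub.cfg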
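
For point 3, the first thing to check is how the kernel and initramfs refer to the root device; a reference by LVM name or filesystem UUID survives an sdX reshuffle, a bare /dev/sdaN does not (the names below are placeholders for illustration):

# see what the currently booted kernel was given
cat /proc/cmdline

# fragile: root=/dev/sda2
# stable:  root=/dev/mapper/onn_ovirt01-root    (hypothetical LV name)
# stable:  root=UUID=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx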

JayRugMan avatar
gi flag
I got RAID 0 and 1 confused. My drives are mirrored, so RAID 1 (I corrected it above). At any rate, what you say about RAID 0 makes sense - thanks for prompting me to correct that detail in my question. Regarding the rest, all I know is that when I check lsblk in rescue mode, the USB drive is sda and LVM doesn't find the system data on sdb, which is where the internal RAID 1 device is mapped when booted with the USB-to-SATA drive connected. How does a slow USB drive beat the internal device to becoming sda, and why would that matter, based on everything else you wrote?
JayRugMan avatar
gi flag
I'm not able to try this now, but it seems that if I add the sdb partitions to the LVM filter, it may catch the system data on boot, even if sda is mapped to my USB backup drive. Right now, only sda partitions are specified in the filter in the LVM config. When I get home, I'll try that.
Nikita Kipriyanov avatar
za flag
It is very strange to only have sda in the filter. SCSI scanning is asynchronous and devices get reordered; this is normal and expected, so it is *pointless* to permit scanning of some sdX devices and forbid scanning of others. LVM doesn't do this by default, so it must be your custom setting.
JayRugMan avatar
gi flag
Maybe it was me that did it, I don't know (must have been, I guess); I don't remember doing it. At any rate, I added the sdb partitions to the filter and it worked: they get scanned and LVM finds the system data just fine (see the answer I posted).
Nikita Kipriyanov avatar
za flag
It was a bad idea to touch it in the first place. Just revert it to the original package defaults.
Score:0
gi flag

I solved the issue. All I had to do was add the sdb partitions to the filter in /etc/lvm/lvm.conf:

Was:

filter = ["a|^/dev/sda2$|", "a|^/dev/sda3$|", "a|^/dev/sda4$|", "r|.*|"]

Changed to:

filter = ["a|^/dev/sda2$|", "a|^/dev/sda3$|", "a|^/dev/sda4$|", "a|^/dev/sdb2$|", "a|^/dev/sdb3$|", "a|^/dev/sdb4$|", "r|.*|"]

Anyone else trying this: make sure you verify your changes and regenerate the cache with vgscan.

My first go (I forgot the trailing | after the $):

[root@host lvm]# vgscan
  Invalid separator at end of regex.
  Invalid filter pattern "a|^/dev/sdb2$".
  Failed to create regex device filter

My second go:

[root@host lvm]# vgscan
  Found volume group "VG_data_nfs" using metadata type lvm2
  Found volume group "VG_he_nfs" using metadata type lvm2
  Found volume group "onn_ovirt01" using metadata type lvm2

The SATA-to-USB drive still shows up as sda, but it doesn't matter - LVM skips over it and finds everything on the sdb partitions when nothing turns up on sda. I did have to mount the SATA-to-USB drive manually, but since it's in /etc/fstab correctly, I only had to issue mount -a. I'll sort that issue out later and take this win for now.
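
If you would rather not pin individual partitions at all, a broader filter that survives any sdX reshuffle is also possible; a sketch (the stock lvm.conf ships with the filter commented out, which accepts every device, so removing the custom filter entirely amounts to the same thing):

# accept any SCSI/SATA/USB disk or partition, reject everything else
filter = ["a|^/dev/sd.*|", "r|.*|"]

As above, run vgscan afterwards to confirm the pattern parses and all three volume groups are still found.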

Nikita Kipriyanov avatar
za flag
This is a bad solution. Better to allow all `/dev/sd*` devices to be scanned. Filtering by device index is a bad idea, as I explained in my answer: what if you add another USB drive or another RAID logical drive? Best of all is to revert `lvm.conf` to the distribution defaults; believe me, **you don't really have any valid need to touch that**.
JayRugMan avatar
gi flag
So you're saying that if I leave out the filter altogether, it will work? I'll try it when I get a chance. At any rate, why so adamant about not using functionality provided by LVM? This server is mine and I know what will and won't be plugged into it. I'm not worried in the slightest about this messing _anything_ up.
Nikita Kipriyanov avatar
za flag
I suggest leaving the file as it was installed by the package manager. In 20+ years of working with Linux, I've had only one arguably valid reason to touch it, in a cluster environment. I appreciate the functionality LVM provides; I also stand for the educated use of configuration parameters for their intended purpose. Did you have a purpose? Did you really want to restrict LVM scanning to sda only, despite the fact that sd devices tend to get renamed? That is what caused your "problem". Also, keep track of your changes and consider reverting them first when something goes wrong.