Score:1

CentOS 7 (dracut) is finding inconsistent network device names causing problems for kickstart


I use the boot options biosdevname=1 net.ifnames=1 to get consistent, predictable device names, but I've started to notice that in some cases the network device names are not consistent. For example, if I drop to a dracut debug shell and look at rdsosreport.txt, I see this:

+ ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: p3p1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether a8:b4:56:50:97:08 brd ff:ff:ff:ff:ff:ff
3: eth1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether a8:b4:56:50:97:09 brd ff:ff:ff:ff:ff:ff

Notice that there is a mix of consistent (p3p1) and legacy-style (eth1) naming. However, if I look at the interfaces from the dracut debug shell, I see this:

initqueue:/run/initramfs# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: p3p1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether a8:b4:56:50:97:08 brd ff:ff:ff:ff:ff:ff
3: p3p2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether a8:b4:56:50:97:09 brd ff:ff:ff:ff:ff:ff

p3p1/p3p2 are the correct, expected names. For some reason, early in the initrd sequence, the interfaces come up with mixed naming. My assumption is that there is some sort of race going on here and that, given a bit more time, it (udev?) settles into the correct state, but I'm not sure exactly where the race is. Unfortunately, this is causing problems for some of our automated server builds, because servers come up on first boot after the install and try to bring up eth1 when the real interface name is p3p2.

I've been digging through the dracut modules to try to figure out where the problem lies, but I haven't been able to determine it conclusively yet, so I'm looking for suggestions.

Also, this behavior doesn't happen all the time. The same server booting the same image sometimes works fine and other times gets this mixed naming. That also suggests a race: sometimes the race is won, and sometimes it is lost.
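One way to catch the mixed state when it occurs is to scan the interface directory for names that still follow the legacy pattern. The following is a minimal sketch, not part of dracut: the function name `legacy_ifaces` is hypothetical, and it assumes that any entry named ethN is an interface that has not been renamed yet.

```shell
#!/bin/sh
# Hypothetical helper (not a dracut function): print every interface under a
# sysfs-style directory that still carries a kernel-assigned ethN name.
# $1: directory to scan; defaults to /sys/class/net
legacy_ifaces() {
    netdir="${1:-/sys/class/net}"
    for path in "$netdir"/*; do
        # Skip the unexpanded glob when the directory is empty.
        [ -e "$path" ] || continue
        name="${path##*/}"
        case "$name" in
            eth[0-9]*) echo "$name" ;;
        esac
    done
}
```

Calling `legacy_ifaces` at the point where the module probes for interfaces, and logging any output, would show whether the probe ran before the rename finished.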

Michael Hampton
Are the names correct when the installed system boots?
guzzijason
@MichaelHampton no. The installed system will end up with config files `ifcfg-p3p1` and `ifcfg-eth1`, but eth1 will not exist. So, to get the installed system working normally, I need to edit the network configs by hand. Also, I just appended to my original post: the behavior is inconsistent, meaning sometimes the problem happens and sometimes it doesn't. That tells me it's a race.
Michael Hampton
It seems that by the time you get past the initramfs and into the installer, the device names should have long since settled, far in advance of anything you might be doing in the kickstart. Can you be specific about what you are doing that results in this failed network configuration?
guzzijason
Yeah, I use a custom-built initramfs that dynamically configures LACP bonding. We have some single-homed hosts, some dual-homed, and some with multiple interfaces that get aggregated into a single bond. It "usually" works, except when this problem occurs.
Michael Hampton
This custom stuff is written as a dracut module? These should load long after the udev names have settled. Also be aware that dracut can set up bonding itself given a command line option to do so.
guzzijason
We've made modifications to the `40network` dracut module for our custom needs.
guzzijason
I think the race problem may be due to biosdevname. So far, I haven't been able to reproduce the problem when using `biosdevname=0 net.ifnames=1`, which might be an option for us.
Score:0

Answering my own question here. It turns out the problem was (partially) self-inflicted.

The part we can't control:

Using the boot option biosdevname=1 can cause races during the interface renaming phase. If you can live without it, simply using net.ifnames=1 biosdevname=0 might be preferable, even if the resulting names are "less pretty".
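A script that depends on the naming scheme can check which options the kernel was actually booted with. This is a minimal sketch under stated assumptions: `cmdline_opt` is a hypothetical name, it handles only simple `key=value` words, and it keeps the last occurrence if a key is repeated. Inside a dracut module, the `getarg` helper from dracut-lib.sh is the usual way to do this.

```shell
#!/bin/sh
# Hypothetical helper: print the value of a key=value option from a kernel
# command line. Prints an empty line when the key is absent.
# $1: option name, e.g. biosdevname
# $2: command line string; defaults to the contents of /proc/cmdline
cmdline_opt() {
    key="$1"
    cmdline="${2:-$(cat /proc/cmdline)}"
    value=""
    # Unquoted on purpose: split the command line on whitespace.
    for word in $cmdline; do
        case "$word" in
            "$key"=*) value="${word#*=}" ;;
        esac
    done
    echo "$value"
}
```

For example, `cmdline_opt biosdevname "ro biosdevname=0 net.ifnames=1"` prints `0`.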

The part we CAN control:

Our site uses a custom-modified dracut 40network module. One of the main things our version does is probe the contents of /sys/class/net/ for viable interfaces to add to a bond automatically. (We don't always know the device names in advance, which is why the module needs some logic to identify them on its own.) The race mentioned above can delay the renaming of entries in /sys/class/net/. The solution was simple: add a 5-second sleep to the script before probing /sys/class/net/. This gives biosdevname (hopefully more than enough) time to finish renaming devices. Testing so far seems A-OK.
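A fixed sleep is the fix that was actually deployed. A possible refinement, shown here only as a sketch, is a bounded polling loop that returns as soon as no legacy names remain. The function name `wait_for_renames` is hypothetical, and the loop assumes that unrenamed interfaces always look like ethN and that correctly named ones never do. It has not been tested against the race described above.

```shell
#!/bin/sh
# Hypothetical helper: wait until no ethN entry remains in a sysfs-style
# directory, polling once per second.
# $1: directory to watch; defaults to /sys/class/net
# $2: maximum number of seconds to wait; defaults to 10
# Returns 0 once no ethN entry is left, 1 if the timeout is reached first.
wait_for_renames() {
    netdir="${1:-/sys/class/net}"
    timeout="${2:-10}"
    elapsed=0
    while :; do
        pending=0
        for path in "$netdir"/eth[0-9]*; do
            # The glob stays literal when nothing matches, so test existence.
            if [ -e "$path" ]; then
                pending=1
            fi
        done
        if [ "$pending" -eq 0 ]; then
            return 0
        fi
        if [ "$elapsed" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
}
```

A caller would run something like `wait_for_renames /sys/class/net 10 || echo "legacy names still present" >&2` before probing. Running `udevadm settle --timeout=10` first may also help, though whether it covers the biosdevname rename in this setup is an assumption.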

Michael Hampton
Interesting. What names do you get with `biosdevname=0`? It's been probably a year since I upgraded the last CentOS 7 box I had (to 8).
guzzijason
With `net.ifnames=1 biosdevname=1`, we get names like "p3p1", "p3p2" (slot and port, pretty simple). With `net.ifnames=1 biosdevname=0`, those names change to things like "enp193s0f0", "enp193s0f1".
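The two schemes encode different information in the name. The sketch below decodes the examples from this thread; `describe_ifname` is a hypothetical name, and the patterns are deliberately simplified: they cover only the `p<slot>p<port>` and `enp<bus>s<slot>f<function>` forms and ignore other variants such as embedded ports or virtual functions.

```shell
#!/bin/sh
# Hypothetical helper: describe which naming scheme an interface name follows.
# Only the forms mentioned in this thread are recognised.
describe_ifname() {
    name="$1"
    case "$name" in
        p[0-9]*p[0-9]*)
            # biosdevname style: p<slot>p<port>
            rest="${name#p}"
            slot="${rest%%p*}"
            port="${rest#*p}"
            echo "biosdevname: PCI slot $slot, port $port"
            ;;
        enp[0-9]*s[0-9]*f[0-9]*)
            # systemd/udev style: enp<bus>s<slot>f<function>
            rest="${name#enp}"
            bus="${rest%%s*}"
            rest="${rest#*s}"
            slot="${rest%%f*}"
            func="${rest#*f}"
            echo "systemd: PCI bus $bus, slot $slot, function $func"
            ;;
        eth[0-9]*)
            echo "kernel: legacy enumeration order"
            ;;
        *)
            echo "unknown"
            ;;
    esac
}
```

For example, `describe_ifname p3p2` prints `biosdevname: PCI slot 3, port 2`, and `describe_ifname enp193s0f1` prints `systemd: PCI bus 193, slot 0, function 1`.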
Michael Hampton
Ah, so the normal names come back when you use `biosdevname=0`. Probably best to switch back to that as it's what everyone else is using. Those old names were largely abandoned years ago.
guzzijason
I know the legacy "eth0", "eth1" names were abandoned, for good reason. `biosdevname=1` is actually the default behavior on Dell hardware, unless explicitly disabled.
Michael Hampton
Sorry, I meant the Dell `biosdevname=1` names were abandoned. With this set to 0, device names use the same general formats across all hardware vendors.