Score:0

CentOS 7 DHCP server not responding to UEFI PXE DISCOVER

I'm trying to set up a CentOS 7 server as a DHCP server for PXE (UEFI). I've tried several changes to the dhcpd.conf file, but nothing seems to make a difference.

dhcpd.conf:

allow booting;
allow bootp;

max-lease-time 120;
default-lease-time 120;

option domain-name "domain.tld";
option domain-name-servers 192.168.1.9, 192.168.1.10;

option space pxe;
option pxe.magic code 208 = string;
option pxe.configfile code 209 = text;
option pxe.pathprefix code 210 = text;
option pxe.reboottime code 211 = unsigned integer 32;

option pxe.mtftp-ip code 1 = ip-address;
option pxe.mtftp-cport code 2 = unsigned integer 16;
option pxe.mtftp-sport code 3 = unsigned integer 16;
option pxe.mtftp-tmout code 4 = unsigned integer 8;
option pxe.mtftp-delay code 5 = unsigned integer 8;
option pxe.discovery-control code 6 = unsigned integer 8;
option pxe.discovery-mcast-addr code 7 = ip-address;


option architecture-type code 93 = unsigned integer 16;

class "pxe" {
  match if substring (option vendor-class-identifier, 0, 9) = "PXEClient";
  option vendor-class-identifier "PXEClient";
  vendor-option-space pxe;
  option pxe.mtftp-ip 0.0.0.0;

  if option architecture-type = 00:07 {
    filename "shim.efi";
  } else {
    filename "pxelinux/pxelinux.0";
  }
}

subnet 192.168.1.0 netmask 255.255.255.0 {
  not authoritative;
}

# PXE Network
########################################################################
subnet 172.16.10.0 netmask 255.255.255.0 {
  authoritative;
  allow unknown-clients;
  next-server 172.16.10.3;
  option routers 172.16.10.1;
  option broadcast-address 172.16.10.255;
  pool {
    range dynamic-bootp 172.16.10.10 172.16.10.49;
    allow members of "pxe";
  }
  pool {
    range 172.16.10.50 172.16.10.99;
    allow members of "pxe";
  }
  pool {
    range 172.16.10.100 172.16.10.149;
  }
}

host dev2 {
  hardware ethernet ec:f4:bb:d8:59:9f;
  option host-name "dev2.domain.tld";
}


host dev1 {
  hardware ethernet ec:f4:bb:bf:c8:e7;
  option host-name "dev1.domain.tld";
}

I tried running the server manually to make sure I saw any logs, but this is all that comes out:

[root@kickstart dhcp]# /usr/sbin/dhcpd -f -cf /etc/dhcp/dhcpd.conf -user dhcpd -group dhcpd --no-pid -4 -d eth1
Internet Systems Consortium DHCP Server 4.2.5
Copyright 2004-2013 Internet Systems Consortium.
All rights reserved.
For info, please visit https://www.isc.org/software/dhcp/
Not searching LDAP since ldap-server, ldap-port and ldap-base-dn were not specified in the config file
Wrote 0 class decls to leases file.
Wrote 0 deleted host decls to leases file.
Wrote 0 new dynamic host decls to leases file.
Wrote 0 leases to leases file.
Listening on LPF/eth1/52:54:00:fa:4d:fc/172.16.10.0/24
Sending on   LPF/eth1/52:54:00:fa:4d:fc/172.16.10.0/24
Sending on   Socket/fallback/fallback-net

I also ran a packet trace on the server. I see the DHCP DISCOVER packet come in, but there is never a response.

<bash>$tcpdump -vvvvvvvvvvvvvvvvvvvvv -ttttt -i eth1

 00:37:05.338983 IP (tos 0x0, ttl 64, id 43032, offset 0, flags [none], proto UDP (17), length 375)
    0.0.0.0.bootpc > 255.255.255.255.bootps: [udp sum ok] BOOTP/DHCP, Request from ec:f4:bb:d8:59:9f (oui Unknown), length 347, xid 0x777a345e, secs 12, Flags [Broadcast] (0x8000)
      Client-Ethernet-Address ec:f4:bb:d8:59:9f (oui Unknown)
      Vendor-rfc1048 Extensions
        Magic Cookie 0x63825363
        DHCP-Message Option 53, length 1: Discover
        MSZ Option 57, length 2: 1464
        Parameter-Request Option 55, length 35: 
          Subnet-Mask, Time-Zone, Default-Gateway, Time-Server
          IEN-Name-Server, Domain-Name-Server, Hostname, BS
          Domain-Name, RP, EP, RSZ
          TTL, BR, YD, YS
          NTP, Vendor-Option, Requested-IP, Lease-Time
          Server-ID, RN, RB, Vendor-Class
          TFTP, BF, GUID, Option 128
          Option 129, Option 130, Option 131, Option 132
          Option 133, Option 134, Option 135
        GUID Option 97, length 17: 0.68.69.76.76.84.0.16.57.128.75.180.192.79.67.52.50
        NDI Option 94, length 3: 1.3.16
        ARCH Option 93, length 2: 7
        Vendor-Class Option 60, length 32: "PXEClient:Arch:00007:UNDI:003016"
        END Option 255, length 0

Some other system info:

<bash> $ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 52:54:00:59:e9:5d brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.203/24 brd 192.168.1.255 scope global eth0
       valid_lft forever preferred_lft forever
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 52:54:00:fa:4d:fc brd ff:ff:ff:ff:ff:ff
    inet 172.16.10.3/24 brd 172.16.10.255 scope global eth1
       valid_lft forever preferred_lft forever


<bash>$ sestatus 
SELinux status:                 enabled
SELinuxfs mount:                /sys/fs/selinux
SELinux root directory:         /etc/selinux
Loaded policy name:             targeted
Current mode:                   permissive
Mode from config file:          enforcing
Policy MLS status:              enabled
Policy deny_unknown status:     allowed
Max kernel policy version:      28

<bash>$ firewall-cmd --state
not running

<bash>$ netstat -nap | grep dhcp
udp        0      0 0.0.0.0:67              0.0.0.0:*                           21050/dhcpd         
udp        0      0 0.0.0.0:67              0.0.0.0:*                           17697/dhcpd         
udp        0      0 0.0.0.0:67              0.0.0.0:*                           15042/dhcpd         
raw        0      0 0.0.0.0:1               0.0.0.0:*               7           21050/dhcpd         
raw        0      0 0.0.0.0:1               0.0.0.0:*               7           17697/dhcpd         
raw        0      0 0.0.0.0:1               0.0.0.0:*               7           15042/dhcpd         
unix  2      [ ]         DGRAM                    94586    15042/dhcpd          
unix  2      [ ]         DGRAM                    107361   17697/dhcpd          
unix  2      [ ]         DGRAM                    110207   21050/dhcpd     


<bash>$ iptables-save 
<bash>$ 

I'm not sure if/how this would matter, but the PXE server is a KVM/QEMU guest running on a CentOS 7 hypervisor. On the host, em1 is joined to br1, em2 to br2, em3 to br3, em4 to br4. Each NIC is attached to a switch on its own VLAN. The VM has eth0 linked to br1 and eth1 to br4.

The PXE client is a physical server. There are multiple switches between this PXE client and the dhcp server.

Update:

(config above updated):

I configured a standard Linux client on the network, and it was able to get a lease. So the problem appears to be specific to the UEFI PXE client. Here is a pcap of a single request: https://pastebin.com/hp6n1ExR (base64 encoded)

This configuration looks fine. I was able to copy and paste your `dhcpd.conf` into a test environment and successfully PXE boot a system without any changes (using a CentOS 7 server). I know you've checked `firewall-cmd`, but are there any other iptables rules in place that are managed by firewalld? `iptables-save` would show you everything.
Justin Killen:
What confuses me is that there are no DHCPDISCOVER entries in syslog. FWIW, I added an empty subnet on eth0 (an active network), and I see entries there, e.g.: Jan 28 12:40:50 servername dhcpd: DHCPDISCOVER from e8:d8:d1:bf:7b:72 via eth0: network 192.168.1.0/24: no free leases
Forget DHCP for a moment: if you *manually* configure an ip address on a client system, is it able to communicate with the DHCP server on the 172.16.10.0 network?
Justin Killen:
@larsks The DHCP DISCOVER packet is coming through from the client to the server, so I'm going to say Yes. It's possible that the return path might not work, but according to the packet dump (and the lack of logs in syslog), there isn't any outgoing traffic to test.
Just because something shows up in tcpdump doesn't necessarily mean it's going to get delivered anywhere useful. I'd like to know if you're able to reach *any* service on that machine from clients on that network; the answer to that question might suggest things to look at. If you configure an ip manually on a client, can you ssh to that server? Or access a web service running on it?
Justin Killen:
@larsks I was able to ping and ssh from another host in the 172.16.10 network.
Justin Killen:
@larsks From that client that had a working ping/ssh, I ran `dhclient -d -nw -v` and in the server logs I see `Jan 30 08:51:04 server dhcpd: DHCPDISCOVER from ec:f4:bb:bf:c8:e7 via eth1: network 172.16.10.0/24: no free leases`. So that's progress. I'll look into why there are no free leases.
Justin Killen:
Question updated: TLDR: I was able to get a standard lease, so my problem seems to be related to UEFI PXE
From where did you gather that `tcpdump`?
Justin Killen:
@larsks The tcpdump capture was taken on the DHCP server.
Interesting. I asked because the packet has a VLAN tag (see the packet decoded with tshark [here](https://gist.github.com/larsks/2a4f84eec6712a3b8b6812b46e7b9a81)), which we wouldn't expect to see given what you've shown of your network configuration. The VLAN tag means that packet isn't going to get delivered anywhere useful (unless you have an interface on your DHCP server configured explicitly for VLAN 900). I think you have a misconfiguration in your network somewhere.
Justin Killen:
I downloaded the ISC code from https://downloads.isc.org/isc/dhcp/4.2.5/dhcp-4.2.5.tar.gz and ran configure without any options. I started it with the same parameters as the system service and it complained about the -user and -group options, so I left them off. I also had to add an explicit -lf option to point to the leases file. With that binary, I do see the DHCPDISCOVER log entry and it responds with a DHCPOFFER. The VLAN is expected - eth1 on the server is on the same VLAN.
If you don't have an explicit VLAN interface on your system (like `eth1.900`), you really wouldn't expect to see the VLAN tag: you would expect the switch port to have the VLAN configured as an untagged VLAN on that port, with the tag being stripped before the packet is emitted on the port. Erroneously sending tagged packets to an interface that is not expecting them would cause the behavior you've described. You obviously have more knowledge of your specific network config than I do, so if you're confident that's not the issue, I'm out of ideas.
Justin Killen:
@larsks your comment was spot-on. I know enough about VLANs to be dangerous, but I'm no expert. I added an eth1.900 interface to test, and now I'm seeing DHCPREQUEST and DHCPACK packets. I'll chat with the person who set up the VLAN and have them update the config on the switch.
Score:0
I'm going to write this up as an answer in case other folks run into a similar problem.

First, from your question, you have the following network configuration:

On the host, em1 is joined to br1, em2 to br2, em3 to br3, em4 to br4. Each NIC is attached to a switch on its own VLAN. The VM has eth0 linked to br1 and eth1 to br4.

Significantly, these are all "regular" -- not VLAN -- interfaces. They don't expect incoming ethernet frames to have any VLAN tags. On the other hand, we see from your packet capture that incoming frames are tagged with VLAN 900:

$ tshark -n -r packets
.
.
.
Ethernet II, Src: Dell_d8:59:9f (ec:f4:bb:d8:59:9f), Dst: Broadcast (ff:ff:ff:ff:ff:ff)
    Destination: Broadcast (ff:ff:ff:ff:ff:ff)
        Address: Broadcast (ff:ff:ff:ff:ff:ff)
        .... ..1. .... .... .... .... = LG bit: Locally administered address (this is NOT the factory default)
        .... ...1 .... .... .... .... = IG bit: Group address (multicast/broadcast)
    Source: Dell_d8:59:9f (ec:f4:bb:d8:59:9f)
        Address: Dell_d8:59:9f (ec:f4:bb:d8:59:9f)
        .... ..0. .... .... .... .... = LG bit: Globally unique address (factory default)
        .... ...0 .... .... .... .... = IG bit: Individual address (unicast)
    Type: 802.1Q Virtual LAN (0x8100)
802.1Q Virtual LAN, PRI: 0, DEI: 0, ID: 900
    000. .... .... .... = Priority: Best Effort (default) (0)
    ...0 .... .... .... = DEI: Ineligible
    .... 0011 1000 0100 = ID: 900
    Type: IPv4 (0x0800)
.
.
.
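
When a capture is long, it can help to pull out just the VLAN IDs rather than reading the whole decode. The following is a minimal sketch, assuming the verbose tshark decode contains a summary line in the same form as the "802.1Q Virtual LAN, ... ID: 900" line shown above; the helper name `vlan_ids` is made up for this example.

```shell
#!/bin/sh
# Print the VLAN ID(s) found in verbose tshark output read from stdin.
# vlan_ids is a hypothetical helper name, not part of tshark.
vlan_ids() {
    # Match the per-frame 802.1Q summary line and keep only the numeric ID.
    sed -n 's/^802\.1Q Virtual LAN,.*ID: \([0-9][0-9]*\).*$/\1/p'
}

# Possible use against a capture file (-V asks tshark for the full decode):
#   tshark -n -r packets -V | vlan_ids | sort -u

# Self-contained demonstration using the summary line from the decode above:
printf '%s\n' '802.1Q Virtual LAN, PRI: 0, DEI: 0, ID: 900' | vlan_ids
```

If this prints nothing for a capture taken on the server's interface, the frames are arriving untagged, which is what a plain (non-VLAN) interface expects.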

That suggests that your switch is misconfigured (or your host is, depending on how we look at things): we would expect the switch port to be configured as an access port -- that is, a port that delivers untagged packets from a specific VLAN to your host.

Unfortunately, it looks as if the port is configured as a trunk port -- that is, a port that can deliver multiple VLANs to your host over a single physical connection.

If your host is configured to expect an access port, but ethernet frames are being delivered with a VLAN tag, those frames will effectively be "lost" by your host.

You can either configure a VLAN interface on your system:

ip link add link eth1 name eth1.900 type vlan id 900
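
Creating the link is usually only the first step: the new interface still has to be brought up, and dhcpd has to listen on an interface whose address falls inside the declared subnet. Here is a rough sketch of what the full sequence might look like. It only prints the commands (a dry run). It assumes that the server's 172.16.10.3/24 address should move from eth1 to the VLAN interface, which may not suit every setup; `vlan_iface_cmds` is a made-up helper name.

```shell
#!/bin/sh
# Dry-run helper: prints the commands instead of running them.
# vlan_iface_cmds and its parameters are illustrative; the values used
# below (eth1, 900, 172.16.10.3/24) are taken from the question.
vlan_iface_cmds() {
    parent=$1
    vid=$2
    addr=$3
    vlan_if="${parent}.${vid}"

    # Create the tagged sub-interface on top of the parent NIC.
    echo "ip link add link ${parent} name ${vlan_if} type vlan id ${vid}"
    # Assumption: the address moves to the sub-interface so that dhcpd
    # can match its subnet declaration there.
    echo "ip addr del ${addr} dev ${parent}"
    echo "ip addr add ${addr} dev ${vlan_if}"
    # Bring the new link up.
    echo "ip link set ${vlan_if} up"
}

vlan_iface_cmds eth1 900 172.16.10.3/24
```

After reviewing the printed commands and running them as root, dhcpd would be started with eth1.900 as its interface argument instead of eth1. Commands run this way do not survive a reboot; making them persistent depends on the distribution's network configuration.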

Or you can configure your switch ports as access ports, the instructions for which vary by switch.
