Score:1

How to run dnsmasq inside QEMU, providing netboot service to other VMs

ph flag

EDIT (WIP): The core reason for the failures described below is that I was not bringing up the host TAP interfaces at the right time. If I let QEMU handle the creation of the tap devices, everything works as expected. I will investigate the failure in more detail and provide a clearer explanation of the problem when I have it. Thank you @anx for the tips!

Goal: Run dnsmasq inside a QEMU VM on the host, serving netboots to another QEMU VM running on the same host.

I would like the dnsmasq VM to act as a gateway. One NIC is the upstream WAN interface, facing an upstream DHCP server. The other is a private LAN interface, into which other VMs will be "plugged", and they will netboot from the dnsmasq listening on this private LAN interface.

First, to allow the VMs to talk to one another, I create my own bridge on the host,

ip link add name vivianbr0 type bridge
ip link set vivianbr0 up

For the VMs to talk to each other via the host bridge, I will need two tap devices, one for the private LAN interface on the gateway VM, and another for the private VM's single network interface,

ip tuntap add mode tap tap0 user cturner
ip tuntap add mode tap tap1 user cturner
ip link set tap0 up
ip link set tap1 up
ip link set tap0 master vivianbr0
ip link set tap1 master vivianbr0

For the gateway VM, I am using an Arch Linux ISO for testing purposes. The VM is booted with two NICs, like this,

qemu-system-x86_64 \
    -drive file=arch-disk.qcow2,if=none,id=nvm \
    -device nvme,serial=deadbeef,drive=nvm \
    -cdrom archlinux-2021.09.01-x86_64.iso \
    -boot d \
    -device virtio-net-pci,romfile=,netdev=net0,mac="DE:AD:BE:EF:00:11" \
    -device virtio-net-pci,romfile=,netdev=net1,mac="DE:AD:BE:EF:00:12" \
    `# Simulate the plugged in "upstream" cable with user-mode networking` \
    -netdev user,id=net0,hostfwd=tcp::60022-:22,hostfwd=tcp::8080-:80,hostfwd=tcp::8081-:8000,hostfwd=tcp::2375-:2375 \
    `# And now the unplugged one, with TAP networks` \
    -netdev tap,id=net1,ifname=tap0,script=no,downscript=no \
    -net bridge,br=vivianbr0 \
    -m 4G \
    -enable-kvm

Once this machine has booted, I see the following in the bridge configuration,

brctl show vivianbr0 

bridge name     bridge id               STP enabled     interfaces
vivianbr0       8000.46954a1ad851       no              tap0
                                                        tap1
                                                        tap2

I assume tap2 was created by QEMU...

Inside this VM, there are two ifaces. ens4 with MAC DE:AD:BE:EF:00:11, and ens5 with MAC DE:AD:BE:EF:00:12. Inside this VM, I start dnsmasq,

ip addr add 10.42.0.1/24 dev ens5
dnsmasq -d --dhcp-range=10.42.0.10,10.42.0.100 --dhcp-script=/bin/echo --enable-tftp=ens5 --interface=ens5

This starts without error.

Now I try to netboot another VM, started on the host like this,

qemu-system-x86_64 \
-machine pc-q35-6.0,accel=kvm \
-m 1024 -smp 2,sockets=2,cores=1,threads=1 \
-netdev tap,id=net0,ifname=tap1,script=no,downscript=no \
-device virtio-net-pci,netdev=net0,bootindex=1,mac=DE:AD:BE:EF:00:13 \
-net bridge,br=vivianbr0 \
-enable-kvm \
-vga virtio

But it fails to boot. I monitor the vivianbr0 using tcpdump and can see the DHCP requests, but there are no responses, nothing reaches the dnsmasq running inside the first VM,

tcpdump -i vivianbr0 -nN
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on vivianbr0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
12:21:39.585229 IP 0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from de:ad:be:ef:00:13, length 397
12:21:40.587741 IP 0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from de:ad:be:ef:00:13, length 397
12:21:40.700038 IP6 fe80::6ce2:2aff:fe94:ba48.5353 > ff02::fb.5353: 0 [7q] PTR (QM)? _nfs._tcp.local. PTR (QM)? _ftp._tcp.local. PTR (QM)? _webdav._tcp.local. PTR (QM)? _webdavs._tcp.local. PTR (QM)? _sftp-ssh._tcp.local. PTR (QM)? _smb._tcp.local. PTR (QM)? _afpovertcp._tcp.local. (118)
12:21:42.619968 IP 0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from de:ad:be:ef:00:13, length 397
12:21:46.684448 IP 0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from de:ad:be:ef:00:13, length 397
12:22:30.609555 IP 0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from de:ad:be:ef:00:12, length 289
12:23:33.796148 IP 0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from de:ad:be:ef:00:12, length 289
12:24:38.673364 IP 0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from de:ad:be:ef:00:12, length 289

Oddly, I see BOOTP requests from de:ad:be:ef:00:13 (the netbooting VM's MAC addr) and from de:ad:be:ef:00:12 (the gateway VM's private NIC), indicating something is badly misconfigured.

How can I make this work?

Nikita Kipriyanov avatar
za flag
What if you listen for DHCP traffic on tap0 (the bridge port)? It should show the traffic which the bridge code switched to that port. Also, check the bridge MAC address table during the requests; does it fill up with the required MACs? By the way, a bridge by default sets standard STP delays, which means a port starts forwarding traffic only 30 seconds after it was brought "up". Nowhere in your bridge setup do I see you changing this or disabling STP. And, finally, the VM which is supposed to boot from the dnsmasq VM: which bridge port does it use? There must be another tap interface in the bridge, for the second VM.
ph flag
@anx Regarding `-netdev bridge`, how does this answer the question? It allows me to attach tap devices to a *differently named* bridge, but the problem is that I need QEMU to proxy a guest device to a host bridge, so that the dnsmasq inside the VM can service requests handled by the host bridge.
ph flag
@NikitaKipriyanov I answered your first question in an edit. STP is not running on the bridge. I don't know how to check bridge MAC address table during the requests. It seems a simpler issue of interfaces not being able to be brought UP.
ph flag
@anx Regarding the `dnsmasq` invocation. It's a bit complicated to break down, but my gateway VM names the two interfaces based on this rule: whichever interface responds to a DHCP request is named `public`, the other `private`. In a production deployment, the public interface is connected to an upstream DHCP server, and the private interface is not. Then the `dnsmasq` command is as I wrote in my question, but with `--interface=private`
anx avatar
fr flag
anx
I like that your question mentions all the steps attempted, but I suspect the key facts are somewhat obscured by different versions.. care to add a consistent snapshot of the current situation to your question (both qemu cmdlines, the dnsmasq cmdline, and all 3 outputs from `ip a l`)?
ph flag
@anx Yes, I really do need to manage the tap devices myself. I've been trying to be as brief as possible, but not doing great there, so suffice it to say... There are reasons :)
ph flag
@anx The "dnsmasq on the guest" simply can't work, for the same reason dnsmasq on `tap0` rather than `br0` does not work. This was a minimized example per your request. If I start a dnsmasq inside a QEMU VM with `-netdev tap,id=net1,ifname=tap0,script=no,downscript=no`, then it's the same behaviour as the example I gave in my question, only there's an extra level of indirection between the guest's network iface and the host tap.
Tom Yan avatar
in flag
You are not really making any sense. Binding dnsmasq to a tap on the host (which makes no sense and is expected NOT to work anyway) has NOTHING to do with having a VM running dnsmasq (that binds to the virtualized NIC inside the VM), regardless of whether you can get the latter working. There's no point testing / cross-checking / troubleshooting with your "simplified" approach, as it is NOT a simplification at all.
Tom Yan avatar
in flag
The first thing that comes to my mind is that qemu does not enumerate or randomize the MAC address of each VM. It's always `52:54:00:12:34:56` if not explicitly set / changed with the qemu option. You might want to make sure you have that fixed, either with the corresponding qemu option or by configuring the systems in the VMs to set "fake" ones themselves. Make sure you don't confuse yourself again with the MAC addresses of the taps on the host.
Tom Yan avatar
in flag
By the way, you really want to clean up your question (by removing all the irrelevant nonsense) if you still want / need further help.
ph flag
@TomYan I've completely rewritten my question. Hopefully it is clearer to you now. I've attempted to ensure the MAC addrs are unique, although perhaps I haven't done this right either.
Tom Yan avatar
in flag
First of all, you don't need to use `-net(dev) bridge` *in addition to* `-net(dev) tap`. They are not complementary to each other; rather, the former automatically adds a tap for you, and the latter makes use of an existing tap. I recommend using the shortcut `-nic bridge,model=virtio,mac=SO:ME:MA:CA:DD:RE` to replace all other network related options you have now. (`-nic` can also be used with `user` and its `hostfwd`, btw.)
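A minimal sketch of the `-nic bridge` shortcut described above, assuming the bridge from the question (`vivianbr0`) and that the host's qemu-bridge-helper is allowed to use it (commonly an `allow vivianbr0` line in `/etc/qemu/bridge.conf`, though the path varies by distribution). `nic_bridge_arg` is a hypothetical helper, and `-boot n` stands in for the `bootindex=1` used in the question; the command is only printed, not executed:

```shell
#!/bin/sh
# Hypothetical helper: build one -nic option that replaces the separate
# -netdev tap / -device virtio-net-pci / -net bridge options.
nic_bridge_arg() {
    bridge=$1
    mac=$2
    printf '%s\n' "-nic bridge,br=${bridge},model=virtio-net-pci,mac=${mac}"
}

# Print the resulting command line for the netbooting guest.
echo qemu-system-x86_64 -enable-kvm -m 1024 \
    "$(nic_bridge_arg vivianbr0 DE:AD:BE:EF:00:13)" -boot n
```

With this form QEMU creates the tap itself and the bridge helper attaches it and brings it up, so no `ip tuntap` or `ip link set tapN up` step is needed for that guest.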
Tom Yan avatar
in flag
Next, `...and from de:ad:be:ef:00:12 (the gateway VM's private NIC), indicating something is badly misconfigured`: that's a false assumption. Whether you see DHCP requests from the "gateway host" depends on whether it runs a DHCP client. The Arch ISO makes use of `systemd-networkd` for that, and by default it has DHCP (client) enabled on all Ethernet NICs, IIRC.
Tom Yan avatar
in flag
Finally, while I'm not familiar with netboot, why would you expect it to work "out-of-the-box" as long as there's a dnsmasq / DHCP server host in the network? And regardless, why don't you just start by also booting the Arch ISO, to confirm that at least the address assigning part of DHCP works first? See https://wiki.archlinux.org/title/Netboot btw. (Note that the instructions there have nothing to do with virtualization, so they should all be applied *inside* a VM, not on the host.)
Score:0
fr flag
anx

Your steps for two guests are fine, I have just replicated your setup up until the point of leasing addresses. I could hand out IPs from one VM running dnsmasq to one VM running a dhcp client.

Check these:

  • cannot bring up unattached tap devices
    • visible from the DOWN state in ip a l output of host (should say UNKNOWN or UP)
      • if you let qemu create the tap device, the qemu-bridge-helper will bring it up
      • if you use a script=, bring up the device there
      • otherwise, you have to ip link set tapN up some time after vm start
  • MAC addrs need to be unique
    • visible in ether address in ip a l in the guests
    • list learned (non-host) macs, e.g. brctl showmacs br0 should have two entries with a non-zero ageing timer
  • dropping packets via iptables
    • check /proc/sys/net/bridge/bridge-nf-call* and whether br_netfilter module is loaded
    • add a logging rule, for IPv4 something like iptables -A FORWARD -j LOG --log-prefix "forward dropped" before dropping or before a DROP policy on the FORWARD chain
    • ignore /sys/class/net/br*/bridge/nf_call_* (I do not know why these can be off when filtering is on)
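The first and third checks above can be sketched as small shell helpers. This is only an illustration, not a definitive procedure: the helper names and the `SYSFS_NET` / `PROC_BRIDGE` override variables are assumptions of this sketch (the overrides exist only so the helpers can be tried against a scratch directory), while the sysfs and procfs paths are the standard kernel ones.

```shell
#!/bin/sh
# Report the kernel's operational state for an interface ("up", "down",
# "unknown", ...), or "missing" if the interface does not exist.
link_state() {
    cat "${SYSFS_NET:-/sys/class/net}/$1/operstate" 2>/dev/null || echo missing
}

# Bring a tap up only if it is not already usable. Meant to be run some
# time after QEMU has opened the tap (needs root for the "ip" call).
ensure_up() {
    case "$(link_state "$1")" in
        up|unknown) echo "$1: already usable" ;;
        missing)    echo "$1: no such interface" >&2; return 1 ;;
        *)          ip link set "$1" up ;;
    esac
}

# Print each bridge-nf-call-* sysctl with its value. Prints nothing when
# br_netfilter is not loaded and the directory is absent.
bridge_nf_status() {
    for f in "${PROC_BRIDGE:-/proc/sys/net/bridge}"/bridge-nf-call-*; do
        [ -r "$f" ] && echo "$(basename "$f")=$(cat "$f")"
    done
    return 0
}
```

On the host this might be used as `ensure_up tap0; ensure_up tap1; bridge_nf_status` once both guests are running. For the MAC check, `bridge fdb show br vivianbr0` is roughly the iproute2 counterpart of `brctl showmacs`.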

Other notables I found while testing:

  • Qemu added the vnet_hdr option on my tap devices. That seems reasonable and could have been disabled on the qemu cmdline if so desired.
  • Sometimes my (scope link) route for the bridge would go missing. I have yet to determine how that could even happen.

About your attempt to simplify the testing by binding to the tap device..

dnsmasq will be running inside a QEMU VM

AFAIK persistent tap devices are unusable until actually attached. So you can only meaningfully test either full setup:

  • Do you want to run dnsmasq on the host?
    • then attach it to the bridge device
  • or do you want to run dnsmasq inside a VM?
    • then attach it to the respective network interface inside that VM

--interface=tap0

Hint: Use --bind-interfaces to instruct dnsmasq to switch from merely discarding traffic from other interfaces to actually trying to bind, so that it exits with an error message when started with unusable settings.
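As a sketch of that hint, the dnsmasq invocation from the question with `--bind-interfaces` added might look like the following. `dnsmasq_args` is a hypothetical helper that only assembles the arguments; the interface name `ens5` and the address range are taken from the question, and the command is printed rather than executed:

```shell
#!/bin/sh
# Hypothetical helper: the question's dnsmasq arguments, plus
# --bind-interfaces so an unusable interface is reported at startup.
dnsmasq_args() {
    iface=$1
    echo "-d --bind-interfaces --interface=${iface}" \
         "--dhcp-range=10.42.0.10,10.42.0.100 --enable-tftp=${iface}"
}

# Inside the gateway VM, after "ip addr add 10.42.0.1/24 dev ens5",
# this is the command one would run:
echo "dnsmasq $(dnsmasq_args ens5)"
```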

ph flag
My question is all about running a dnsmasq *inside a VM*, the host-dnsmasq examples were only to give you a simpler example of the problem. If you know how to have a dnsmasq *inside a VM* offer leases to another VM net-booting, I would be very interested to find out!
ph flag
Just to be even more explicit, I have edited my question again to give an example of the dnsmasq in guest approach I am trying
anx avatar
fr flag
anx
I *thought* I knew how to do it because I have been doing it for many years.. but then I fiddled around and found *three* different trivial reasons why it might fail.
ph flag
Thanks a lot @anx, I've made some forward progress thanks to your notes. It seems my not bringing the manually created tap devices up at the right time is the source of my issues. If I let QEMU handle them, the netbooting does indeed work. I'll update my question and perhaps offer a clearer explanation of the failure once I've dug a bit more into how to properly handle my use-case. Thanks so much, I owe you several beers!