I am gradually getting a Pi 4 cluster up and running, and have hit an issue I have not been able to resolve with Google. Really, it is my understanding of the Ubuntu boot process that is lacking.
I have 5 nodes, each running Ubuntu 21.04. The first is a storage node and runs from an SD card. It has a couple of SSDs in RAID 1, split into 8 partitions. The first 4 partitions contain the root file systems for the other 4 nodes and are exported via NFS. Each of those partitions is mounted on the storage node at /srv/nfs/{RpiID}/. The storage node also runs TFTP and has a boot directory for each of the other 4 nodes at /srv/tftpboot/{RpiID}/. These directories are then bind mounted to /srv/nfs/{RpiID}/boot/firmware/ on the storage server.
NFS:
mark@rgd-strg-01:~$ ls /srv/nfs/ -la
total 24
drwxrwxrwx 6 root root 4096 Sep 27 14:11 .
drwxr-xr-x 5 root root 4096 Sep 28 12:30 ..
drwxr-xr-x 20 root root 4096 Sep 21 08:59 637d46a8
drwxr-xr-x 20 root root 4096 Sep 21 08:53 68fe97e5
drwxr-xr-x 20 root root 4096 Sep 21 09:04 727e3e34
drwxr-xr-x 20 root root 4096 Sep 21 09:11 ba061f16
/etc/exports:
mark@rgd-strg-01:~$ cat /etc/exports
/srv/nfs/637d46a8/ *(insecure,rw,async,no_subtree_check,no_root_squash)
/srv/nfs/68fe97e5/ *(insecure,rw,async,no_subtree_check,no_root_squash)
/srv/nfs/727e3e34/ *(insecure,rw,async,no_subtree_check,no_root_squash)
/srv/nfs/ba061f16/ *(insecure,rw,async,no_subtree_check,no_root_squash)
/srv/tftpboot/ *(insecure,rw,async,no_subtree_check,no_root_squash)
TFTP:
mark@rgd-strg-01:~$ ls /srv/tftpboot/ -la
total 24
drwxr-xr-x 6 root root 4096 Sep 28 12:11 .
drwxr-xr-x 5 root root 4096 Sep 28 12:30 ..
drwxr-xr-x 3 root root 4096 Sep 28 15:24 637d46a8
drwxr-xr-x 3 root root 4096 Sep 29 09:04 68fe97e5
drwxr-xr-x 3 root root 4096 Sep 28 13:22 727e3e34
drwxr-xr-x 3 root root 4096 Sep 28 13:22 ba061f16
/etc/fstab:
mark@rgd-strg-01:~$ cat /etc/fstab
LABEL=writable / ext4 discard,errors=remount-ro 0 1
LABEL=system-boot /boot/firmware vfat defaults 0 1
/srv/tftpboot/637d46a8 /srv/nfs/637d46a8/boot/firmware none defaults,bind 0 0
/srv/tftpboot/68fe97e5 /srv/nfs/68fe97e5/boot/firmware none defaults,bind 0 0
/srv/tftpboot/727e3e34 /srv/nfs/727e3e34/boot/firmware none defaults,bind 0 0
/srv/tftpboot/ba061f16 /srv/nfs/ba061f16/boot/firmware none defaults,bind 0 0
Here are examples of cmdline.txt and other important files from a node:
cmdline.txt:
mark@rgd-strg-01:~$ cat /srv/tftpboot/68fe97e5/cmdline.txt
net.ifnames=0 dwc_otg.lpm_enable=0 console=serial0,115200 console=tty1 root=/dev/nfs nfsroot=10.1.0.20:/srv/nfs/68fe97e5,tcp ip=dhcp elevator=deadline rootwait fixrtc rw cgroup_enable=cpuset cgroup_memory=1 cgroup_enable=memory
fstab:
mark@rgd-strg-01:~$ cat /srv/nfs/68fe97e5/etc/fstab
10.1.0.20:/srv/nfs/68fe97e5 / nfs defaults,_netdev 0 0
tmpfs /tmp tmpfs defaults 0 0
tmpfs /var/tmp tmpfs defaults 0 0
tmpfs /var/run tmpfs defaults 0 0
I arrived at this configuration by adjusting a boot process that I do not fully understand in order to enable network booting. Everything seems to be working fine, except that I am having trouble with kernel updates on the 4 network nodes. After running apt-get upgrade I get the following message:

Yet after a reboot, I can see that the kernel has not been updated:
mark@rdg-clust-01:~$ uname -r
5.11.0-1007-raspi
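To make the mismatch explicit, the running version can be compared with the version that the /boot/vmlinuz symlink points at. This is only a minimal sketch; installed_kernel is a hypothetical helper name, not an existing tool:

```shell
#!/bin/sh
# Hypothetical helper: print the kernel version that the vmlinuz symlink in a
# boot directory points at (for example 5.11.0-1019-raspi).
installed_kernel() {
    boot_dir=${1:-/boot}
    # readlink fails if the path is missing or is not a symlink.
    target=$(readlink "$boot_dir/vmlinuz") || return 1
    # Strip any directory part, then the "vmlinuz-" prefix, from the link target.
    target=${target##*/}
    printf '%s\n' "${target#vmlinuz-}"
}

# Example use on a node:
#   [ "$(uname -r)" = "$(installed_kernel /boot)" ] \
#       || echo "running kernel differs from the newest installed one"
```

On the storage server the same helper could be pointed at a node's tree, for example installed_kernel /srv/nfs/68fe97e5/boot.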
How can I either correct this properly, or at least do it manually? Can anyone help me understand the boot process a little better? Here are the contents of the boot and firmware directories for a node:
Boot:
mark@rgd-strg-01:~$ ls /srv/nfs/68fe97e5/boot/ -la
total 133404
drwxr-xr-x 4 root root 4096 Sep 30 06:19 .
drwxr-xr-x 20 root root 4096 Sep 21 08:53 ..
-rw------- 1 root root 5099714 Jul 28 12:42 System.map-5.11.0-1007-raspi
-rw------- 1 root root 5115999 Aug 23 07:05 System.map-5.11.0-1017-raspi
-rw------- 1 root root 5115843 Sep 21 14:56 System.map-5.11.0-1019-raspi
-rw-r--r-- 1 root root 233406 Jul 28 12:42 config-5.11.0-1007-raspi
-rw-r--r-- 1 root root 233767 Aug 23 07:05 config-5.11.0-1017-raspi
-rw-r--r-- 1 root root 233790 Sep 21 14:56 config-5.11.0-1019-raspi
lrwxrwxrwx 1 root root 44 Sep 30 06:14 dtb -> dtbs/5.11.0-1019-raspi/./bcm2711-rpi-4-b.dtb
lrwxrwxrwx 1 root root 44 Sep 20 16:07 dtb-5.11.0-1007-raspi -> dtbs/5.11.0-1007-raspi/./bcm2711-rpi-4-b.dtb
lrwxrwxrwx 1 root root 44 Sep 21 08:56 dtb-5.11.0-1017-raspi -> dtbs/5.11.0-1017-raspi/./bcm2711-rpi-4-b.dtb
lrwxrwxrwx 1 root root 44 Sep 30 06:14 dtb-5.11.0-1019-raspi -> dtbs/5.11.0-1019-raspi/./bcm2711-rpi-4-b.dtb
drwxr-xr-x 7 root root 4096 Sep 30 06:14 dtbs
drwxr-xr-x 3 root root 4096 Sep 29 09:04 firmware
lrwxrwxrwx 1 root root 28 Sep 30 06:10 initrd.img -> initrd.img-5.11.0-1019-raspi
-rw-r--r-- 1 root root 28854277 Sep 20 16:07 initrd.img-5.11.0-1007-raspi
-rw-r--r-- 1 root root 31566190 Sep 21 08:56 initrd.img-5.11.0-1017-raspi
-rw-r--r-- 1 root root 31593753 Sep 30 06:14 initrd.img-5.11.0-1019-raspi
lrwxrwxrwx 1 root root 28 Sep 30 06:10 initrd.img.old -> initrd.img-5.11.0-1017-raspi
lrwxrwxrwx 1 root root 25 Sep 30 06:10 vmlinuz -> vmlinuz-5.11.0-1019-raspi
-rw------- 1 root root 9464117 Jul 28 12:42 vmlinuz-5.11.0-1007-raspi
-rw------- 1 root root 9525235 Aug 23 07:05 vmlinuz-5.11.0-1017-raspi
-rw------- 1 root root 9526813 Sep 21 14:56 vmlinuz-5.11.0-1019-raspi
lrwxrwxrwx 1 root root 25 Sep 30 06:10 vmlinuz.old -> vmlinuz-5.11.0-1017-raspi
Firmware:
mark@rgd-strg-01:~$ ls /srv/nfs/68fe97e5/boot/firmware/ -la
total 79968
drwxr-xr-x 3 root root 4096 Sep 29 09:04 .
drwxr-xr-x 4 root root 4096 Sep 30 06:19 ..
-rw-r--r-- 1 root root 1024 Sep 16 13:46 README
-rw-r--r-- 1 root root 26914 Sep 16 13:46 bcm2710-rpi-2-b.dtb
-rw-r--r-- 1 root root 29031 Sep 16 13:46 bcm2710-rpi-3-b-plus.dtb
-rw-r--r-- 1 root root 28412 Sep 16 13:46 bcm2710-rpi-3-b.dtb
-rw-r--r-- 1 root root 26910 Sep 16 13:46 bcm2710-rpi-cm3.dtb
-rw-r--r-- 1 root root 49254 Sep 16 13:46 bcm2711-rpi-4-b.dtb
-rw-r--r-- 1 root root 48910 Sep 16 13:46 bcm2711-rpi-400.dtb
-rw-r--r-- 1 root root 49318 Sep 16 13:46 bcm2711-rpi-cm4.dtb
-rw-r--r-- 1 root root 20140 Sep 16 13:46 bcm2837-rpi-3-a-plus.dtb
-rw-r--r-- 1 root root 21009 Sep 16 13:46 bcm2837-rpi-3-b-plus.dtb
-rw-r--r-- 1 root root 20545 Sep 16 13:46 bcm2837-rpi-3-b.dtb
-rw-r--r-- 1 root root 19872 Sep 16 13:46 bcm2837-rpi-cm3-io3.dtb
-rw-r--r-- 1 root root 4638 Sep 16 13:46 boot.scr
-rw-r--r-- 1 root root 52456 Sep 16 13:46 bootcode.bin
-rw-r--r-- 1 root root 228 Sep 28 13:22 cmdline.txt
-rw-r--r-- 1 root root 1142 Sep 16 13:46 config.txt
-rw-r--r-- 1 root root 7314 Sep 16 13:46 fixup.dat
-rw-r--r-- 1 root root 5448 Sep 16 13:46 fixup4.dat
-rw-r--r-- 1 root root 3187 Sep 16 13:46 fixup4cd.dat
-rw-r--r-- 1 root root 8452 Sep 16 13:46 fixup4db.dat
-rw-r--r-- 1 root root 8454 Sep 16 13:46 fixup4x.dat
-rw-r--r-- 1 root root 3187 Sep 16 13:46 fixup_cd.dat
-rw-r--r-- 1 root root 10298 Sep 16 13:46 fixup_db.dat
-rw-r--r-- 1 root root 10298 Sep 16 13:46 fixup_x.dat
-rw-r--r-- 1 root root 22666147 Sep 16 13:46 initrd.img
-rw-r--r-- 1 root root 1559 Sep 16 13:46 overlay_map.dtb
drwxr-xr-x 2 root root 12288 Sep 16 13:46 overlays
-rw-r--r-- 1 root root 2952960 Sep 16 13:46 start.elf
-rw-r--r-- 1 root root 2228800 Sep 16 13:46 start4.elf
-rw-r--r-- 1 root root 793116 Sep 16 13:46 start4cd.elf
-rw-r--r-- 1 root root 3722504 Sep 16 13:46 start4db.elf
-rw-r--r-- 1 root root 2981192 Sep 16 13:46 start4x.elf
-rw-r--r-- 1 root root 793116 Sep 16 13:46 start_cd.elf
-rw-r--r-- 1 root root 4794472 Sep 16 13:46 start_db.elf
-rw-r--r-- 1 root root 3704808 Sep 16 13:46 start_x.elf
-rw-r--r-- 1 root root 515920 Sep 16 13:46 uboot_rpi_3.bin
-rw-r--r-- 1 root root 571832 Sep 16 13:46 uboot_rpi_4.bin
-rw-r--r-- 1 root root 558216 Sep 16 13:46 uboot_rpi_arm64.bin
-rw-r--r-- 1 root root 25580032 Sep 16 13:46 vmlinux
-rw-r--r-- 1 root root 9464117 Sep 16 13:46 vmlinuz
From this I can see that vmlinux and vmlinuz in the firmware directory are old, and I presume that these are the kernel loaded at boot time. So I removed them, created a symbolic link on the node pointing /boot/firmware/vmlinuz to /boot/vmlinuz, and then updated /boot/firmware/config.txt to set kernel=vmlinuz. After a reboot it still loads the old kernel. What do I need to do?
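A byte-for-byte comparison can confirm which installed kernel the image in the firmware directory corresponds to. In the listings above, the original firmware/vmlinuz has the same size (9464117 bytes) as vmlinuz-5.11.0-1007-raspi. This is a minimal sketch, assuming it is run as root on the storage server (the kernel images are mode 600); match_kernel is a hypothetical helper name, not an existing tool:

```shell
#!/bin/sh
# Hypothetical helper: report which versioned kernel in a boot directory is
# byte-identical to the image in its firmware subdirectory.
match_kernel() {
    boot_dir=$1                          # e.g. /srv/nfs/68fe97e5/boot
    served="$boot_dir/firmware/vmlinuz"  # image handed out over TFTP
    for candidate in "$boot_dir"/vmlinuz-*; do
        [ -f "$candidate" ] || continue
        # cmp -s is silent and exits 0 only when the files are identical.
        if cmp -s "$served" "$candidate"; then
            basename "$candidate"
            return 0
        fi
    done
    echo "no installed kernel matches $served" >&2
    return 1
}

# Example use on the storage server:
#   match_kernel /srv/nfs/68fe97e5/boot
```

If the firmware copy is a symlink, cmp follows it, so the result also shows what the link resolves to from the machine where the check is run.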