Ubuntu VM in Hyper-V Gen2 will not boot consistently

I have a Hyper-V virtual environment (I know, I know) on Windows Server 2019. It runs mostly Windows guests, but also a handful of Linux machines, including two Gen 2 guests running Ubuntu 18.04 LTS.

My problem is that these two guests often fail to reboot properly. When they start, I see the GRUB menu to select a kernel, and no matter which option I pick I get this (with the appropriate kernel version):

Loading Linux 4.15.0-167-generic ...
Loading initial ramdisk ...

Immediately after showing this message, the VM restarts itself. I see the same message a few times in a loop before it gives up and powers down completely.

I found the echo commands in the boot script that print these messages, and I added an extra "Ramdisk loaded ..." message after the initrd command to confirm that it completes. That message appears as well.

Here's the kicker: if I keep trying, the machine eventually succeeds and boots properly. Sometimes it takes anywhere from dozens to a couple hundred retries, but so far it always boots eventually. This has been going on for some time now. Each time I try to research what's going on, but I haven't been able to find any errors, and the machine boots before I get far enough to find anything helpful.

One confounding factor is that I don't usually reboot these machines unless I've also run an apt upgrade that is likely to include an updated kernel.

What could be going on here? What kind of race condition could cause this while still letting the boot process eventually finish?

Note: I removed the "18.04" tag because this issue has now followed us through the 20.04 and 22.04 upgrades, as well as multiple versions of Windows Server (2012 R2, 2016, and currently 2019).
I don't know the reason, but this is a known problem. See [here](https://bugs.launchpad.net/ubuntu/+source/grub2/+bug/1918265). As a workaround, use rEFInd, or boot the kernel directly as suggested in the comments. I had the same issue, and using rEFInd solved it.
@Mateusz Thank you! I have one in a loop right now. I tried booting with the network adapter set to "Not connected", and it started on the first try after that change. As soon as it starts to load, I set the adapter back, and it's ready and available by the time the IP configuration loads. With only two such VMs in our environment, that's good enough for us to function normally again.

The solution so far has been to disconnect the network adapter before starting the VM. Then, as soon as we're past the failure point in the boot process and the kernel begins to load, we reconnect it. If done quickly enough, the adapter is available in time for everything to start up correctly when the adapter configurations are loaded. Following this process, the VM boots properly the first time, every time.

Credit goes to Mateusz for pointing me to this link (which is also why this answer is Community Wiki; it's not my own work):

https://bugs.launchpad.net/ubuntu/+source/grub2/+bug/1918265

It may be possible to script this process in PowerShell, though one would likely have to guess at the delay. I haven't gotten that far.
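A minimal, untested sketch of what such a script might look like, driving the Hyper-V PowerShell cmdlets (`Disconnect-VMNetworkAdapter`, `Start-VM`, `Connect-VMNetworkAdapter`) from Python on the host. Assumptions: Python is installed on the Hyper-V host and run with Hyper-V administrator rights, the VM has a single network adapter, and the VM name, switch name, and 10-second delay are hypothetical placeholders. The delay is exactly the guess mentioned above and would need tuning per VM.

```python
from __future__ import annotations

import subprocess
import time


def ps_quote(value: str) -> str:
    """Quote a string as a PowerShell single-quoted literal ('' escapes ')."""
    return "'" + value.replace("'", "''") + "'"


def build_commands(vm_name: str, switch_name: str) -> tuple[list[str], list[str]]:
    """Return (before_delay, after_delay) PowerShell commands.

    Assumption: the VM has one adapter. With only -VMName given, the
    Disconnect/Connect cmdlets act on every adapter attached to that VM.
    """
    vm = ps_quote(vm_name)
    before = [
        # Leaves the adapter in the "Not connected" state before power-on.
        f"Disconnect-VMNetworkAdapter -VMName {vm}",
        f"Start-VM -Name {vm}",
    ]
    after = [
        # Reattaches the adapter to the (hypothetical) virtual switch.
        f"Connect-VMNetworkAdapter -VMName {vm} -SwitchName {ps_quote(switch_name)}",
    ]
    return before, after


def run_ps(command: str) -> None:
    """Run one PowerShell command on the host; raise CalledProcessError on failure."""
    subprocess.run(["powershell.exe", "-NoProfile", "-Command", command], check=True)


def start_with_workaround(vm_name: str, switch_name: str, delay_seconds: float = 10.0) -> None:
    """Disconnect the adapter, start the VM, wait, then reconnect the adapter."""
    before, after = build_commands(vm_name, switch_name)
    for cmd in before:
        run_ps(cmd)
    # Guessed delay: long enough to get past "Loading initial ramdisk ...",
    # short enough that the adapter is back before the guest configures networking.
    time.sleep(delay_seconds)
    for cmd in after:
        run_ps(cmd)
```

Usage would be something like `start_with_workaround("ubuntu-vm", "External vSwitch", delay_seconds=10.0)`, with both names replaced by the real VM and virtual switch. Building the command strings separately from running them keeps the quoting easy to check before anything touches a live VM.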
