gitlab-runner using VirtualBox VM stuck in 'poweroff' state


The Problem

After one specific project runs on my CI, VirtualBox hangs while trying to power off the VM. vboxmanage list runningvms shows nothing, but ps ax | grep VBoxHeadless shows that the process is still running, and vboxmanage controlvm <VMName> poweroff fails with the error: The virtual machine is being powered down. It stays in this state indefinitely until I manually kill the process.

The Details

I use the virtualbox executor on an Ubuntu 20.04 host to run a Windows 10 guest instance for my CI. Most of the time it works marvelously, but one of my Python projects gets stuck at the very end, with the log showing:

Cleaning up project directory and file based variables

I enabled the debug log on gitlab-runner and it shows:

Executing VBoxManageOutput: []string{"controlvm", "GLR-runner-XXXXXXXX-concurrent-0", "poweroff"}

VBox.log shows:

************** End of Guest state at power off ***************

I can see, however, that the VBoxHeadless process is still running:

$ ps ax | grep VBoxHeadless

 324182 ?        SLl    5:00 /usr/lib/virtualbox/VBoxHeadless --comment GLR-runner-R2WzVtfH-concurrent-0 --startvm 1a585225-00c3-4099-903c-a82f67f0a404 --vrde config

None of the logs will show anything else until I manually kill the process, at which point the gitlab-runner will continue as expected.
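
For reference, the manual kill described above can be scripted. This is only a minimal sketch of the workaround, not a fix for the underlying hang. It assumes the ps ax column layout shown above (PID, TTY, STAT, TIME, COMMAND) and a VM name without spaces. The function names find_vboxheadless_pids and kill_hung_vm are hypothetical helpers of my own, not part of gitlab-runner or VirtualBox.

```python
import os
import signal
import subprocess


def find_vboxheadless_pids(ps_output, vm_name):
    """Return PIDs of VBoxHeadless processes whose --comment equals vm_name.

    Assumes `ps ax` style lines: PID TTY STAT TIME COMMAND...
    Assumes the VM name contains no spaces (hypothetical simplification).
    """
    pids = []
    for line in ps_output.splitlines():
        fields = line.split()
        # Skip the header line and anything too short to hold a command.
        if len(fields) < 5 or not fields[0].isdigit():
            continue
        command = fields[4:]
        # Ignore unrelated processes, e.g. the `grep VBoxHeadless` itself.
        if not command[0].endswith("VBoxHeadless"):
            continue
        if "--comment" in command:
            idx = command.index("--comment")
            if idx + 1 < len(command) and command[idx + 1] == vm_name:
                pids.append(int(fields[0]))
    return pids


def kill_hung_vm(vm_name):
    """Send SIGKILL to every VBoxHeadless process belonging to vm_name.

    This mirrors the manual kill only; it does not check whether the
    VM is actually hung, so call it only after confirming that.
    """
    ps = subprocess.run(["ps", "ax"], capture_output=True, text=True, check=True)
    pids = find_vboxheadless_pids(ps.stdout, vm_name)
    for pid in pids:
        os.kill(pid, signal.SIGKILL)
    return pids
```

Calling kill_hung_vm("GLR-runner-R2WzVtfH-concurrent-0") on the host would then kill the stuck process from the listing above, after which gitlab-runner should continue as it does when I kill the process by hand.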

Things I Have Tried

  1. I upgraded from VirtualBox 6.1.22 to 6.1.32. No change.
  2. I removed the test step (tox) in my CI file and the VM shuts down correctly, but as the purpose of the CI is to test my code, this isn't a viable solution.
    • This made me suspect that the tests were spawning processes that could not be killed, but further investigation showed that the 'poweroff' command does not perform a soft shutdown. It is more akin to cutting the power on a physical machine, so processes inside the guest should not be able to block it.
  3. Per this ticket I tried disabling 3D acceleration. No luck.
  4. Per this ticket I enabled nested paging to no effect.
  5. I uninstalled all extension packs. Nothing changed.
  6. I exported, deleted, and re-imported the base VM. Same issue.
  7. I re-created the VM from scratch. Same problem.

Is there something I am missing?


