Score:5

NVIDIA driver stops working on reboot Ubuntu 23.04 lunar

ar flag

The issue at hand

I have a laptop with an AMD CPU and an Nvidia GPU, running Ubuntu. This configuration has given me a lot of trouble, because AMD support on Linux apparently does not work very well. A new issue has popped up recently, and these are the steps:

  1. PC is working fine, Nvidia driver is installed and running, and I can use the GPU for development.
  2. I reboot the PC.
  3. The Nvidia driver no longer works.

This is where I am now. When I run nvidia-smi I get this message:

NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

But when I open the "Additional Drivers" panel inside "Software & Updates", it shows this as the active driver:

o Using NVIDIA driver metapackage from nvidia-driver-535 (proprietary, tested)

I can also run lspci | grep VGA for this output:

01:00.0 VGA compatible controller: NVIDIA Corporation GA104 [Geforce RTX 3070 Ti Laptop GPU] (rev a1)
09:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Rembrandt [Radeon 680M] (rev c7)

Finally, I can run sudo apt search nvidia-driver-535 for this output:

...
nvidia-driver-535/lunar-updates,lunar-security,lunar,now 535.54.03-0ubuntu0.23.04.2 amd64 [installed]
  NVIDIA driver metapackage
...
xserver-xorg-video-nvidia-535/lunar-updates,lunar-security,lunar,now 535.54.03-0ubuntu0.23.04.2 amd64 [installed,automatic]
  NVIDIA binary Xorg driver
...

And yes, I just purged the driver from the PC (i.e. sudo apt purge nvidia*) and reinstalled it. Still the same problem. I have also forced gdm3 to use X11 by editing the file /etc/gdm3/custom.conf, because the driver doesn't work with Wayland.

Attempt at debugging

My skills at debugging the internals of Linux are quite limited, but I did get a few messages from running sudo journalctl -S -1h:

modprobe: FATAL: Module nvidia not found in directory /lib/modules/6.2.0-25-generic
[...]
systemd[1860]: Started app-gnome-nvidia\x2dsettings-7946.scope - Application launched by gnome-shell.
nvidia-settings.desktop[7946]: ERROR: NVIDIA driver is not loaded
nvidia-settings[7946]: g_object_unref: assertion 'G_IS_OBJECT (object)' failed
nvidia-settings[7946]: ctk_powermode_new: assertion '(ctrl_target != NULL) && (ctrl_target->h != NULL)' failed
nvidia-settings.desktop[7946]: ERROR: nvidia-settings could not find the registry key file or the X server is not accessible. This file should have been installed along with this driver at /usr/share/nvidia/nvidia-application-profiles-key-documentation. The application profiles will continue to work, but values cannot be prepopulated or validated, and will not be listed in the help text. Please see the README for possible values and descriptions.
nvidia-settings[7946]: PRIME: No offloading required. Abort
nvidia-settings[7946]: PRIME: is it supported? no
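The first log line says modprobe found no nvidia module for the running kernel. A minimal sketch for checking that directly is shown here; has_nvidia_module is a hypothetical helper name, and the assumption is that the module file is called nvidia.ko, possibly with a compression suffix such as .zst.

```shell
#!/bin/sh
# has_nvidia_module DIR: succeed if DIR contains a built nvidia kernel module.
# (Hypothetical helper for illustration, not part of any package.)
has_nvidia_module() {
    # Modules may be plain (.ko) or compressed (.ko.zst, .ko.xz),
    # hence the trailing wildcard.
    [ -n "$(find "$1" -name 'nvidia.ko*' 2>/dev/null | head -n 1)" ]
}

# Check the module directory of the kernel that is currently running.
moddir="/lib/modules/$(uname -r)"
if has_nvidia_module "$moddir"; then
    echo "nvidia module present in $moddir"
else
    echo "nvidia module missing in $moddir"
fi
```

If the module is reported missing, the driver metapackage can be installed while no module has been built or shipped for that particular kernel.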

Conclusion

The PC can read the GPU. The NVIDIA driver is correctly installed. But somehow, randomly, on reboot the driver gets disabled. Or something.

What is happening? I have absolutely no idea how to fix this.

wa4557 avatar
us flag
For a reason I don't understand, the newest linux-modules-nvidia-535-generic package is still 6.2.0-24.24+5, while the newest kernel is 6.2.0-25. At least for me this seems to be the problem, and I have to use kernel 6.2.0-24 for now until this package is updated.
alexpanter avatar
ar flag
@wa4557 That's a good point! My kernel is at the time of writing `6.2.0-25-generic`, so perhaps that is the issue? I don't really remember if I have updated that recently. Perhaps the signing process is an issue, and that's why I can run with disabled secure boot..
wa4557 avatar
us flag
If you boot into 6.2.0-24 (in the GRUB menu) you will probably see that it works. Secure Boot is exactly what will not work without the proper linux-modules-nvidia package.
FedKad avatar
cn flag
Yes, it seems that kernel 6.2.0-25-generic has some Nvidia-related problems.
SkiBum avatar
cm flag
I rebooted into Linux kernel 6.2.0-24 and then used ubuntu-drivers install nvidia:535, after which I confirmed that nvidia-dkms-535 was installed. I then rebooted into 6.2.0-24 and the display still hangs after the last boot message. I can confirm the system is up and running, because services that I can access without a terminal are working. I have to use the PC reset button to reboot the system. Has anyone reported this issue on the Lunar Lobster bug tracker?
gerald46 avatar
cn flag
Today Ubuntu downloaded another kernel, 6.2.0-26, for installation. No issues with Nvidia driver 535. I removed 6.2.0-25. The two remaining are 6.2.0-24 and 6.2.0-26.
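The version mismatch described in these comments can be checked with a short sketch like the one below. It compares the kernel ABI (such as 6.2.0-25) of the running kernel with that of the linux-modules-nvidia-535-generic package named above. abi_of is a hypothetical helper, and the version formats are assumed to look like the ones quoted in this thread.

```shell
#!/bin/sh
# abi_of VERSION: reduce a kernel release or package version to its ABI prefix,
# e.g. "6.2.0-25-generic" or "6.2.0-24.24+5" become "6.2.0-25" and "6.2.0-24".
# (Hypothetical helper for illustration.)
abi_of() {
    printf '%s\n' "$1" | sed -E 's/^([0-9]+\.[0-9]+\.[0-9]+-[0-9]+).*/\1/'
}

running=$(abi_of "$(uname -r)")

# dpkg-query prints nothing useful if the package is not installed,
# so a failure is tolerated and reported below.
pkg_version=$(dpkg-query -W -f='${Version}' linux-modules-nvidia-535-generic 2>/dev/null || true)

if [ -z "$pkg_version" ]; then
    echo "linux-modules-nvidia-535-generic is not installed (or dpkg is unavailable)"
elif [ "$(abi_of "$pkg_version")" = "$running" ]; then
    echo "prebuilt nvidia modules match the running kernel ($running)"
else
    echo "mismatch: kernel is $running, modules package is $(abi_of "$pkg_version")"
fi
```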
Score:2
ar flag

Temporary solution suggestion

After a quick reboot it seems to be working now - but I will leave the question open in case someone else has the same problem, or it somehow appears again. The "fix" was to open the BIOS upon startup, and disable Secure Boot. No idea why - saw it suggested somewhere - and now nvidia-smi runs as it should.
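To see whether Secure Boot is actually on without entering the BIOS, something along these lines may help. It is only a sketch: sb_is_enabled is a hypothetical helper, and it assumes mokutil is installed and prints "SecureBoot enabled" or "SecureBoot disabled".

```shell
#!/bin/sh
# sb_is_enabled TEXT: succeed if TEXT (output of `mokutil --sb-state`)
# reports Secure Boot as enabled. (Hypothetical helper for illustration.)
sb_is_enabled() {
    case "$1" in
        *"SecureBoot enabled"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Tolerate a missing mokutil or a non-UEFI boot; state is then empty.
state=$(mokutil --sb-state 2>/dev/null || true)

if sb_is_enabled "$state"; then
    echo "Secure Boot is on: the kernel will likely refuse modules it cannot verify"
    # Show who signed the nvidia module; empty output means unsigned or not found.
    modinfo -F signer nvidia 2>/dev/null || true
else
    echo "Secure Boot is off, or its state could not be read"
fi
```

This would be consistent with the workaround above: with Secure Boot off, the signature on the module no longer matters.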

zunnzunn avatar
zm flag
Can confirm that this worked for me as well.
SkiBum avatar
cm flag
I too can confirm this worked for me.
Score:1
sj flag

The same problem happened to me 2 days ago after updating the kernel. I fixed it by disabling Secure Boot.

This is unusual, though, because before the update I was able to run the drivers properly even with Secure Boot on.

Score:0
tr flag
sudo apt-get install nvidia-dkms-535 

This worked for me

Score:0
th flag

For some strange reason, I had to:

sudo apt-get install nvidia-dkms-535 

I thought this was installed previously as a dependency for nvidia-driver-535...because it did work before I did an apt-get upgrade and rebooted.

Anyway, the driver now shows up in my /lib/modules/ folder for all installed kernel versions.
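To double-check that claim on another machine, a rough sketch like this lists installed kernels whose module directory has no nvidia module. kernels_missing_nvidia is a hypothetical helper, and it assumes the module file is named nvidia.ko, optionally with a compression suffix.

```shell
#!/bin/sh
# kernels_missing_nvidia ROOT: print the name of every kernel directory
# under ROOT that contains no nvidia module.
# (Hypothetical helper for illustration.)
kernels_missing_nvidia() {
    for dir in "$1"/*/; do
        # Skip the unexpanded glob when ROOT has no subdirectories.
        [ -d "$dir" ] || continue
        if [ -z "$(find "$dir" -name 'nvidia.ko*' 2>/dev/null | head -n 1)" ]; then
            basename "$dir"
        fi
    done
}

# No output means every installed kernel has the module.
kernels_missing_nvidia /lib/modules
```

Running dkms status afterwards should show for which kernels DKMS has built the module.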

alexpanter avatar
ar flag
Hi, did you try to check the difference between kernel version and whatever the nvidia module is signed for, as @wa4557 suggested in a comment to the question? I'm curious to hear what influence this might have on the secure boot..
Gabriel Samfira avatar
th flag
I have secure boot off. The module was just completely missing. Installing nvidia-dkms-535 manually compiled the module and after a reboot, everything worked again.
Score:0
cn flag

My system updated the kernel from 6.2.0-24 to 6.2.0-25. It booted, but with no desktop. This is not the first time with 23.04. I switched to a tty and shut down. On the next boot I opened the GRUB menu and went back to 6.2.0-24. I then modified GRUB to boot only my selection rather than a newly installed kernel. My test machine runs with Secure Boot off. I have 10 SSDs that I can swap to test different distributions. The motherboard is an Asus Z77. Debian, Ubuntu 23.04, 22.04, Linux Mint, and Manjaro will not run with Secure Boot on.
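The answer does not say how GRUB was modified. One common way, shown here purely as an assumption about what was done, is to make GRUB remember the last selected entry via /etc/default/grub:

```shell
# Fragment for /etc/default/grub (the file is read as shell variable assignments).
# Boot whichever menu entry was selected last, instead of the newest kernel.
GRUB_DEFAULT=saved
GRUB_SAVEDEFAULT=true
# After editing the file, regenerate the menu with: sudo update-grub
```

With this in place, picking 6.2.0-24 once in the GRUB menu should make it the default on later boots, even after a newer kernel is installed.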

SkiBum avatar
cm flag
How do you launch a tty on a system where the display is not working? I would much prefer your approach to having to do a hardware reset on the PC.
gerald46 avatar
cn flag
Hi, you can reach a tty terminal when the desktop is not working by pressing CTRL + ALT + F3 (TTY3). Log in there with your username and password, then type shutdown. One minute later the system will shut down.