Score:3

Nvidia Driver Broken on Update - Unable to Reinstall

I run Ubuntu 20.04 and after the last reboot I had trouble with my graphics driver - the system is in low res, only one monitor is working.

Debug Output

$ sudo lshw -C display
  *-display UNCLAIMED       
       description: VGA compatible controller
       product: TU104 [GeForce RTX 2070 SUPER]
       vendor: NVIDIA Corporation
       physical id: 0
       bus info: pci@0000:31:00.0
       version: a1
       width: 64 bits
       clock: 33MHz
       capabilities: pm msi pciexpress vga_controller cap_list
       configuration: latency=0
       resources: memory:f5000000-f5ffffff memory:e0000000-efffffff memory:f0000000-f1ffffff ioport:f000(size=128) memory:f6000000-f607ffff
$ sudo dkms status
nvidia, 510.47.03: added

That status seems a bit exotic, at least I did not find many similar cases while googling.
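
For context, DKMS reports a module in one of three states: `added`, `built`, or `installed`. `added` therefore means the source is registered but nothing has been compiled for the running kernel yet. Below is a minimal sketch that pulls the state out of a `dkms status` line in the format shown above. The helper names `dkms_state` and `explain_state` are made up for illustration, and other DKMS releases may print a slightly different format.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; they are not part of dkms itself.

# Print the state (the text after the last ": ") of one `dkms status` line.
dkms_state() {
    line=$1
    printf '%s\n' "${line##*: }"
}

# Translate a state word into a short explanation.
explain_state() {
    case "$1" in
        added)     echo "source registered; module not built yet" ;;
        built)     echo "module compiled; not yet installed into the kernel tree" ;;
        installed) echo "module built and installed for that kernel" ;;
        *)         echo "unrecognized state: $1" ;;
    esac
}

# Example using the status line from this post.
explain_state "$(dkms_state 'nvidia, 510.47.03: added')"
```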

$ nvidia-smi
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
$ modinfo nvidia
modinfo: ERROR: Module nvidia not found.

In the system info I see "llvmpipe (LLVM 12.0.0, 256 bits)" as my graphics.

What I Tried

I have tried multiple ways of installing the Nvidia drivers: apt (`sudo apt autoremove --purge nvidia* && sudo apt install nvidia-driver-510`), the "Additional Drivers" UI, and ubuntu-drivers. I tried both the currently latest version, 510, and the older one that worked before, 470. I also tried selecting nvidia with `sudo prime-select nvidia`, as well as selecting intel and switching back to nvidia, with the same result.

Background

I was using Nvidia driver 470 with kernel 5.13.0-26. After a reboot I got kernel 5.13.0-27 and no wifi. I had recently hit that problem because of the Nvidia driver (I needed to install linux-modules-extra for the new kernel), so I decided to upgrade the driver, hoping that would resolve everything. That led to the current situation: installing linux-modules-extra-5.13.0-27-generic (and, after switching to 510, the same package for .28) fixed the wifi issue, but the video driver is broken. While on 5.13.0-27 I could still boot 5.13.0-26 and get working video; that is no longer possible because .27 is now the oldest kernel in the GRUB menu.

I feel like I am missing some step that would fix that, would appreciate any help.

UPD

$ sudo dkms install -m nvidia -v 510.47.03 -k 5.13.0-28-generic --force
Error! Your kernel headers for kernel 5.13.0-28-generic cannot be found.
Please install the linux-headers-5.13.0-28-generic package,
or use the --kernelsourcedir option to tell DKMS where it's located
$ sudo dkms build -m nvidia -v 510.47.03
Error! Your kernel headers for kernel 5.13.0-28-generic cannot be found.
Please install the linux-headers-5.13.0-28-generic package,
or use the --kernelsourcedir option to tell DKMS where it's located

So it seems dkms cannot find the headers for my kernel. I followed the recommendation in the error message and installed them with `sudo apt install linux-headers-5.13.0-28-generic`. After that the output looks better:

$ sudo dkms build -m nvidia -v 510.47.03
Module nvidia/510.47.03 already built for kernel 5.13.0-28-generic/4
$ sudo dkms status
nvidia, 510.47.03, 5.13.0-28-generic, x86_64: installed

I'll try rebooting now and then install the driver as per recommendation in comments.

UPD2

That's it, everything seems to work now. There is no need to do anything about the drivers, it seems the problem was with missing headers.
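
For anyone landing here with the same symptoms, a quick check can be sketched as follows. It assumes the Ubuntu package naming `linux-headers-<release>` seen in the error message above, and that DKMS looks for the kernel build tree under `/lib/modules/<release>/build`. The function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers; a sketch, not an official diagnostic.

# Name of the headers package for a kernel release (Ubuntu naming assumed).
headers_pkg_for() {
    printf 'linux-headers-%s\n' "$1"
}

# Succeeds if a build tree exists for the given kernel release.
# -d follows symlinks, so a dangling "build" link counts as missing.
have_headers() {
    [ -d "/lib/modules/$1/build" ]
}

kernel="$(uname -r)"
if have_headers "$kernel"; then
    echo "Headers for $kernel appear to be present."
else
    echo "Headers for $kernel seem to be missing; try: sudo apt install $(headers_pkg_for "$kernel")"
fi
```

If the headers turn out to be missing, installing them and then re-running `sudo dkms status` should show whether the module gets past `added`.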

Terrance:
`sudo dkms install -m nvidia -v 510.47.03 -k 5.13.0-28-generic --force` should be able to install the driver into that kernel.
heynnema:
@Terrance The dkms build/install probably failed during the install of Nvidia 510 because secure boot is enabled... or a kernel lib/extras is missing.
Terrance:
@heynnema I guess it is possible that Secure Boot is enabled. Usually, when dkms shows a driver as `added`, only `dkms add` has completed; the `dkms build` and `dkms install` steps either weren't run or failed.
Terrance:
What output do you get when you run `sudo dkms build -m nvidia -v 510.47.03`?
heynnema:
@Terrance Yes, I suspect that the dkms build failed, either because Secure Boot was enabled or because some libs are missing. dkms status didn't show prior builds against older kernels, which probably meant that OP never had Nvidia installed before. We'll see if your dkms build command works, or errors out. Then a dkms install would be next.
heik:
@Terrance @heynnema sorry, I forgot to mention I checked secure boot, it is disabled. But it is possible that some libs are missing - some time ago I used aptitude but then I remembered it can mess up dependencies, so I suspect that was the root cause.
```
$ sudo dkms build -m nvidia -v 510.47.03
Error! Your kernel headers for kernel 5.13.0-28-generic cannot be found.
Please install the linux-headers-5.13.0-28-generic package,
or use the --kernelsourcedir option to tell DKMS where it's located
```
heik:
The comments already helped a lot, at least it is clear that something was missing and what it was, I have updated the post.
heik:
@Terrance please post your comments and recommendation to install linux-headers-5.13.0-28-generic (see updated post) so I can accept it as the answer. Your comments led me to the solution. Many thanks!
Terrance:
@heik If you want to you can go ahead and write up the answer and I will upvote it. I have no problem stopping in and helping where I can, and I am glad that you were able to solve the issue. ;)
heik:
@Terrance I wish the world had more people like you :)
Marcin Zalewski:
Thank you for this question and the answer. It turns out the problem on my system was exactly the same. Installing the missing headers fixed everything. This was so frustrating...
Score:1

Assuming that you have all the prerequisites installed (sudo apt install linux-headers-generic), you can follow these steps to fix the issue:

  1. (Optional) Boot into a root shell to safely run the commands.

  2. Remove the dkms directory for the NVIDIA drivers:

    sudo rm -r /var/lib/dkms/nvidia
    
  3. Purge NVIDIA drivers:

    sudo dpkg -P --force-all $(dpkg -l | grep "nvidia-" | grep -v lib | awk '{print $2}')
    
  4. Reinstall NVIDIA drivers:

    sudo ubuntu-drivers autoinstall
    
  5. Reboot!

Now, your NVIDIA drivers should work properly!

heynnema:
I'd do this slightly differently. Removing dkms/nvidia with rm is the wrong way to do it, and there is no need to reinstall dkms. I'd first check that Secure Boot is disabled. The nvidia dkms module is added, but not built or installed, so run dkms build and dkms install, then reboot and check dkms status and nvidia-smi.
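
The sequence described in this comment could look roughly like the sketch below. It only prints the commands rather than running them, so nothing is changed on the system. `dkms_recovery_plan` is a hypothetical helper name, and the module version and kernel release are taken from the question, so substitute your own.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) the DKMS recovery steps for a module.
# Arguments: module name, module version, kernel release.
dkms_recovery_plan() {
    module=$1
    version=$2
    kernel=$3
    echo "mokutil --sb-state"   # check whether Secure Boot is enabled
    echo "sudo dkms build -m $module -v $version -k $kernel"
    echo "sudo dkms install -m $module -v $version -k $kernel"
    echo "sudo reboot"
    echo "dkms status"          # after the reboot, expect 'installed'
    echo "nvidia-smi"
}

# Values from the question; replace with your own version and `uname -r`.
dkms_recovery_plan nvidia 510.47.03 5.13.0-28-generic
```

Printing the plan first makes it easy to review each step, and to stop at the first one that reports an error (for example, the missing-headers message from `dkms build`).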
heik:
I tried that, but in the end I get the same result.
Someone:
@heik Hi! I'm sorry for being unclear... While writing my answer I wrongly assumed that you had all the prerequisites installed. Of course, the Linux headers are needed! I've updated my answer and improved its clarity. Also, you've installed the kernel headers only for your current kernel, so you'll have to repeat this process each time your kernel gets an upgrade. Consider installing the `linux-headers-generic` package (`sudo apt install linux-headers-generic`) so that you won't have to repeat this process. As I have clarified my answer, you may accept it or post a new answer...
heik:
@Someone thank you for your feedback! I honestly also assumed I had all the prerequisites installed; I guess an attempt to use aptitude in the past messed up my dependencies more than I thought. I do have the latest version of linux-headers-generic, despite installing the specific one as mentioned in my update: "linux-headers-generic is already the newest version (5.4.0.99.103)." I've accepted your answer because together with my updates it should cover everything a googling person might need in a similar situation.
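
The version quoted above (5.4.0.99.103) suggests that `linux-headers-generic` follows the 5.4 kernel series, while the running kernel in this question is 5.13. That would explain why the metapackage did not pull in matching headers. The sketch below compares the two series. The helper names are hypothetical, and the suggestion that Ubuntu 20.04 tracks the newer series through `linux-headers-generic-hwe-20.04` is an assumption to verify on your own system.

```shell
#!/bin/sh
# Hypothetical helpers; a sketch for comparing kernel series.

# Print "major.minor" of a version-like string,
# e.g. 5.13.0-28-generic -> 5.13, 5.4.0.99.103 -> 5.4
kernel_series() {
    printf '%s\n' "$1" | cut -d. -f1,2
}

# Succeeds if both version strings belong to the same major.minor series.
same_series() {
    [ "$(kernel_series "$1")" = "$(kernel_series "$2")" ]
}

running="$(uname -r)"
# Installed version of the metapackage; empty if it is not installed
# or dpkg-query is unavailable.
meta="$(dpkg-query -W -f='${Version}' linux-headers-generic 2>/dev/null || true)"

if [ -n "$meta" ] && ! same_series "$running" "$meta"; then
    echo "linux-headers-generic ($meta) does not match the running kernel ($running)."
    echo "A different headers metapackage may be needed (assumption: linux-headers-generic-hwe-20.04)."
fi
```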
Score:0

I have the same graphics card and Ubuntu version. I was getting similar errors, but my solution was to do:

sudo init 3
# Then after logging in again:
sudo apt install nvidia-driver-525 nvidia-dkms-525
sudo reboot