Score:1

NVIDIA drivers suddenly stopped working after rebooting from Windows, reinstall fails (Ubuntu 20.04)


System Information

  • MSI Creator 15 Laptop
  • NVIDIA GeForce RTX 2070 SUPER Mobile / Max-Q
  • External LG Ultrawide monitor
  • Windows 10 / Ubuntu 20.04 dual boot

The Problem

I have been using the nvidia 455 drivers on my Ubuntu 20.04 machine successfully now for about six months. I rarely use the Windows partition, but I was using it yesterday. After shutting down Windows 10 and returning to Ubuntu, my external display stopped working entirely.

(Note: it's possible Windows has nothing to do with the issue -- restarting may have given Ubuntu the chance to update packages and break itself)

Apparently, the NVIDIA drivers no longer work. Running nvidia-smi and other commands produced the following error:

NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver.  Make sure that the latest NVIDIA driver is installed and running.

Googling for answers, most of the solutions recommended reinstalling the NVIDIA drivers when this happens. Note that I need the graphics drivers as well as the CUDA toolkit, including nvcc.

Purge Nvidia

I have tried many different solutions, and I run these commands whenever I get stuck and need to start fresh.

sudo apt purge 'nvidia*'
sudo apt purge 'libnvidia*'
sudo apt autoremove

I usually run these from recovery mode, because freshly installed drivers tend to leave Ubuntu stuck in the startup process after a reboot.

I also check dpkg -l | grep nvidia and remove any of the packages left over by the installation process. This was necessary when I wanted to install older versions of the drivers.
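
A minimal sketch of that leftover check, assuming the standard `dpkg -l` column layout (status flags first, package name second). The function name `list_nvidia_packages` is hypothetical, and matching `cuda` alongside `nvidia` is an assumption about which packages are relevant. A status of `ii` means installed; `rc` means the package was removed but its configuration files remain.

```shell
#!/bin/sh
# Read `dpkg -l` output on stdin and print "<status> <package>" for every
# package whose name mentions nvidia or cuda. Header lines are skipped
# because their first field does not start with a lowercase status flag.
list_nvidia_packages() {
    awk '$1 ~ /^[a-z][a-zA-Z]/ && $2 ~ /nvidia|cuda/ { print $1, $2 }'
}

# Only query the package database when dpkg is actually available.
if command -v dpkg >/dev/null 2>&1; then
    dpkg -l | list_nvidia_packages
fi
```

An empty result after a purge suggests nothing NVIDIA-related is left in the package database; any remaining `rc` entries are candidates for another purge.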

Attempted Solutions

Here's a list of everything I've tried:

  • restarting my machine countless times (including full power off and unplugging for a while)

  • Following the official NVIDIA CUDA Installation Guide to reinstall drivers and manage conflicts. For example,

wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-ubuntu2004.pin
sudo mv cuda-ubuntu2004.pin /etc/apt/preferences.d/cuda-repository-pin-600
wget https://developer.download.nvidia.com/compute/cuda/11.4.1/local_installers/cuda-repo-ubuntu2004-11-4-local_11.4.1-470.57.02-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu2004-11-4-local_11.4.1-470.57.02-1_amd64.deb
sudo apt-key add /var/cuda-repo-ubuntu2004-11-4-local/7fa2af80.pub
sudo apt-get update
sudo apt-get -y install cuda
  • Tried to blacklist nouveau and nvidiafb:
blacklist nvidiafb
blacklist nouveau
options nouveau modeset=0
  • When reinstalling nvidia drivers, I tried multiple driver versions (470, 465, 460, 455) using multiple installation methods (first deb, then ubuntu-distributed, then runfile). All of them failed in different ways. Most commonly, when I reboot after installing the drivers, Ubuntu hangs indefinitely on startup (I see a black screen with an MSI logo and an "ubuntu" logo, sometimes with a spinning circle).

  • The NVIDIA drivers seem to still be working fine in Windows, so I don't think my graphics card is fried or anything like that.

  • booting into Ubuntu recovery mode from GRUB and selecting the dpkg repair option -- didn't seem to help anything

  • sudo ubuntu-drivers autoinstall -- this installed the nvidia 470 drivers, unsuccessfully

  • I noticed that uname -r indicated my kernel version was 5.11, when the support table for the NVIDIA drivers shows that only 5.4 is supported for Ubuntu 20.04. So, I downgraded to 5.4 and re-installed the nvidia drivers, again with no success.
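
Since the kernel version came up in the last item, one way to see whether a driver install actually produced a module for the running kernel is to look under `/lib/modules`. This is only a sketch: `find_nvidia_modules` is a hypothetical helper, the second argument exists purely so the function can be pointed at a different directory, and the `nvidia*.ko*` file pattern is an assumption about how the module files are named.

```shell
#!/bin/sh
# Print any NVIDIA kernel module files present for a given kernel release.
# $1 = kernel release (defaults to the running kernel, as in `uname -r`)
# $2 = modules root   (defaults to /lib/modules)
find_nvidia_modules() {
    release=${1:-$(uname -r)}
    root=${2:-/lib/modules}
    find "$root/$release" -name 'nvidia*.ko*' 2>/dev/null
}

if [ -n "$(find_nvidia_modules)" ]; then
    echo "NVIDIA module present for kernel $(uname -r)"
else
    echo "No NVIDIA module found for kernel $(uname -r)"
fi
```

If the module exists for one installed kernel but not for the one GRUB boots by default, that mismatch would be consistent with a driver that works until the next reboot.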

Observations

nvidia-smi does produce output (instead of an error) in the following situations:

  • after reinstalling drivers but before restarting the system
  • in recovery mode after reinstalling drivers
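
Because the driver appears to load until the next normal boot, the kernel log of the failed boot is a reasonable place to look. The sketch below assumes the systemd journal is persistent, so that `journalctl -b -1` can reach the previous boot; `filter_gpu_messages` is a hypothetical helper and the search terms are a guess at what the relevant messages contain.

```shell
#!/bin/sh
# Keep only log lines that mention the NVIDIA or nouveau drivers.
# `|| true` stops an empty result from being treated as a failure.
filter_gpu_messages() {
    grep -iE 'nvidia|nouveau|nvrm' || true
}

# Kernel messages (-k) from the previous boot (-b -1), if journalctl exists.
if command -v journalctl >/dev/null 2>&1; then
    journalctl -k -b -1 --no-pager 2>/dev/null | filter_gpu_messages
fi
```

Run from recovery mode right after a hang, the previous boot is the one that failed.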

What now?

I am at a complete loss for what to do. The only thing I can think of is to completely re-install Ubuntu, which seems crazy when everything was working just fine yesterday.

References

AskUbuntu.SE, "Blank screen after installing nvidia restricted driver"

AskUbuntu.SE, "Ubuntu 18.04 and nVidia. Stuck after boot"

AskUbuntu.SE, "Boot hangs after installing the latest driver from PPA and Ctrl+Alt+F1 keyboard shortcut doesn't work"

AskUbuntu.SE, "Stuck at boot screen, Nvidia graphics driver issues"

AskUbuntu.SE, "Changing NVIDIA Drivers makes Ubuntu freeze on startup"

AskUbuntu.SE, "graphics driver stopped working"

AskUbuntu.SE, "Ubuntu 20.04 Nvidia graphics unusable" (recommends switching to kernel 5.4)

System Info

Before writing this question, I again purged everything from my system using the method described above. In this state, here is some system information:

Kernel Version

$ uname -r
5.4.0-80-generic

Secure Boot

$ sudo mokutil --sb-state
SecureBoot disabled

lshw

$ sudo lshw -C display
  *-display UNCLAIMED       
       description: VGA compatible controller
       product: TU104M [GeForce RTX 2070 SUPER Mobile / Max-Q]
       vendor: NVIDIA Corporation
       physical id: 0
       bus info: pci@0000:01:00.0
       version: a1
       width: 64 bits
       clock: 33MHz
       capabilities: pm msi pciexpress vga_controller cap_list
       configuration: latency=0
       resources: memory:ac000000-acffffff memory:80000000-8fffffff memory:90000000-91ffffff ioport:3000(size=128) memory:ad000000-ad07ffff
  *-display
       description: VGA compatible controller
       product: UHD Graphics
       vendor: Intel Corporation
       physical id: 2
       bus info: pci@0000:00:02.0
       version: 05
       width: 64 bits
       clock: 33MHz
       capabilities: pciexpress msi pm vga_controller bus_master cap_list rom
       configuration: driver=i915 latency=0
       resources: irq:191 memory:ab000000-abffffff memory:40000000-4fffffff ioport:4000(size=64) memory:c0000-dffff
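
In the lshw output above, UNCLAIMED on the NVIDIA entry means no kernel driver is bound to that device, while the Intel entry shows driver=i915. The same information can be read from sysfs. This is a sketch only: `bound_driver` is a hypothetical helper, and its second argument exists just so the function can be pointed at a different directory.

```shell
#!/bin/sh
# Print the name of the kernel driver bound to a PCI device, or "none".
# $1 = PCI address as shown in "bus info" (for example 0000:01:00.0)
# $2 = sysfs devices root (defaults to /sys/bus/pci/devices)
bound_driver() {
    dev=$1
    root=${2:-/sys/bus/pci/devices}
    if [ -L "$root/$dev/driver" ]; then
        basename "$(readlink "$root/$dev/driver")"
    else
        echo "none"
    fi
}

bound_driver 0000:01:00.0   # the NVIDIA GPU in the listing above
bound_driver 0000:00:02.0   # the Intel GPU in the listing above
```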

hwinfo

$ hwinfo --gfxcard
16: PCI 100.0: 0300 VGA compatible controller (VGA)             
  [Created at pci.386]
  Unique ID: VCu0.pBgP2fGEzV8
  Parent ID: vSkL.sXdMPV6yXb4
  SysFS ID: /devices/pci0000:00/0000:00:01.0/0000:01:00.0
  SysFS BusID: 0000:01:00.0
  Hardware Class: graphics card
  Model: "nVidia VGA compatible controller"
  Vendor: pci 0x10de "nVidia Corporation"
  Device: pci 0x1e91 
  SubVendor: pci 0x1462 "Micro-Star International Co., Ltd. [MSI]"
  SubDevice: pci 0x12c6 
  Revision: 0xa1
  Memory Range: 0xac000000-0xacffffff (rw,non-prefetchable,disabled)
  Memory Range: 0x80000000-0x8fffffff (ro,non-prefetchable,disabled)
  Memory Range: 0x90000000-0x91ffffff (ro,non-prefetchable,disabled)
  I/O Ports: 0x3000-0x307f (rw,disabled)
  Memory Range: 0xad000000-0xad07ffff (ro,non-prefetchable,disabled)
  IRQ: 255 (no events)
  Module Alias: "pci:v000010DEd00001E91sv00001462sd000012C6bc03sc00i00"
  Driver Info #0:
    Driver Status: nvidiafb is not active
    Driver Activation Cmd: "modprobe nvidiafb"
  Driver Info #1:
    Driver Status: nouveau is not active
    Driver Activation Cmd: "modprobe nouveau"
  Driver Info #2:
    Driver Status: nvidia_drm is not active
    Driver Activation Cmd: "modprobe nvidia_drm"
  Driver Info #3:
    Driver Status: nvidia is not active
    Driver Activation Cmd: "modprobe nvidia"
  Config Status: cfg=new, avail=yes, need=no, active=unknown
  Attached to: #11 (PCI bridge)
 
34: PCI 02.0: 0300 VGA compatible controller (VGA)
  [Created at pci.386]
  Unique ID: _Znp.7YEiQ6GHkFE
  SysFS ID: /devices/pci0000:00/0000:00:02.0
  SysFS BusID: 0000:00:02.0
  Hardware Class: graphics card
  Device Name: "Onboard - Video"
  Model: "Intel VGA compatible controller"
  Vendor: pci 0x8086 "Intel Corporation"
  Device: pci 0x9bc4 
  SubVendor: pci 0x1462 "Micro-Star International Co., Ltd. [MSI]"
  SubDevice: pci 0x12c6 
  Revision: 0x05
  Driver: "i915"
  Driver Modules: "i915"
  Memory Range: 0xab000000-0xabffffff (rw,non-prefetchable)
  Memory Range: 0x40000000-0x4fffffff (ro,non-prefetchable)
  I/O Ports: 0x4000-0x403f (rw)
  Memory Range: 0x000c0000-0x000dffff (rw,non-prefetchable,disabled)
  IRQ: 192 (55080 events)
  Module Alias: "pci:v00008086d00009BC4sv00001462sd000012C6bc03sc00i00"
  Driver Info #0:
    Driver Status: i915 is active
    Driver Activation Cmd: "modprobe i915"
  Config Status: cfg=new, avail=yes, need=no, active=unknown
 
Primary display adapter: #16

ubuntu-drivers

$ ubuntu-drivers devices
== /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0 ==
modalias : pci:v000010DEd00001E91sv00001462sd000012C6bc03sc00i00
vendor   : NVIDIA Corporation
driver   : nvidia-driver-450-server - distro non-free
driver   : nvidia-driver-460 - distro non-free recommended
driver   : nvidia-driver-470-server - distro non-free
driver   : nvidia-driver-470 - third-party non-free
driver   : nvidia-driver-460-server - distro non-free
driver   : xserver-xorg-video-nouveau - distro free builtin

Thank You!

Please let me know if any further information is needed and I'll do my best to provide it! Thanks for any help you can provide!

See the nice writeup https://askubuntu.com/questions/1077061/how-do-i-install-nvidia-and-cuda-drivers-into-ubuntu/1077063#1077063 for using the runfile to install CUDA. Basically, install the (470 for your card) Nvidia driver from the standard repos, then (optionally) override the runfile default (system) locations to your local cuda setup. Treat CUDA like an app, it doesn't dictate the system video driver or compiler. You can install all the CUDA files locally, then add overrides as needed for gcc, etc. to that CUDA/bin, which gets put early in the PATH.
oldfred
Since trying different drivers, have you totally purged before attempting install of new driver? If not purged you get conflicts and then nothing works. nVidia install, purge if needed. https://ubuntuforums.org/showthread.php?t=2383560&p=13735336#post13735336 Purge then install the recommended driver.
Benjamin Bray
@oldfred, yes I purge between each attempted reinstall using the steps listed in my question. Is there any diagnostic tool for discovering improperly installed/uninstalled graphics drivers?
Benjamin Bray
@ubfan1 Thanks -- but every other source that I've seen has said that installing from a runfile is a big no-no unless you really know what you're doing (which I really don't!). I worry that it might leave my system in a state that's even more difficult to diagnose / upgrade later.
Score:2

I ran the following today (after purging as described above) and it seems to be working again after a reboot:

sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt update
sudo apt install nvidia-driver-460

Don't ask me why it works -- I tried literally the same thing yesterday with no success.
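
After the reboot, a quick check along these lines can confirm that the driver and the CUDA compiler are both visible. `check_tool` is a hypothetical helper; `--query-gpu` and `--format` are standard `nvidia-smi` options.

```shell
#!/bin/sh
# Report whether a command is available on PATH.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found"
    else
        echo "$1: missing"
    fi
}

for tool in nvidia-smi nvcc; do
    check_tool "$tool"
done

# If nvidia-smi is installed, ask it for the GPU name and loaded driver version.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi --query-gpu=name,driver_version --format=csv,noheader \
        || echo "nvidia-smi is installed but could not talk to the driver"
fi
```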

Paul
I have limited experience and migrated to Mint for my desktop OS, but these Nvidia driver issues are persistent across Ubuntu and as best I can tell other variants. It is likely a sub-optimal suggestion for you, but consider at least not running dual boot and consider migrating to an AMD GPU (or Intel, if that can do everything you require), and I _know_ how hard that is for laptops.
Score:0

I solved the problem by reinstalling the driver and enabling all GPUs through the Nvidia driver:

  1. Run the command:

    sudo nvidia-config --enable-all-gpus
    
  2. Shut down and power up (not reboot).

Atul Vinayak
`nvidia-config` does not exist. Where did you install this from?