Score:1

nvidia-smi stopped working after installing cuda-toolkit


TLDR

I'm trying to get nvidia-smi back up, which was working fine until I installed cuda-toolkit. Uninstalling cuda-toolkit didn't help. How can I restore nvidia-smi output?

NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.


More details

I have a GeForce RTX 2070 in my laptop running Ubuntu 18.04, and I had successfully installed its driver from the official runfile NVIDIA-Linux-x86_64-470.63.01.run. Here is the output of nvidia-smi from that installation:

[Screenshot: nvidia-smi output after the driver installation]

Next, I installed cuda-toolkit from the official runfile cuda_11.4.2_470.57.02_linux.run, making sure to un-select driver installation. Here's the terminal window right after installation completed:

[Screenshot: terminal window after the cuda-toolkit installation completed]

Right after that, when I ran nvidia-smi, I got:

NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

Since it was cuda-toolkit's installation that presumably "broke" nvidia-smi, I uninstalled cuda-toolkit (by running cuda-uninstaller found in /usr/local/cuda-11.4/bin, as stated in the generated text after installation).

Unfortunately, that didn't help and nvidia-smi is still broken. I'm installing from the official NVIDIA runfiles because I previously had issues installing the driver from the Ubuntu repositories, but could make it work with the official driver. So I figured I'd try the same with cuda-toolkit.

How can I get nvidia-smi back?

Outputs of some commands, if relevant

  • which nvidia-smi : /usr/bin/nvidia-smi
  • mokutil --sb-state : SecureBoot disabled
  • nvidia-settings :
    • ERROR: NVIDIA driver is not loaded
    • ERROR: Unable to load info from any available system
  • ls /sys/firmware/efi/ :
    • config_table efivars esrt fw_platform_size fw_vendor runtime runtime-map systab vars
  • lspci -k | grep -EA2 'VGA|3D' :

00:02.0 VGA compatible controller: Intel Corporation CometLake-H GT2 [UHD Graphics] (rev 05)
Subsystem: Micro-Star International Co., Ltd. [MSI] Device 12ae
Kernel driver in use: i915

01:00.0 VGA compatible controller: NVIDIA Corporation TU106M [GeForce RTX 2070 Mobile / Max-Q Refresh] (rev a1)
Subsystem: Micro-Star International Co., Ltd. [MSI] Device 12ae
Kernel modules: nvidiafb, nouveau

  • cat /etc/modprobe.d/blacklist-nouveau.conf :

blacklist nouveau
blacklist vga16b
blacklist rivafb
blacklist nvidiafb
blacklist rivatv
blacklist amd76_edac
alias nouveau off
alias lbm-nouveau off
options nouveau modeset=0

  • cat /proc/version :

    • Linux version 5.4.0-84-generic (buildd@lcy01-amd64-007) (gcc version 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04)) #94~18.04.1-Ubuntu SMP Thu Aug 26 23:17:46 UTC 2021
  • sudo lshw -c video : (NVIDIA display is "unclaimed", but this is how it should be)

[Screenshot: sudo lshw -c video output]

  • dkms status : no output
  • lsmod | grep nvidia :
    • i2c_nvidia_gpu 16384 0
  • echo $XDG_SESSION_TYPE : x11
  • whereis nvidia :
    • nvidia: /usr/lib/x86_64-linux-gnu/nvidia /usr/lib/nvidia /usr/share/nvidia /usr/src/nvidia-470.63.01/nvidia
  • grep nvidia /etc/modprobe.d/* /lib/modprobe.d/*:

/etc/modprobe.d/blacklist-framebuffer.conf:blacklist nvidiafb
/etc/modprobe.d/blacklist-nouveau.conf:blacklist nvidiafb
/etc/modprobe.d/nvidia-installer-disable-nouveau.conf:# generated by nvidia-installer
/lib/modprobe.d/nvidia-runtimepm.conf:options nvidia "NVreg_DynamicPowerManagement=0x02"

Posts / Questions I've already looked at:

Your system /usr/bin/gcc --version should be 9.3.0, and if you altered your PATH, gcc --version might report something else, but not 7.5. When changing the gcc version for CUDA, do not alter the system default (never use /etc/alternatives for gcc!!!!). Point CUDA at the required gcc version via links (or executables) in cuda/bin instead. The Nvidia driver version in the standard repos is 470.63.01, so I'd use that after cleaning out all the existing Nvidia packages.
MorganStark47
Alright so `gcc --version` was indeed 7.5. To upgrade to 9 (which was already installed) I used `sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-9 9` and `sudo update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-9 9` so now `/usr/bin/gcc --version` and `/usr/bin/g++ --version` return 9.4.0. The error with nvidia-smi remains though. Does the driver require a re-install? Driver-installation isn't a quick and easy process - at least for me - so that's something I'd do only if I have to
My mistake: your kernel made me think this was 20.04 with its default gcc 9.3. I added the HWE and 18.04 tags. I don't use any HWE kernels, so I'm not sure how that works, demanding one default compiler version for the kernel/modules (9.3) and another for the rest of the system (7.5?). I'm not sure where your gcc 9.4 came from unless it's for an HWE for 21.04 (but then why didn't the kernel update?). I suppose it would be possible to use update-alternatives to select 9.3 for gcc, --reconfigure the nvidia-driver-470 package to recompile, then switch back to gcc 7.5 for the rest of the system.
MorganStark47
Thanks for adding the tags. "reconfigure the nvidia-driver-470 to recompile" -- hm since `nvidia-settings` doesn't work (output included in question) I'm not sure how I'd do that.
Score:1

I purged everything NVIDIA-related, then ran sudo ubuntu-drivers autoinstall followed by sudo reboot, after which nvidia-smi worked fine.

[Screenshot: nvidia-smi output after the reinstall]

So I guess the solution was to re-install NVIDIA drivers.
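
For anyone wanting to follow the same route, here is a minimal sketch of the purge-and-reinstall sequence. It rests on some assumptions: the old driver came from the .run installer (which normally ships an nvidia-uninstall command), and the purge is done with an apt-get regex pattern, which may not be exactly what was run here. The run wrapper, the DRY_RUN variable, and the function names are hypothetical helpers for illustration only. By default the script only prints the commands it would run.

```shell
#!/usr/bin/env bash
# Sketch only: DRY_RUN defaults to 1, so commands are printed, not executed.
# Set DRY_RUN=0 to actually run them (this ends with a reboot).

run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# Reads lsmod output on stdin and succeeds only if the core "nvidia" module
# is listed. Helper modules such as i2c_nvidia_gpu do not count.
nvidia_module_loaded() {
    grep -q '^nvidia '
}

reinstall_nvidia_driver() {
    # Assumption: a runfile install left nvidia-uninstall behind; this step
    # is skipped when that command is not present.
    if command -v nvidia-uninstall >/dev/null 2>&1; then
        run sudo nvidia-uninstall
    fi
    # Remove any packaged NVIDIA bits, then let Ubuntu choose a driver.
    run sudo apt-get remove --purge '^nvidia-.*'
    run sudo ubuntu-drivers autoinstall
    run sudo reboot
}

reinstall_nvidia_driver
```

After the reboot, `lsmod | nvidia_module_loaded && nvidia-smi` is one way to confirm that the kernel module is loaded before expecting nvidia-smi to respond. In the question, lsmod listed only i2c_nvidia_gpu, which this check would treat as "driver not loaded".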

antou
Thanks! Worked for me too on Ubuntu 22.04.2. `sudo apt remove --purge nvidia* && sudo ubuntu-drivers autoinstall && sudo reboot`