TLDR
I'm trying to get nvidia-smi working again. It worked fine until I installed cuda-toolkit, and uninstalling cuda-toolkit didn't help. How can I restore the nvidia-smi output?
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA
driver. Make sure that the latest NVIDIA driver is installed and
running.
More details
I have a GeForce RTX 2070 in my laptop running Ubuntu 18.04 and had successfully installed its driver from the official runfile NVIDIA-Linux-x86_64-470.63.01.run. Here is the output of nvidia-smi from that installation:
Next, I installed cuda-toolkit from the official runfile cuda_11.4.2_470.57.02_linux.run, making sure to deselect the driver installation. Here's the terminal window right after installation completed:
Right after that, when I ran nvidia-smi, I got:
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA
driver. Make sure that the latest NVIDIA driver is installed and
running.
Since it was presumably the cuda-toolkit installation that "broke" nvidia-smi, I uninstalled cuda-toolkit (by running cuda-uninstaller, found in /usr/local/cuda-11.4/bin, as stated in the text generated after installation).
Unfortunately, that didn't help and nvidia-smi still fails with the same error. I'm installing from the official NVIDIA runfiles because I previously had issues installing the driver from the Ubuntu repositories but could make it work with the official driver, so I figured I'd try the same with cuda-toolkit.
How can I get nvidia-smi back?
Outputs of some commands, if relevant
which nvidia-smi: /usr/bin/nvidia-smi
mokutil --sb-state: SecureBoot disabled
nvidia-settings:
ERROR: NVIDIA driver is not loaded
ERROR: Unable to load info from any available system
ls /sys/firmware/efi/:
config_table efivars esrt fw_platform_size fw_vendor runtime runtime-map systab vars
lspci -k | grep -EA2 'VGA|3D':
00:02.0 VGA compatible controller: Intel Corporation CometLake-H GT2 [UHD Graphics] (rev 05)
Subsystem: Micro-Star International Co., Ltd. [MSI] Device 12ae
Kernel driver in use: i915
01:00.0 VGA compatible controller: NVIDIA Corporation TU106M [GeForce RTX 2070 Mobile / Max-Q Refresh] (rev a1)
Subsystem: Micro-Star International Co., Ltd. [MSI] Device 12ae
Kernel modules: nvidiafb, nouveau
cat /etc/modprobe.d/blacklist-nouveau.conf:
blacklist nouveau
blacklist vga16b
blacklist rivafb
blacklist nvidiafb
blacklist rivatv
blacklist amd76_edac
alias nouveau off
alias lbm-nouveau off
options nouveau modeset=0
dkms status: no output
lsmod | grep nvidia:
echo $XDG_SESSION_TYPE: x11
whereis nvidia:
nvidia: /usr/lib/x86_64-linux-gnu/nvidia /usr/lib/nvidia /usr/share/nvidia /usr/src/nvidia-470.63.01/nvidia
grep nvidia /etc/modprobe.d/* /lib/modprobe.d/*:
/etc/modprobe.d/blacklist-framebuffer.conf:blacklist nvidiafb
/etc/modprobe.d/blacklist-nouveau.conf:blacklist nvidiafb
/etc/modprobe.d/nvidia-installer-disable-nouveau.conf:# generated by nvidia-installer
/lib/modprobe.d/nvidia-runtimepm.conf:options nvidia "NVreg_DynamicPowerManagement=0x02"
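In case anyone wants to compare against their own machine, the checks above can be collected in one pass with something like the following. This is only a rough sketch: the function names run_diag and collect_diagnostics are made up for illustration, and the script runs nothing beyond the commands already listed in this post.

```shell
#!/bin/sh
# Hypothetical helper (names are illustrative, not from any NVIDIA tool).
# run_diag prints a header with the command line, then the command's output.
# A missing command or a non-zero exit status is reported, not treated as fatal.
run_diag() {
    printf '== %s ==\n' "$*"
    if command -v "$1" >/dev/null 2>&1; then
        "$@" 2>&1 || printf '(exit status %s)\n' "$?"
    else
        printf '(command not found: %s)\n' "$1"
    fi
}

# Runs the same diagnostics that are listed in this post, in the same order.
# Pipelines, globs and variable expansions are wrapped in `sh -c` so that the
# header shows the command as it was typed.
collect_diagnostics() {
    run_diag which nvidia-smi
    run_diag mokutil --sb-state
    run_diag nvidia-settings
    run_diag ls /sys/firmware/efi/
    run_diag sh -c "lspci -k | grep -EA2 'VGA|3D'"
    run_diag cat /etc/modprobe.d/blacklist-nouveau.conf
    run_diag dkms status
    run_diag sh -c 'lsmod | grep nvidia'
    run_diag sh -c 'echo "$XDG_SESSION_TYPE"'
    run_diag whereis nvidia
    run_diag sh -c 'grep nvidia /etc/modprobe.d/* /lib/modprobe.d/*'
}
```

After sourcing the file, calling collect_diagnostics > nvidia-report.txt would write everything to a single file (the file name is arbitrary). Note that grep exits with status 1 when it finds no match, so an "(exit status 1)" line after lsmod | grep nvidia just means no nvidia module is loaded.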
Posts / Questions I've already looked at: