I recently managed to muck up my CUDA installation beyond my meager ability to repair, so I decided to purge and reinstall. I followed these steps:
apt clean; apt update; apt purge cuda; apt purge nvidia-*; apt autoremove; apt install cuda
I rebooted, and found things now seem to be working as expected:
$ nvidia-smi
Sat Nov 19 09:08:40 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 520.61.05    Driver Version: 520.61.05    CUDA Version: 11.8     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  On   | 00000000:02:00.0  On |                  N/A |
| 59%   44C    P8    21W / 370W |    236MiB / 12288MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1720      G   /usr/lib/xorg/Xorg                 35MiB |
|    0   N/A  N/A      2667      G   /usr/lib/xorg/Xorg                 69MiB |
|    0   N/A  N/A      2796      G   /usr/bin/gnome-shell               92MiB |
|    0   N/A  N/A      3154      G   ...AAAAAAAA== --shared-files       23MiB |
+-----------------------------------------------------------------------------+
(base) $ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Sep_21_10:33:58_PDT_2022
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0
However, when I run sudo apt-get update, I see the following result:
$ sudo apt-get update
[sudo] password :
Hit:1 https://nvidia.github.io/libnvidia-container/experimental/ubuntu18.04/amd64 InRelease
But I'm running Ubuntu 20.04:
$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 20.04.5 LTS
Release: 20.04
Codename: focal
Why does my Ubuntu 20.04 system reach for an NVIDIA repository matched with Ubuntu 18.04? Should I worry about this?
================== Later Discovery =========================
I found this line in my /etc/apt/sources.list.d/nvidia-container-toolkit.list file:
deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://nvidia.github.io/libnvidia-container/experimental/ubuntu18.04/$(ARCH) /
I assume this is the source of my problem. Should I:
- Hand-edit the 18.04 in that line to 20.04 and hope for the best (doubtful)
- Repeat my nvidia purge, remove the nvidia-container-toolkit.list file, and reinstall following NVIDIA's instructions rather than Ubuntu's prepared apt packages (probably the best, unless I screw something up)
- Live with my current situation, since it's unlikely anything will really go wrong
- Do something else
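
In case it helps anyone checking their own machine, here is a rough sketch of how I would look for other apt source entries that name a different Ubuntu release than the one I'm running. The function name find_mismatched_sources is my own invention (it is not part of apt), and it assumes the repository URL embeds the release as an "ubuntuNN.NN/" path segment, as in the line I found above. It only looks at one-line-style .list files, so it may well miss other formats.

```shell
# find_mismatched_sources RELEASE [DIR]
# Hypothetical helper (not an apt command): prints "file:line" for every active
# "deb" entry whose URL names an "ubuntuNN.NN" release other than RELEASE.
# Assumes the release appears in the URL as "ubuntuNN.NN/".
find_mismatched_sources() {
    release="$1"
    dir="${2:-/etc/apt/sources.list.d}"
    # First grep: keep uncommented deb lines that mention some ubuntuNN.NN.
    # Second grep: drop the ones that already match the wanted release.
    grep -HE '^[[:space:]]*deb.*ubuntu[0-9]+\.[0-9]+' "$dir"/*.list 2>/dev/null \
        | grep -vF "ubuntu${release}/"
}

# Example call, reading the running release from /etc/os-release:
#   find_mismatched_sources "$(. /etc/os-release && echo "$VERSION_ID")"
```

On my system I would expect this to print only the nvidia-container-toolkit.list entry shown above. Whether such a mismatch actually matters is still my open question.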