Is there a 100% guaranteed way to set up an NVIDIA 4090 purely for AI work, without using it for graphics or the desktop, such that it survives minor driver upgrades, CUDA upgrades, and minor OS upgrades, or just shutting down for the night and rebooting in the morning?
Yesterday I upgraded to CUDA 12.0, which also upgraded the NVIDIA driver to 525.60.13:
sudo sh cuda_12.0.0_525.60.13_linux.run
The upgrade failed on the 525.60.13 driver, so I ran the run script from emergency single-user mode without a desktop. That worked, but then I had no audio. Audio is supposed to be driven through my monitor via the Intel integrated GPU, and it was working just before I upgraded the NVIDIA stuff. I did some inference work for a while without music. Just before shutting down I rebooted, and the audio worked. I did more inferencing, shut down, went to sleep, woke up, started my system, and got:
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
5.17.0-1019-oem #20-Ubuntu SMP PREEMPT
Obviously I have the latest driver, having upgraded a few hours prior.
Yes, I just rebooted again. Yes, I have spent hours Googling. Please try to help without finding fault with the perfection of my question. lshw sees the device. I've tried so many things.
sudo modprobe -a nvidia
modprobe: ERROR: ../libkmod/libkmod-module.c:838 kmod_module_insert_module() could not find module by name='off'
modprobe: ERROR: could not insert 'off': Unknown symbol in module, or unknown parameter (see dmesg)
Last night this wasn't a problem. :-(
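One thing I have not ruled out: the name='off' in the modprobe error looks like what you would get if some modprobe config file contains a line such as "alias nvidia off", so that modprobe resolves "nvidia" to a module literally called "off". That is only my guess at the cause. Below is a minimal sketch for checking it; the function name is made up, and the directory list is my assumption about where such files usually live.

```shell
# Sketch: list modprobe config lines that alias an nvidia module to "off".
# find_nvidia_off_aliases is a hypothetical helper name, and the default
# directory list is an assumption about common modprobe.d locations.
find_nvidia_off_aliases() {
    # Use the directories given as arguments, or fall back to the usual ones.
    if [ "$#" -eq 0 ]; then
        set -- /etc/modprobe.d /lib/modprobe.d /usr/lib/modprobe.d /run/modprobe.d
    fi
    for dir in "$@"; do
        # Skip directories that do not exist on this system.
        [ -d "$dir" ] || continue
        # Print file:line:text for every "alias nvidia... off" entry.
        grep -rHnE '^[[:space:]]*alias[[:space:]]+nvidia[^[:space:]]*[[:space:]]+off' "$dir" 2>/dev/null
    done
    return 0
}

find_nvidia_off_aliases
```

If this prints anything, the listed file is worth a look before reinstalling the driver yet again. If it prints nothing, the alias theory is probably wrong and the problem is somewhere else.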