Score:1

How does one make a GPU usable in a brand-new Ubuntu 20.04 VM?


I've been trying all day to get this GPU (a V100) working on a new Ubuntu VM. I tried installing the drivers and rebooting, and I also tried purging/uninstalling everything related to NVIDIA, but none of this seems to work.

Specifically, I ran:

apt update;
apt install build-essential;

sudo add-apt-repository ppa:graphics-drivers
sudo apt install ubuntu-drivers-common
ubuntu-drivers devices
sudo apt-get install nvidia-driver-460
sudo reboot now

After that, nvidia-smi sometimes seems to work (it was not working when I wrote this question, so I could not copy and paste its output from a working run). When it does not work, it prints this:

(synthesis) miranda9@miranda9:~$ nvidia-smi
Unable to determine the device handle for GPU 0000:00:06.0: Unknown Error
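
Since the failure is intermittent, it may help to record which state nvidia-smi is in after each reboot or reinstall. The following is a minimal sketch, not an official tool: the function names and the diagnosis labels are hypothetical, and the only strings it matches are the error quoted above and the common "couldn't communicate with the NVIDIA driver" message.

```python
import shutil
import subprocess


def classify_nvidia_smi(returncode, output):
    """Map an nvidia-smi result to a rough label (labels are made up for this sketch)."""
    text = output.lower()
    if "unable to determine the device handle" in text:
        # The error quoted in the question: the driver sees the device but cannot open it.
        return "device-handle-error"
    if "couldn't communicate with the nvidia driver" in text:
        # Typically printed when the kernel module is not loaded.
        return "driver-not-loaded"
    if returncode == 0:
        return "ok"
    return "unknown-failure"


def run_nvidia_smi():
    """Run nvidia-smi if it is on PATH and classify what it printed."""
    if shutil.which("nvidia-smi") is None:
        return "not-installed"
    proc = subprocess.run(["nvidia-smi"], capture_output=True, text=True)
    return classify_nvidia_smi(proc.returncode, proc.stdout + proc.stderr)


if __name__ == "__main__":
    print(run_nvidia_smi())
```

Running this after each reboot and noting the label next to the steps just taken would show whether the working and failing states correlate with anything done inside the VM.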

Any help is appreciated.

Note that I do not have access to the VM's vmx file, so this thread and its answers do not help me: https://forums.developer.nvidia.com/t/nvidia-smi-reports-unable-to-determine-the-device-handle-for-gpu/46835

In addition, I have tried to uninstall everything from NVIDIA and reinstall it with:

sudo apt-get --purge remove "*nvidia*"
sudo /usr/bin/nvidia-uninstall

then

apt update;
apt install build-essential;

sudo add-apt-repository ppa:graphics-drivers
sudo apt install ubuntu-drivers-common
ubuntu-drivers devices
sudo apt-get install nvidia-driver-460
sudo reboot now

but that doesn't seem to work either.


More info in case it helps:

(synthesis) miranda9@miranda9:~$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 20.04.2 LTS
Release:        20.04
Codename:       focal

also:

(synthesis) miranda9@miranda9:~$ python
Python 3.9.5 (default, Jun  4 2021, 12:28:51) 
[GCC 7.5.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.cuda.is_available()
/home/miranda9/miniconda3/envs/synthesis/lib/python3.9/site-packages/torch/cuda/__init__.py:52: UserWarning: CUDA initialization: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 101: invalid device ordinal (Triggered internally at  /opt/conda/conda-bld/pytorch_1623448238472/work/c10/cuda/CUDAFunctions.cpp:115.)
  return torch._C._cuda_getDeviceCount() > 0
False

As requested by comment:

# lspci
00:00.0 Host bridge: Intel Corporation 440FX - 82441FX PMC [Natoma] (rev 02)
00:01.0 ISA bridge: Intel Corporation 82371SB PIIX3 ISA [Natoma/Triton II]
00:01.1 IDE interface: Intel Corporation 82371SB PIIX3 IDE [Natoma/Triton II]
00:01.2 USB controller: Intel Corporation 82371SB PIIX3 USB [Natoma/Triton II] (rev 01)
00:01.3 Bridge: Intel Corporation 82371AB/EB/MB PIIX4 ACPI (rev 01)
00:02.0 VGA compatible controller: Cirrus Logic GD 5446
00:03.0 SCSI storage controller: XenSource, Inc. Xen Platform Device (rev 01)
00:05.0 System peripheral: XenSource, Inc. Citrix XenServer PCI Device for Windows Update (rev 01)
00:06.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100 PCIe 16GB] (rev a1)

The same output from another VM:

$ lspci
00:00.0 Host bridge: Intel Corporation 440FX - 82441FX PMC [Natoma] (rev 02)
00:01.0 ISA bridge: Intel Corporation 82371SB PIIX3 ISA [Natoma/Triton II]
00:01.1 IDE interface: Intel Corporation 82371SB PIIX3 IDE [Natoma/Triton II]
00:01.2 USB controller: Intel Corporation 82371SB PIIX3 USB [Natoma/Triton II] (rev 01)
00:01.3 Bridge: Intel Corporation 82371AB/EB/MB PIIX4 ACPI (rev 01)
00:02.0 VGA compatible controller: Cirrus Logic GD 5446
00:03.0 SCSI storage controller: XenSource, Inc. Xen Platform Device (rev 01)
00:05.0 System peripheral: XenSource, Inc. Citrix XenServer PCI Device for Windows Update (rev 01)
00:06.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100 PCIe 16GB] (rev a1)
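
Both listings show the V100 on the guest's PCI bus at 00:06.0. As a quick way to confirm that on any VM, here is a small sketch that filters lspci output for NVIDIA devices. The function name and the dictionary keys are hypothetical, and the parsing assumes the default `address class: name` line layout shown above.

```python
def find_nvidia_devices(lspci_output):
    """Return the NVIDIA entries from default-format lspci output."""
    devices = []
    for line in lspci_output.splitlines():
        line = line.strip()
        if not line or "nvidia" not in line.lower():
            continue
        # Default lspci lines look like: "<address> <class>: <vendor and device name>"
        address, _, rest = line.partition(" ")
        device_class, _, name = rest.partition(": ")
        devices.append({"address": address, "class": device_class, "name": name})
    return devices


# Two lines copied from the listing in the question.
SAMPLE = """\
00:02.0 VGA compatible controller: Cirrus Logic GD 5446
00:06.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100 PCIe 16GB] (rev a1)
"""

if __name__ == "__main__":
    for device in find_nvidia_devices(SAMPLE):
        print(device["address"], device["name"])
```

On a live system the input could come from `subprocess.run(["lspci"], capture_output=True, text=True).stdout` instead of the pasted sample.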

Resources I've searched for help:

ChanganAuto avatar
us flag
In a VM the hardware is virtualized. You aren't using the real Nvidia GPU, the host OS is.
cc flag
Take a look at Google results of nvidia virtual machine gpu passthru
Charlie Parker avatar
in flag
@ubfan1 just to make sure I look in the right place: should I google `passthru` rather than `passthrough`, e.g. `nvidia virtual machine gpu passthru`?
cc flag
"passthru" came up as an early choice as I started typing, so I selected that. My GPU's too old for that to work for me, so I didn't check much further.
Charlie Parker avatar
in flag
Care to elaborate on the downvotes?
Charlie Parker avatar
in flag
@NateT yes, I am happy to. See the update to the question. However, my suspicion is that removing everything from NVIDIA, reinstalling it, and rebooting should work, but my attempts to do that fail.
Irsu85 avatar
cn flag
You need to use PCIe passthrough and 2 physical GPUs in your computer to make this work. You also need a second monitor connected to the second GPU. For the practical commands, see https://pve.proxmox.com/wiki/PCI(e)_Passthrough
Nate T avatar
it flag
What image did you use for the VM, as in the full image name? Downvotes are probably because a VM doesn't have a GPU. I assume that you mean "how to get the VM to use the host GPU"? Btw, it wasn't me; I only downvote in extreme situations. I'm too poor. XD
Score:0

A virtual machine emulates a graphics card, so which native card the host system has should be transparent to the guest system. VMs are for "sharing" resources, as opposed to a real system that has direct access to its hardware. So it does not make sense to install Nvidia drivers in the guest system. You can verify this by checking the current drivers in your VM:

inxi -G

(executed in a terminal) will show you a VM/oracle driver, not your native card.

High-performance graphics output may be achievable with tweaks and tricks, but VMs are not meant for work like this.

Charlie Parker avatar
in flag
Hi, thanks for the response, it was informative! I do not have access to the host system. I request a VM and I get a VM to use. I have sudo in it, but of course I am inside the VM. Why do you think the way I am installing the drivers is not working? What exactly is going wrong, in your opinion?
kanehekili avatar
zw flag
OK, so the VM is on a remote host. What does `inxi -G` say on your "remote VM"? If the command is not installed, install it with `sudo apt install inxi`.