Score:0

NVIDIA docker, Ubuntu image does not recognize GPU

R M

With the Ubuntu-based NVIDIA Docker images, the container does NOT recognize the GPU, but the Red Hat-based container does. Why? I followed the official installation manual and used the official Docker images. Should I ask NVIDIA about this?

Environment

  • Ubuntu Desktop 22.04 LTS
  • Docker 20.10.21
  • GPU RTX 2080
  • Driver nvidia-driver-510
  • No CUDA installed on host OS

Command

# Ubuntu cuda11.8
$ docker run --gpus all -it --rm nvidia/cuda:11.8.0-cudnn8-runtime-ubuntu22.04 /bin/bash
$ nvidia-smi
$ nvcc -V
bash: nvcc: command not found

$ apt-get update
$ apt-get install -y python3 python3-pip
$ pip3 install torch torchvision
$ python3
Python 3.10.6 (main, Aug 10 2022, 11:40:04) [GCC 11.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> print(torch.cuda.is_available())
/usr/local/lib/python3.10/dist-packages/torch/cuda/__init__.py:88: UserWarning: CUDA initialization: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 804: forward compatibility was attempted on non supported HW (Triggered internally at ../c10/cuda/CUDAFunctions.cpp:109.)
  return torch._C._cuda_getDeviceCount() > 0
False

# Ubuntu cuda11.6
$ docker run --gpus all -it --rm nvidia/cuda:11.6.1-cudnn8-runtime-ubuntu20.04 /bin/bash
$ nvidia-smi
$ nvcc -V
bash: nvcc: command not found


# Redhat cuda11.6
$ docker run --gpus all -it --rm nvidia/cuda:11.6.1-cudnn8-devel-ubi8 /bin/bash
$ nvidia-smi
$ nvcc -V
$ yum install python38
$ curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py
$ python3 get-pip.py
$ pip install torch torchvision
$ python3

Python 3.8.12 (default, Sep 16 2021, 10:46:05) 
[GCC 8.5.0 20210514 (Red Hat 8.5.0-3)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> print(torch.cuda.is_available())
True
>>>


Score:0
rok

Your Ubuntu and Red Hat commands use different image flavors. On Red Hat you are running the devel image (see the image name), which includes development tools such as nvcc. The runtime images do not include nvcc, which is why you get the "command not found" error on Ubuntu. I also think the NVIDIA Container Toolkit needs to be installed on the host machine.
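
To make the flavor difference easier to spot, here is a minimal sketch that splits the image names from the question into their parts. The helper `parse_cuda_tag` and the `CudaImageTag` type are hypothetical names I made up for illustration, and the tag layout (CUDA version, optional cuDNN, flavor, distro) is an assumption inferred from the three tags in the question, not an official specification.

```python
from typing import NamedTuple, Optional


class CudaImageTag(NamedTuple):
    """Parts of an nvidia/cuda image tag (layout assumed from the question's tags)."""
    cuda_version: str
    cudnn: Optional[str]
    flavor: str
    distro: str


def parse_cuda_tag(image: str) -> CudaImageTag:
    """Split e.g. 'nvidia/cuda:11.6.1-cudnn8-devel-ubi8' into its parts."""
    # Keep only the part after the last ':' (the tag), if there is one.
    tag = image.rsplit(":", 1)[1] if ":" in image else image
    parts = tag.split("-")
    cuda_version = parts[0]
    distro = parts[-1]
    middle = parts[1:-1]
    # Assumption: the cuDNN marker starts with 'cudnn' and the flavor is one
    # of these three words; anything else is reported as 'unknown'.
    cudnn = next((p for p in middle if p.startswith("cudnn")), None)
    flavor = next((p for p in middle if p in ("base", "runtime", "devel")), "unknown")
    return CudaImageTag(cuda_version, cudnn, flavor, distro)


if __name__ == "__main__":
    images = (
        "nvidia/cuda:11.8.0-cudnn8-runtime-ubuntu22.04",
        "nvidia/cuda:11.6.1-cudnn8-runtime-ubuntu20.04",
        "nvidia/cuda:11.6.1-cudnn8-devel-ubi8",
    )
    for image in images:
        info = parse_cuda_tag(image)
        # Per the answer above, only the devel flavor ships nvcc.
        print(f"{image}: flavor={info.flavor}, nvcc expected={info.flavor == 'devel'}")
```

Only the third image reports the devel flavor, which matches where `nvcc -V` worked in the question.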
