I am trying to use nvidia-container-toolkit to access the CUDA-enabled GPUs and the corresponding NVIDIA drivers from inside a Docker container (or to use CUDA inside the container in any other way) without modifying the downloaded Docker image.
From nvidia-smi I have:
| NVIDIA-SMI 450.156.00 Driver Version: 450.156.00 CUDA Version: 11.0 |
From nvcc --version:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Sun_Jul_28_19:07:16_PDT_2019
Cuda compilation tools, release 10.1, V10.1.243
From nvidia-container-cli --version:
cli-version: 1.7.0
lib-version: 1.7.0
build date: 2021-11-30T19:53+00:00
build revision: f37bb387ad05f6e501069d99e4135a97289faf1f
build compiler: x86_64-linux-gnu-gcc-7 7.5.0
build platform: x86_64
build flags: -D_GNU_SOURCE -D_FORTIFY_SOURCE=2 -DNDEBUG -std=gnu11 -O2 -g -fdata-sections -ffunction-sections -fstack-protector -fno-strict-aliasing -fvisibility=hidden -Wall -Wextra -Wcast-align -Wpointer-arith -Wmissing-prototypes -Wnonnull -Wwrite-strings -Wlogical-op -Wformat=2 -Wmissing-format-attribute -Winit-self -Wshadow -Wstrict-prototypes -Wunreachable-code -Wconversion -Wsign-conversion -Wno-unknown-warning-option -Wno-format-extra-args -Wno-gnu-alignof-expression -Wl,-zrelro -Wl,-znow -Wl,-zdefs -Wl,--gc-sections
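The two CUDA numbers above measure different things: the "CUDA Version" in the nvidia-smi header is the newest CUDA version the installed driver supports, while nvcc reports the toolkit installed on the host, so 11.0 vs 10.1 is not by itself a conflict. A minimal Python sketch that pulls both values out of the quoted output and compares them by major.minor; the helper names are hypothetical and the regexes assume exactly the output format shown above.

```python
import re

# Patterns for the two version lines quoted above; whitespace is matched loosely
# because nvidia-smi pads its header table with a variable number of spaces.
SMI_RE = re.compile(r"Driver Version:\s*([\d.]+)\s+CUDA Version:\s*([\d.]+)")
NVCC_RE = re.compile(r"release\s+([\d.]+),\s*V([\d.]+)")


def parse_nvidia_smi(line: str) -> tuple:
    """Return (driver_version, max_supported_cuda) from the nvidia-smi header."""
    m = SMI_RE.search(line)
    if m is None:
        raise ValueError("no driver/CUDA version found in nvidia-smi output")
    return m.group(1), m.group(2)


def parse_nvcc(text: str) -> str:
    """Return the full toolkit version (e.g. '10.1.243') from `nvcc --version`."""
    m = NVCC_RE.search(text)
    if m is None:
        raise ValueError("no release line found in nvcc output")
    return m.group(2)


def driver_supports(driver_cuda: str, toolkit: str) -> bool:
    """True if the toolkit's major.minor is not newer than what the driver supports."""
    def major_minor(v: str) -> tuple:
        parts = v.split(".")
        return int(parts[0]), int(parts[1])
    return major_minor(toolkit) <= major_minor(driver_cuda)


if __name__ == "__main__":
    smi = "| NVIDIA-SMI 450.156.00 Driver Version: 450.156.00 CUDA Version: 11.0 |"
    nvcc = "Cuda compilation tools, release 10.1, V10.1.243"
    driver, driver_cuda = parse_nvidia_smi(smi)
    toolkit = parse_nvcc(nvcc)
    print(driver, driver_cuda, toolkit, driver_supports(driver_cuda, toolkit))
```

With the values from this machine, the 10.1 toolkit falls within what the 11.0-capable driver supports. Roughly the same comparison applies between the CUDA version baked into an image and the host driver.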
I tried to install and use nvidia-container-toolkit. The installation runs without any issues, but I cannot run docker with the --gpus all flag. Running docker run ... --gpus all ... (where ... are other flags and the image name) results in:
docker: Error response from daemon: failed to create shim: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: Running hook #0:: error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: mount error: write error: /sys/fs/cgroup/devices/docker/713e0b6117367c0b8edd3e0430fc022198a95527e40cdbadf28fea838d6d1247/devices.allow: operation not permitted: unknown.
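To rerun the failing step in isolation and confirm it hits the same hook error, a small Python sketch like the following can wrap the command. It assumes Docker 19.03 or later (where --gpus exists), drops all the other flags from the original invocation, and only recognizes the cgroup v1 devices.allow path seen in the message above; the function names are hypothetical.

```python
import re
import subprocess
from typing import List, Optional

# cgroup v1 devices-controller path the NVIDIA prestart hook failed to write,
# e.g. /sys/fs/cgroup/devices/docker/<64-hex container id>/devices.allow
DEVICES_ALLOW_RE = re.compile(r"/sys/fs/cgroup/devices/\S*?[0-9a-f]{64}/devices\.allow")


def gpu_run_argv(image: str, *cmd: str) -> List[str]:
    """Minimal `docker run --gpus all` invocation with no other flags."""
    return ["docker", "run", "--rm", "--gpus", "all", image, *cmd]


def devices_allow_failure(stderr: str) -> Optional[str]:
    """Return the devices.allow path if stderr shows the hook's write error."""
    if "operation not permitted" not in stderr:
        return None
    m = DEVICES_ALLOW_RE.search(stderr)
    return m.group(0) if m else None


def reproduce(image: str) -> Optional[str]:
    """Run nvidia-smi in `image` with --gpus all; return the failing path, if any."""
    result = subprocess.run(gpu_run_argv(image, "nvidia-smi"),
                            capture_output=True, text=True)
    return devices_allow_failure(result.stderr)
```

Calling reproduce with the image from the original command (left elided here) returns the devices.allow path when the same mount error occurs, or None otherwise.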
The only solution I found was to run a privileged container, which I am trying to avoid.
When I try to avoid nvidia-container-toolkit altogether and install the drivers manually inside the Docker container, I get driver mismatch errors. Even if I solved that, it would mean re-installing the drivers every time I restart the container, which I would also prefer to avoid.
Is there any way to solve this issue without making a privileged container?