ROCm Outputs Hotplug Support Error When I Start a Single GPU Passthrough VM

I am using scripts to manage the host-to-guest and vice versa transitions; the repo link is here.

I followed this guide by Niteshade to set up my computer. He has a shortened version of his guide in the video description.

After starting the VM via virsh commands (virsh start win10), dmesg outputs this error in obvious relation to the GPU at the same time that the screen goes black:

[drm:amdgpu_pci_remove [amdgpu]] *ERROR* Hotplug removal is not supported

The full log output after the VM start command is entered is here:

[217103.397008] rfkill: input handler enabled
[217104.397562] Console: switching to colour dummy device 80x25
[217104.404470] [drm:amdgpu_pci_remove [amdgpu]] *ERROR* Hotplug removal is not supported
[217104.405590] [drm] amdgpu: finishing device.
[217104.552833] [drm] psp command (0x2) failed and response status is (0x117)
[217104.552835] [drm] free PSP TMR buffer
[217104.658003] [TTM] Finalizing pool allocator
[217104.697318] [TTM] Finalizing DMA pool allocator
[217104.697348] [TTM] Zone  kernel: Used memory at exit: 0 KiB
[217104.697350] [TTM] Zone   dma32: Used memory at exit: 0 KiB
[217104.697353] [drm] amdgpu: ttm finalized
[217104.697748] vfio-pci 0000:0f:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=io+mem:owns=io+mem
[217105.018090] cgroup: cgroup: disabling cgroup2 socket matching due to net_prio or net_cls activation
[217119.845121] vfio-pci 0000:06:00.0: vfio_ecap_init: hiding ecap 0x19@0x200
[217120.901236] vfio-pci 0000:09:00.0: vfio_ecap_init: hiding ecap 0x1e@0x20c
[217120.929155] vfio-pci 0000:0f:00.0: vfio_ecap_init: hiding ecap 0x19@0x270
[217120.929168] vfio-pci 0000:0f:00.0: vfio_ecap_init: hiding ecap 0x1b@0x2d0

My system then becomes entirely unresponsive graphically; the only ways to get it to do anything are to type the panic sequence (REISUB) or to SSH into it.
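From the SSH session, one way to narrow this down is to pull out which PCI devices vfio-pci has claimed according to the kernel log. The sketch below is only an illustration: the helper name `vfio_addresses` is made up for this example, and it assumes log lines in the same format as the output above.

```shell
#!/bin/sh
# Hypothetical helper: print the unique PCI addresses that appear on
# "vfio-pci" lines in kernel log text read from stdin.
vfio_addresses() {
    grep -oE 'vfio-pci [0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]' \
        | awk '{ print $2 }' \
        | sort -u
}

# Example using two lines copied from the log in the question.
printf '%s\n' \
    '[217104.697748] vfio-pci 0000:0f:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=io+mem:owns=io+mem' \
    '[217119.845121] vfio-pci 0000:06:00.0: vfio_ecap_init: hiding ecap 0x19@0x200' \
    | vfio_addresses
```

On the live system the same function could be fed with `sudo dmesg | vfio_addresses`, and each address it prints can then be inspected with `lspci -nnk -s <address>` to see which kernel driver is currently in use.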

This issue seems to be entirely undocumented apart from a private AMD Community forum to which I do not have access.

I'm using a Gigabyte VEGA 56 on a Gigabyte Aorus Master WiFi motherboard. How do I solve the hotplug issue?

I'm using Ubuntu 20.04.3 LTS.

Edit: The full text version of Niteshade's guide that I followed is below

1:28 - Step 1. Update ubuntu or Elementary OS
======================================================
sudo apt-get update -y
sudo apt-get upgrade -y

2:21 - Step 2. Update grub loader

Edit Grub:

sudo nano /etc/default/grub

AMD:
FIND the line - GRUB_CMDLINE_LINUX_DEFAULT="quiet splash"
CHANGE it to - GRUB_CMDLINE_LINUX_DEFAULT="amd_iommu=on iommu=pt iommu=1 video=efifb:off quiet splash"

INTEL:
FIND the line - GRUB_CMDLINE_LINUX_DEFAULT="quiet splash"
CHANGE it to - GRUB_CMDLINE_LINUX_DEFAULT="intel_iommu=on iommu=pt iommu=1 video=efifb:off quiet splash"

sudo update-grub

sudo reboot

Once rebooted, check that the grub loader loaded the parameters with:

sudo cat /proc/cmdline

it should look similar to:

BOOT_IMAGE=/boot/vmlinuz-5.4.0-60-generic root=UUID=0587b30a-06cf-4df2-82fe-fb8db547e1c5 ro amd_iommu=on iommu=pt iommu=1 video=efifb:off quiet splash vt.handoff=1
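Comparing that line by eye is error-prone, so the check can also be scripted. The following is a minimal sketch, not part of the guide: `has_param` is a hypothetical helper, and the parameter list is simply the AMD set from Step 2 (swap in `intel_iommu=on` on an Intel host).

```shell
#!/bin/sh
# Hypothetical helper: succeed if the kernel command line text in $1
# contains $2 as a whole, space-separated word.
has_param() {
    case " $1 " in
        *" $2 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Read the running kernel's command line (left empty if it cannot be read).
cmdline=$(cat /proc/cmdline 2>/dev/null || true)

# Parameters taken from the AMD line in Step 2 of the guide.
for p in amd_iommu=on iommu=pt video=efifb:off; do
    if has_param "$cmdline" "$p"; then
        echo "found:   $p"
    else
        echo "missing: $p"
    fi
done
```

Matching whole words matters here: a plain substring search for `iommu=on` would also match `amd_iommu=on`.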

5:39 - Step 3. Find Your GPU bus address and its audio component
======================================================
Now you need to find your GPU's PCI address. Run the following command:

lspci -nnk

You will get a large output in the terminal. Look for your GPU details; mine look like this. Note the addresses: mine are 06:00.0 and 06:00.1.

06:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere [Radeon RX 470/480/570/570X/580/580X] [1002:67df] (rev e7)
        Subsystem: XFX Pine Group Inc. Ellesmere [Radeon RX 470/480/570/580] [1682:c580]
        Kernel driver in use: amdgpu
        Kernel modules: amdgpu
06:00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere [Radeon RX 580] [1002:aaf0]
        Subsystem: XFX Pine Group Inc. Ellesmere [Radeon RX 580] [1682:aaf0]
        Kernel driver in use: snd_hda_intel
        Kernel modules: snd_hda_intel
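libvirt refers to the same devices by a node-device name such as `pci_0000_06_00_0`, which is the form commands like `virsh nodedev-detach` take. As a small illustration, the address from `lspci` can be converted like this; `to_nodedev` is a hypothetical helper, and it assumes PCI domain `0000` whenever the address is given without one.

```shell
#!/bin/sh
# Hypothetical helper: turn an lspci address (06:00.0 or 0000:06:00.0)
# into libvirt's node-device name (pci_0000_06_00_0).
to_nodedev() {
    addr=$1
    case "$addr" in
        ????:??:??.?) ;;              # domain already present
        *) addr="0000:$addr" ;;       # assumption: PCI domain 0000
    esac
    # Replace ':' and '.' with '_' and add the "pci_" prefix.
    printf 'pci_%s\n' "$(printf '%s' "$addr" | tr ':.' '__')"
}

to_nodedev 06:00.0   # prints pci_0000_06_00_0
to_nodedev 06:00.1   # prints pci_0000_06_00_1
```

The resulting names should also show up in the output of `virsh nodedev-list --cap pci`, which is a quick way to confirm the conversion on a given machine.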

6:42 - Step 4: Install Virtualization Software

sudo apt install qemu-kvm libvirt-clients libvirt-daemon-system bridge-utils virt-manager ovmf

7:44 - Step 5: Configure Libvirt
======================================================
Install the virtualization software:

sudo apt install qemu-kvm libvirt-clients libvirt-daemon-system bridge-utils virt-manager ovmf

Next update the libvirt configuration:

sudo nano /etc/libvirt/libvirtd.conf

find each of these in the file, or add them if they are not there:

#unix_sock_group = "libvirt"
#unix_sock_rw_perms = "0770"

#log_filters="1:qemu"
#log_outputs="1:file:/var/log/libvirt/libvirtd.log"

change to:

unix_sock_group = "libvirt"
unix_sock_rw_perms = "0770"

log_filters="1:qemu"
log_outputs="1:file:/var/log/libvirt/libvirtd.log"

if it is not in the file, simply add it. If it is not commented out with #, then just leave it as it is and exit the file.

Now run the following commands:

sudo usermod -a -G libvirt $(whoami)
sudo systemctl start libvirtd
sudo systemctl enable libvirtd
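To confirm that the group change was recorded, the membership can be checked afterwards. This is a minimal sketch rather than part of the guide; `in_group` is a hypothetical helper built on the standard `id` command.

```shell
#!/bin/sh
# Hypothetical helper: succeed if user $1 is listed as a member of group $2.
in_group() {
    for g in $(id -nG "$1" 2>/dev/null); do
        [ "$g" = "$2" ] && return 0
    done
    return 1
}

user=$(id -un)
if in_group "$user" libvirt; then
    echo "$user is in the libvirt group"
else
    echo "$user is NOT in the libvirt group (yet)"
fi
```

Passing the user name to `id` makes it look the groups up in the system database, so the change shows immediately. Sessions that were already running only pick up the new group after logging out and back in.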

11:03 - Step 6: Configure Qemu

edit:

sudo nano /etc/libvirt/qemu.conf

find:

#user = "root"
#group = "root"

change to:

user = "YOUR USERNAME"
group = "YOUR USERNAME"

Restart Libvirt:

sudo systemctl restart libvirtd

sudo usermod -a -G kvm "YOUR USERNAME"
sudo usermod -a -G libvirt "YOUR USERNAME"

13:09 - Step 7: Create VM

Open Virtual-Manager

sudo virt-manager

During the setup of the VM, choose the option to edit the VM before installation.

In Overview:

  • set chipset to Q35
  • set BIOS to UEFI

In Boot:

  • Enable boot manager

Ensure there are no IDE drives before continuing. Then install Windows as expected.

20:40 - Step 8: Add GPU/Mouse and keyboard as passthrough

23:59 - Step 9: Setup hooks

sudo apt-get install git

sudo git clone https://gitlab.com/risingprismtv/sing...
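For readers unfamiliar with what such hooks do, the outline below shows the kind of steps a "prepare" hook typically performs to release the GPU from the host before the VM starts. It is a hedged sketch and not the contents of the repository above: the function names, the `DRY_RUN` switch, and the device names are assumptions (the video function is taken from the `0000:0f:00.0` lines in my log; the audio function `.1` is a guess), and the exact order of steps varies between setups.

```shell
#!/bin/sh
# Hypothetical prepare hook: release the GPU from the host before the VM starts.
# With DRY_RUN=1 (the default here) commands are only printed, not executed.
DRY_RUN="${DRY_RUN:-1}"
GPU_VIDEO="pci_0000_0f_00_0"   # assumption: from the vfio-pci lines in the log
GPU_AUDIO="pci_0000_0f_00_1"   # assumption: audio function of the same card

# Run a command, or just print it when DRY_RUN=1.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

release_gpu() {
    run systemctl stop display-manager                      # stop the desktop session
    run sh -c 'echo 0 > /sys/class/vtconsole/vtcon0/bind'   # unbind the virtual console
    run modprobe -r amdgpu                                  # unload the host GPU driver
    run virsh nodedev-detach "$GPU_VIDEO"                   # hand the video function to vfio
    run virsh nodedev-detach "$GPU_AUDIO"                   # hand the audio function to vfio
    run modprobe vfio-pci                                   # make sure vfio-pci is loaded
}

release_gpu
```

libvirt calls `/etc/libvirt/hooks/qemu` with the guest name and the operation as arguments (for example `win10 prepare begin`), so a real hook would normally run these steps only for the matching guest and operation. Running the sketch with the default `DRY_RUN=1` prints the command sequence without touching the system.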

26:19 - Step 10: Setup GPU ROM File

Website to get ROM files: https://www.techpowerup.com/vgabios/

33:45 - Step 11: Start virtual machine

waltinator:
Does something FAIL after this message, or is your problem that the message offends? Are you using Ubuntu? Which release?
waltinator:
Telling us which remote procedure (RP) you "followed" doesn't help us help you for N reasons: 1) It's remote. Will the link exist tomorrow? 2) Reading the RP doesn't tell us how accurately you "followed" it. Did you suffer typos or missed lines? We have. 3) Reading the RP omits the error messages you got on your system. These error messages (and the commands that caused them) are key elements in any diagnosis.
mncraftgeek:
@waltinator The screen goes black and anything other than an SSH connection or panic key sequence does not get any sort of visual response from the computer. I did make sure that I followed his guide exactly, and have verified that there are no typos. How do I read the RP to get the error messages?
mncraftgeek:
I'm using Ubuntu 20.04.3 LTS. The command that I entered before the log output I provided is `virsh start win10`
waltinator:
Comments are intended for US to ask YOU questions about your Question. You should [Edit] your question to add information. By updating your Question, and using the formatting buttons, you make all the information available to new readers. People shouldn't have to read a long series of comments to get the whole story.
mncraftgeek:
I fixed the question to reflect what we discussed in the comments so far.