Score:1

How do you set amdgpu options?

OS: Ubuntu 22.04.1

rocminfo is giving an error and I'm trying to get it working.

How do you set amdgpu options, for example amdgpu cwsr_enable=0? Is there somewhere that lists the options, how to set them, and what they do?

muru
Please see https://askubuntu.com/editing-help on how to format your posts. HTML br tags are not the way to do it here and those just make it harder for other people to fix problems in your post.
Score:2

Here are some answers to your questions:

Is there somewhere that lists the options and how to set them and what they do?

(Short answer, so I'm placing this first)

modinfo amdgpu

Look for parm: in the output. Each parm: line names one parameter that this kernel module accepts, followed by a short description. The Linux kernel documentation also has good information on these.
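
As a rough sketch, the parm: lines can be filtered out of the modinfo output so that only the parameters remain. The helper name list_parms is made up for this example; it reads modinfo-style text on standard input and relies only on lines starting with "parm:".

```shell
# list_parms: keep only the "parm:" lines from modinfo output and strip the label,
# leaving "name:description (type)" for each module parameter.
# (list_parms is a hypothetical helper name, not part of any package.)
list_parms() {
  awk '/^parm:/ { sub(/^parm:[[:space:]]*/, ""); print }'
}
```

On a machine with the driver installed, something like `modinfo amdgpu | list_parms | grep -i cwsr` should narrow the list down to the parameter from the question, assuming the installed driver version exposes it.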

How do you set amdgpu options?

(Longer answer, because there are many ways)

amdgpu is the name of the open-source AMD graphics driver that lives in the Linux kernel source tree. It is included with Ubuntu's stock kernel.

Kernel modules (a.k.a. drivers) have parameters which can be set in multiple ways:

  1. Set via Grub Kernel command-line
  • There are two ways to do this depending on whether you want the options to persist across reboots or not.
    1. Temporary method via GRUB command line
      • Start your system and wait for the GRUB menu to show. If it does not appear, press and hold the left Shift key right after powering on (legacy BIOS/MBR boot) or tap Esc (UEFI boot); many systems hide the menu by default.
      • At GRUB kernel selection screen, highlight the kernel version entry you want to use.
      • Press e to edit that kernel command line.
      • The line you want to find looks like this: linux /boot/vmlinuz-6.2.0-20-generic root=UUID=1234567-ABCD ro quiet splash
      • Add your kernel options and kernel module options at the end of this line.
      • Kernel-level parameters can be passed directly (e.g. acpi=off, nomodeset, etc...)
      • Module-level parameters are passed using the modulename.param=value syntax (e.g. amdgpu.dpm=0, amdgpu.aspm=0, etc...)
    2. Persistent method via generated GRUB config command line
      • Edit the /etc/default/grub file as root (e.g.: sudo vi /etc/default/grub, or sudo nano /etc/default/grub)
      • Find the line with GRUB_CMDLINE_LINUX_DEFAULT="quiet splash"
      • Just before the closing double-quote, add your Linux kernel boot parameters and/or Module-level parameters, separated by spaces.
      • Note: The syntax for Module-level parameters is the same as the Temporary GRUB command line method above. (e.g. amdgpu.dpm=0, amdgpu.aspm=0, etc...)
      • Update Grub: sudo update-grub
      • Reboot and your parameters should now be added every time the kernel boots. (This can be viewed and verified to be the case using the e edit GRUB boot line method as above)
  2. Set via Modprobe Drop-In directory

    • This method is also persistent, and applies slightly later in the boot process when modprobe is loading kernel modules

      • You can not set Kernel-level parameters this way, only Module-level parameters.
      • This only works for Loadable Kernel Modules, that is, drivers built as modules rather than compiled into the kernel image. (See the Gentoo Wiki for details.)
      • Note: The syntax for these config files is a bit different, as you do not need the modulename.param syntax here. (See man modprobe.d for full documentation of /etc/modprobe.d Drop-In config file syntax.)
    • Add a new Drop-In config file for your GPU

      • For example, to set both dpm=0 and aspm=0:

        echo 'options amdgpu dpm=0 aspm=0' | sudo tee /etc/modprobe.d/amdgpu-options.conf
        
    • Regenerate the initramfs

        sudo update-initramfs -u -k all
      
    • Reboot!

  3. Loading a Module with Temporary Changes

    • Usually this works for testing temporary changes for plug-and-play devices

    • However, this may not be the ideal method for something such as a GPU which is in use very early in the UEFI -> Kernel boot -> Init boot phases.

    • If your system has an integrated graphics card (e.g. Intel Corporation HD Graphics 630 or similar), this could be helpful when diagnosing or testing kernel module parameters for the secondary GPU.

      sudo modprobe <module_name> [parameter=value]
      
    • Where [parameter=value] represents a list of customized parameters available to that module, and <module_name> would be the name of the kernel module (amdgpu in this case)

    • Red Hat's documentation on kernel modules covers this in more detail.
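
To make the syntax difference between the GRUB method and the modprobe.d method concrete, here is a small sketch that prints the same settings in both forms. The helper names cmdline_form and modprobe_d_form are made up for illustration; they only build strings and do not modify /etc/default/grub or /etc/modprobe.d. cwsr_enable=0 is the setting from the question.

```shell
# cmdline_form MODULE PARAM=VALUE...
# Kernel command-line form: every setting is prefixed with "module.".
cmdline_form() {
  module=$1; shift
  out=
  for kv in "$@"; do
    out="${out:+$out }$module.$kv"
  done
  printf '%s\n' "$out"
}

# modprobe_d_form MODULE PARAM=VALUE...
# modprobe.d form: a single "options" line naming the module once.
modprobe_d_form() {
  module=$1; shift
  printf 'options %s %s\n' "$module" "$*"
}

cmdline_form amdgpu cwsr_enable=0 dpm=0      # amdgpu.cwsr_enable=0 amdgpu.dpm=0
modprobe_d_form amdgpu cwsr_enable=0 dpm=0   # options amdgpu cwsr_enable=0 dpm=0
```

The first line is what would go inside GRUB_CMDLINE_LINUX_DEFAULT; the second is what would go into a file such as /etc/modprobe.d/amdgpu-options.conf.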

Testing Temporary Kernel Module Parameters on a Dual-GPU System

The last method can be useful when testing a system which has both an integrated GPU, and a secondary PCIe GPU (such as AMD / Nvidia / Intel ARC). It is especially helpful when diagnosing basic card initialization issues, when using VFIO and/or IOMMU, and other use-cases. Note: If in doubt and you're unsure about these more advanced topics, then try one of the other easier methods above first.

To follow this method, you usually need to go into the BIOS of a motherboard (assuming it supports this) and enable the Integrated GPU as the primary / default display GPU. Then, we must boot into Linux and check Kernel log messages in one terminal while unloading the kernel module and resetting the other secondary PCIe GPU in another terminal.

For an AMD secondary GPU using the amdgpu kernel module, the process looks like this:

  1. Open a terminal and run: sudo dmesg -H --nopager --follow

    • Look for messages from your GPU driver (e.g. amdgpu). There may be some helpful error messages to diagnose the issue.
    • It may be helpful to press enter a few times in this terminal to give some spacing so new messages will be easily visible at the end.
  2. Open another terminal and run: sudo rmmod amdgpu (or whichever driver name or kernel module the secondary GPU uses)

    • Check that the module has been unloaded with: sudo lsmod | grep -i amdgpu
    • You should see no output if it is not currently loaded in the kernel.
  3. Find the PCIe bus ID of the secondary GPU:

    • Run: sudo lspci

    • Look for AMD in the output, for example on my system I see:

      $ sudo lspci | grep -i amd
      01:00.0 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 XL Upstream Port of PCI Express Switch (rev c1)
      02:00.0 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 XL Downstream Port of PCI Express Switch
      03:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Navi 23 [Radeon RX 6600/6600 XT/6600M] (rev c1)
      03:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Navi 21/23 HDMI/DP Audio Controller
      
    • On this system, the AMD RX 6600 shows up on PCI bus ID: 03:00.0

    • Note that internal to the GPU card, there are multiple PCIe ports / switches, and an Intel HDA-based HDMI audio device which we can ignore. (The switches are essentially pass-through to the GPU + Intel HDA sound card. The sound card uses snd_hda_intel Kernel module in this case)

  4. Simulate removal of the PCIe device using the bus ID found above:

    • For example:

      # Remove the device at PCI bus ID 03:00.0 (it is rediscovered on the next rescan)
      echo 1 | sudo tee /sys/bus/pci/devices/0000\:03\:00.0/remove
      
  5. Rescan the PCIe bus, and immediately reload the GPU driver module with params:

      # The semicolon runs the two commands back to back.
      # Once you write '1' to 'rescan' via sysfs, the kernel may auto-load the amdgpu module without your specified parameters,
      # so modprobe has to run as soon as possible after the rescan.
      # Because of this race, /etc/modprobe.d or another persistent method is sometimes the more reliable choice, although rebooting makes testing slower.
      echo 1 | sudo tee /sys/bus/pci/rescan ; sudo modprobe amdgpu dpm=0 aspm=0
    
  6. Check that the loaded module parameters look set correctly like you intended:

      module=amdgpu
      for f in /sys/module/"$module"/parameters/*; do
        printf 'Parameter: %s --> ' "${f##*/}"
        sudo cat "$f"
      done
    
    • If the settings do not match what you passed to modprobe, the driver may have loaded automatically before your options could be applied.
    • If the modprobe param=foo settings did not work, try using the /etc/modprobe.d/ method for setting the option instead, then retry.
  7. Check your dmesg output in the other terminal.

    • Are the previous errors still there?
    • Anything changed or new since the param value was changed?
  8. Repeat and tweak parameters until you find something that might solve the issue (or that crashes the kernel completely and needs a reboot!)
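
Steps 4 and 6 above can be sketched as two small helpers. Both names (pci_sysfs_path and check_param) are made up for this example. The first assumes PCI domain 0000 when lspci prints a short address such as 03:00.0, which is the common case but not guaranteed on every machine. The second takes an optional sysfs root as its fourth argument purely so that it can be tried against a scratch directory.

```shell
# pci_sysfs_path BUS_ID
# Turn an lspci address into its sysfs directory. lspci usually omits the
# PCI domain, so "0000:" is prepended when missing (assumption: domain 0000).
pci_sysfs_path() {
  case $1 in
    *:*:*) addr=$1 ;;
    *)     addr=0000:$1 ;;
  esac
  printf '/sys/bus/pci/devices/%s\n' "$addr"
}

# check_param MODULE NAME EXPECTED [SYSFS_MODULE_ROOT]
# Compare one loaded module parameter against the value you meant to set.
# Returns 0 on match, 1 on mismatch, 2 if the parameter file cannot be read.
check_param() {
  file="${4:-/sys/module}/$1/parameters/$2"
  if [ ! -r "$file" ]; then
    echo "$1.$2: not readable (module not loaded, or root required)"
    return 2
  fi
  actual=$(cat "$file")
  if [ "$actual" = "$3" ]; then
    echo "$1.$2 = $actual (ok)"
  else
    echo "$1.$2 = $actual (expected $3)"
    return 1
  fi
}
```

For example, `pci_sysfs_path 03:00.0` prints /sys/bus/pci/devices/0000:03:00.0, and after reloading the driver `check_param amdgpu dpm 0` reports whether the running module picked up dpm=0. Some parameter files are readable only by root, so the check may need sudo.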
