
Why am I getting ACPI BIOS Errors each time I run nvidia-smi?


Each time I run nvidia-smi on our new compute system, I get errors like these in syslog, often several of them in a group:

Feb 25 13:35:02 xxxx kernel: [77419.656602] ACPI BIOS Error (bug): Failure creating named object [\_SB.PC00.PEG1.PEGP._DSM.USRG], AE_ALREADY_EXISTS (20210331/dsfield-184)
Feb 25 13:35:02 xxxx kernel: [77419.656612] ACPI Error: AE_ALREADY_EXISTS, CreateBufferField failure (20210331/dswload2-477)
Feb 25 13:35:02 xxxx kernel: [77419.656616]
Feb 25 13:35:02 xxxx kernel: [77419.656618] No Local Variables are initialized for Method [_DSM]
Feb 25 13:35:02 xxxx kernel: [77419.656618]
Feb 25 13:35:02 xxxx kernel: [77419.656619] Initialized Arguments for Method [_DSM]:  (4 arguments defined for method invocation)
Feb 25 13:35:02 xxxx kernel: [77419.656620]   Arg0:   000000007cd03195 <Obj>           Buffer(16) 75 0B A5 D4 C7 65 F7 46
Feb 25 13:35:02 xxxx kernel: [77419.656628]   Arg1:   0000000012ece7a2 <Obj>           Integer 0000000000000102
Feb 25 13:35:02 xxxx kernel: [77419.656632]   Arg2:   000000009179cfcc <Obj>           Integer 0000000000000010
Feb 25 13:35:02 xxxx kernel: [77419.656635]   Arg3:   000000002ecdce5a <Obj>           Buffer(4) 00 10 52 44
Feb 25 13:35:02 xxxx kernel: [77419.656639]
Feb 25 13:35:02 xxxx kernel: [77419.656641] ACPI Error: Aborting method \_SB.PC00.PEG1.PEGP._DSM due to previous error (AE_ALREADY_EXISTS) (20210331/psparse-529)

The same happens when an snmpd process periodically queries the GPU parameters.
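One way to confirm that the bursts line up with GPU queries is to count the ACPI error lines per timestamp in syslog. The following is only a minimal sketch, assuming the syslog line format shown above; the function name `count_acpi_errors` and the regular expression are illustrative, not part of any existing tool.

```python
import re
from collections import Counter

# Assumed line format (taken from the excerpt above):
#   "Feb 25 13:35:02 host kernel: [77419.656602] ACPI BIOS Error (bug): ..."
LINE_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\S+\s+kernel:\s+"
    r"\[\s*\d+\.\d+\]\s+(?P<msg>ACPI (?:BIOS )?Error.*)$"
)


def count_acpi_errors(lines):
    """Map each wall-clock timestamp to the number of ACPI error lines logged then."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.rstrip("\n"))
        if match:
            counts[match.group("stamp")] += 1
    return counts


# Hypothetical usage (the path is the Ubuntu default, adjust as needed):
#
#   with open("/var/log/syslog", errors="replace") as fh:
#       for stamp, n in count_acpi_errors(fh).items():
#           print(stamp, n)
```

Comparing the printed timestamps with the times at which nvidia-smi or the snmpd poll ran should show whether every burst corresponds to a GPU query.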

Any idea why this happens?

The output of nvidia-smi seems to be correct, but I'm unsure whether these syslog errors matter. I have updated the BIOS to the latest version, which is only a few days old. Here is the information about the system in question:

$ inxi -Fxz
System:    Kernel: 5.13.0-30-generic x86_64 bits: 64 compiler: N/A Console: tty 0 Distro: Ubuntu 20.04.4 LTS (Focal Fossa)
Machine:   Type: Desktop System: Alienware product: Alienware Aurora R13 v: N/A serial: <filter>
           Mobo: Alienware model: 0C92D0 v: A00 serial: <filter> UEFI: Alienware v: 1.0.12 date: 01/25/2022
CPU:       Topology: 10-Core model: 12th Gen Intel Core i7-12700KF bits: 64 type: MT MCP arch: N/A L2 cache: 25.0 MiB
           flags: avx avx2 lm nx pae sse sse2 sse3 sse4_1 sse4_2 ssse3 vmx bogomips: 144383
           Speed: 893 MHz min/max: 800/6300 MHz Core speeds (MHz): 1: 890 2: 900 3: 843 4: 891 5: 800 6: 818 7: 873 8: 894
           9: 958 10: 925 11: 909 12: 900 13: 891 14: 901 15: 881 16: 909 17: 891 18: 1182 19: 884 20: 913
Graphics:  Device-1: NVIDIA vendor: Dell driver: nvidia v: 510.47.03 bus ID: 01:00.0
           Display: server: X.org 1.20.13 driver: fbdev,nouveau unloaded: modesetting,vesa tty: 136x50
           Message: Advanced graphics data unavailable in console. Try -G --display
Audio:     Device-1: Intel vendor: Dell driver: snd_hda_intel v: kernel bus ID: 00:1f.3
           Device-2: NVIDIA vendor: Dell driver: snd_hda_intel v: kernel bus ID: 01:00.1
           Sound Server: ALSA v: k5.13.0-30-generic
Network:   Device-1: Realtek vendor: Bigfoot Networks driver: r8169 v: kernel port: 3000 bus ID: 03:00.0
           IF: enp3s0 state: up speed: 1000 Mbps duplex: full mac: <filter>
           Device-2: Intel vendor: Bigfoot Networks driver: iwlwifi v: kernel port: 3000 bus ID: 04:00.0
           IF: wlp4s0 state: down mac: <filter>
           IF-ID-1: docker0 state: up speed: 10000 Mbps duplex: unknown mac: <filter>
           IF-ID-2: veth4f6068a state: up speed: 10000 Mbps duplex: full mac: <filter>
Drives:    Local Storage: total: 1.84 TiB used: 131.29 GiB (7.0%)
           ID-1: /dev/nvme0n1 model: KXG70ZNV1T02 NVMe KIOXIA 1024GB size: 953.87 GiB
           ID-2: /dev/sda vendor: Toshiba model: DT01ACA100 size: 931.51 GiB temp: 35 C
Partition: ID-1: / size: 904.82 GiB used: 131.20 GiB (14.5%) fs: ext4 dev: /dev/nvme0n1p2
           ID-2: swap-1 size: 11.00 GiB used: 65.2 MiB (0.6%) fs: swap dev: /dev/nvme0n1p3
Sensors:   System Temperatures: cpu: 32.0 C mobo: N/A
           Fan Speeds (RPM): N/A
Info:      Processes: 456 Uptime: 21h 41m Memory: 62.60 GiB used: 2.92 GiB (4.7%) Init: systemd runlevel: 5 Compilers:
           gcc: 9.3.0 Shell: bash v: 5.0.17 inxi: 3.0.38

The GPU is an NVIDIA RTX 3080 10GB. The system is deployed in a server room with no monitor, mouse, or keyboard attached. The messages appear in the same way even if I connect a monitor, mouse, and keyboard.

I tried to find more information about this problem, but had no luck. I'm not even sure whether it is important to fix this, or whom I should report it to if it is a genuine bug.

-- Bogdan

Very simple. I'd like to understand why I'm seeing the above-mentioned errors in syslog, which seem to occur whenever the GPU is accessed. If they matter, how do I fix them? If not, how do I stop them from being logged?
