Unreliable graphics behavior after upgrade to Ubuntu 22.04

I've been stumped by this all week and am finally giving up on antisocial lone-debugging. Hopefully someone can help me make sense of this behavior! I originally filed a bug against the nvidia-graphics-drivers-535 component (https://bugs.launchpad.net/ubuntu/+source/nvidia-graphics-drivers-535/+bug/2031198), but after more experimentation I'm no longer convinced that's the right place for it.

System info:

Ubuntu: 22.04

kernel: 6.2.0-26

video driver: nvidia-driver-535

Window system: X11 - no wayland for any of the experiments.

Alienware 15 R3 laptop

Nvidia Geforce GTX 1060 + Intel HD Graphics 630

Full specs @ https://dl.dell.com/content/manual38481186-alienware-15-r3-setup-and-specifications.pdf?language=en-us

Current Behavior:

I dual boot to Win10, which has no graphics issues, so I don't think the video hardware has failed or is in the process of failing.

The behavior is different depending on whether nvidia-drm.modeset=0 or nvidia-drm.modeset=1 is set on boot.

  • In all cases, the splash kernel argument is supplied but the plymouth splash screen does not display.
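
Since everything below hinges on which arguments the kernel actually booted with, here is a minimal sketch for confirming that from the running system. It assumes the arguments are read from /proc/cmdline; the function names parse_cmdline and summarize are made up for illustration. It folds dashes into underscores because the kernel treats the two as equivalent in parameter names, so nvidia-drm.modeset and nvidia_drm.modeset are reported under one key.

```python
from pathlib import Path


def parse_cmdline(cmdline):
    """Split a kernel command line into {name: value}.

    Bare flags such as "splash" map to None. Quoted values containing
    spaces are not handled; this is only a sketch.
    """
    params = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        # The kernel treats "-" and "_" as equivalent in parameter names.
        params[name.replace("-", "_")] = value if sep else None
    return params


def summarize(cmdline):
    """Report only the arguments discussed in this post."""
    params = parse_cmdline(cmdline)
    return {
        "nvidia_drm.modeset": params.get("nvidia_drm.modeset"),
        "splash": "splash" in params,
        "nomodeset": "nomodeset" in params,
    }


if __name__ == "__main__":
    proc_cmdline = Path("/proc/cmdline")
    if proc_cmdline.exists():
        print(summarize(proc_cmdline.read_text()))
```

Running this right after each boot would make it easier to pair an observed behavior with the arguments that were really in effect.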

Booting with nvidia-drm.modeset=0:

  • The login screen is a sea of blackness, but the backlight is on.
  • At this point, if I try to go to another virtual terminal with ctrl+alt+Fx (where x > 1), I can see the tail end of the kernel log, but it is not a usable terminal. There's no login prompt and nothing shows up when I type.
    • I can get back to VT1 where the login screen should be, and it's still a black screen with backlight on.
  • Back on VT1, even though there's no login screen displayed, I can pretend the login screen is there (e.g., hit Enter to select the default user, type the password, hit Enter again)
    • Xorg/gnome starts up and I can use the desktop GUI as if nothing was wrong!
  • From gnome, I can swap back to the login screen with ctrl+alt+F1 and this time it appears just fine! And I can go back to the VT with X11 which continues displaying a gnome environment as one would expect. However, trying to use other VTs gives the same result as above.
  • If I don't log in immediately at the black login screen, but instead wait long enough (screen sleep timeout?)
    • the backlight eventually turns off
    • the "just pretend there's a login screen there" workaround outlined above does not work
    • Foreshadowing:
      • the fans remain in a steady state
      • top doesn't show anything taking up a suspicious amount of cpu
      • nothing is logged to dmesg

The behavior with nvidia-drm.modeset=1 is quite erratic, and I've identified 3 different behaviors so far - just by rebooting a bunch without changing anything between boots.

Booting with nvidia-drm.modeset=1 Behavior #0:

  • The login screen does not display, and the backlight is off.
  • After a few moments the fans go crazy. Something seems to be working hard, but I can't see it :(.
  • I can ssh to the laptop at this point though,
    • top shows that plymouthd is taking up 100% cpu.
    • dmesg shows that plymouthd is blocked waiting on nvidia_modeset. This message repeats forever:
[   60.327875] watchdog: BUG: soft lockup - CPU#7 stuck for 53s! [plymouthd:329]
[   60.327880] Modules linked in: rfcomm ccm snd_ctl_led snd_hda_codec_realtek snd_hda_codec_generic nvidia_uvm(POE) cmac algif_hash algif_skcipher af_alg bnep intel_tcc_cooling nvidia_drm(POE) x86_pkg_temp_thermal intel_powerclamp nvidia_modeset(POE) coretemp kvm_intel kvm irqbypass crct10dif_pclmul polyval_clmulni polyval_generic ghash_clmulni_intel sha512_ssse3 aesni_intel crypto_simd cryptd snd_soc_avs nvidia(POE) snd_soc_hda_codec snd_hda_ext_core snd_soc_core snd_hda_codec_hdmi binfmt_misc snd_compress ac97_bus snd_pcm_dmaengine snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi snd_hda_codec rapl hid_generic mei_hdcp mei_pxp intel_rapl_msr nls_iso8859_1 snd_hda_core snd_hwdep ath10k_pci i915 ath10k_core snd_pcm ath snd_seq_midi snd_seq_midi_event drm_buddy mac80211 ttm uvcvideo snd_rawmidi dell_wmi videobuf2_vmalloc drm_display_helper videobuf2_memops snd_seq btusb processor_thermal_device_pci_legacy cec btrtl intel_cstate videobuf2_v4l2 joydev input_leds btbcm snd_seq_device
[   60.327967]  processor_thermal_device rc_core processor_thermal_rfim cfg80211 snd_timer btintel videodev dell_smbios dcdbas btmtk drm_kms_helper usbhid videobuf2_common processor_thermal_mbox i2c_algo_bit dell_wmi_descriptor ledtrig_audio intel_wmi_thunderbolt wmi_bmof hid serio_raw mc bluetooth mxm_wmi snd processor_thermal_rapl mei_me intel_rapl_common syscopyarea sysfillrect libarc4 soundcore ecdh_generic ee1004 mei sysimgblt intel_soc_dts_iosf ecc intel_pch_thermal int3403_thermal int340x_thermal_zone mac_hid intel_hid int3400_thermal acpi_pad acpi_thermal_rel sparse_keymap sch_fq_codel msr parport_pc ppdev lp ramoops parport reed_solomon pstore_blk pstore_zone drm efi_pstore ip_tables x_tables autofs4 nvme ahci nvme_core xhci_pci i2c_i801 alx crc32_pclmul psmouse i2c_smbus nvme_common libahci mdio xhci_pci_renesas video wmi
[   60.328027] CPU: 7 PID: 329 Comm: plymouthd Tainted: P           OEL     6.2.0-26-generic #26~22.04.1-Ubuntu
[   60.328029] Hardware name: Alienware Alienware 15 R3/Alienware 15 R3, BIOS 1.10.0 07/21/2020
[   60.328030] RIP: 0010:_nv001596kms+0x0/0x80 [nvidia_modeset]
[   60.328065] Code: 48 48 8b 53 28 e9 e5 fd ff ff 45 31 c0 e9 e6 fc ff ff 49 c7 44 24 48 00 00 00 00 48 8b 53 28 e9 96 fd ff ff 66 0f 1f 44 00 00 <f3> 0f 1e fa 55 48 89 e5 41 55 49 89 fd 41 54 49 89 f4 53 48 8d 5f
[   60.328067] RSP: 0018:ffffa7b040433558 EFLAGS: 00000282
[   60.328069] RAX: ffffffffc56efce0 RBX: ffff92cccf488208 RCX: ffff92cccdcd68c8
[   60.328070] RDX: ffff92ccc394cc08 RSI: ffff92ccc394cc08 RDI: ffff92cccf488208
[   60.328071] RBP: ffffa7b0404335a0 R08: 0000000000000000 R09: 0000000000000000
[   60.328072] R10: 0000000000000000 R11: 0000000000000000 R12: ffff92ccc394cc08
[   60.328073] R13: ffff92ccce142008 R14: ffff92ccce142168 R15: 0000000000000000
[   60.328075] FS:  00007fbe874c7440(0000) GS:ffff92d42edc0000(0000) knlGS:0000000000000000
[   60.328076] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   60.328077] CR2: 00007f47e02f3000 CR3: 0000000117ad8005 CR4: 00000000003706e0
[   60.328079] Call Trace:
[   60.328080]  <TASK>
[   60.328081]  ? _nv001165kms+0x82/0x3a0 [nvidia_modeset]
[   60.328112]  ? nvkms_call_rm+0x5d/0x90 [nvidia_modeset]
[   60.328128]  _nv002331kms+0x145/0x210 [nvidia_modeset]
[   60.328152]  _nv000529kms+0x160/0x1b0 [nvidia_modeset]
[   60.328173]  _nv002766kms+0x4bf6/0x4cd0 [nvidia_modeset]
[   60.328198]  ? _nv000355kms+0x100/0x100 [nvidia_modeset]
[   60.328213]  nvKmsIoctl+0xf9/0x270 [nvidia_modeset]
[   60.328228]  ? _raw_spin_lock_irqsave+0xe/0x20
[   60.328232]  nvkms_ioctl_from_kapi+0x6e/0xd0 [nvidia_modeset]
[   60.328247]  _nv000019kms+0x368/0x890 [nvidia_modeset]
[   60.328272]  ? nvkms_free+0x26/0x30 [nvidia_modeset]
[   60.328287]  ? _nv000019kms+0x388/0x890 [nvidia_modeset]
[   60.328313]  nv_drm_atomic_apply_modeset_config.isra.0+0x2f1/0x520 [nvidia_drm]
[   60.328320]  ? nv_drm_atomic_apply_modeset_config.isra.0+0x401/0x520 [nvidia_drm]
[   60.328327]  nv_drm_atomic_commit+0xba/0x350 [nvidia_drm]
[   60.328333]  ? drm_atomic_check_only+0x1ad/0x400 [drm]
[   60.328360]  drm_atomic_commit+0x96/0xd0 [drm]
[   60.328379]  ? __pfx___drm_printfn_info+0x10/0x10 [drm]
[   60.328408]  nv_drm_atomic_helper_disable_all+0x23d/0x310 [nvidia_drm]
[   60.328414]  nv_drm_master_drop+0x28/0x70 [nvidia_drm]
[   60.328419]  drm_dropmaster_ioctl+0xe4/0x160 [drm]
[   60.328439]  ? __pfx_drm_dropmaster_ioctl+0x10/0x10 [drm]
[   60.328459]  drm_ioctl_kernel+0xc0/0x160 [drm]
[   60.328488]  ? raw_spin_rq_unlock+0x10/0x40
[   60.328492]  drm_ioctl+0x27b/0x4c0 [drm]
[   60.328521]  ? __pfx_drm_dropmaster_ioctl+0x10/0x10 [drm]
[   60.328541]  ? schedule+0x68/0x110
[   60.328545]  nv_drm_ioctl+0x48/0x3a0 [nvidia_drm]
[   60.328552]  __x64_sys_ioctl+0x9a/0xe0
[   60.328555]  do_syscall_64+0x59/0x90
[   60.328558]  ? syscall_exit_to_user_mode+0x2a/0x50
[   60.328560]  ? do_syscall_64+0x69/0x90
[   60.328561]  ? exit_to_user_mode_prepare+0x3b/0xd0
[   60.328564]  ? syscall_exit_to_user_mode+0x2a/0x50
[   60.328566]  ? do_syscall_64+0x69/0x90
[   60.328568]  ? exit_to_user_mode_prepare+0x3b/0xd0
[   60.328570]  ? syscall_exit_to_user_mode+0x2a/0x50
[   60.328572]  ? do_syscall_64+0x69/0x90
[   60.328574]  ? do_syscall_64+0x69/0x90
[   60.328575]  ? syscall_exit_to_user_mode+0x2a/0x50
[   60.328577]  ? do_syscall_64+0x69/0x90
[   60.328579]  ? do_syscall_64+0x69/0x90
[   60.328581]  entry_SYSCALL_64_after_hwframe+0x72/0xdc
[   60.328584] RIP: 0033:0x7fbe8731aaff
[   60.328586] Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <41> 89 c0 3d 00 f0 ff ff 77 1f 48 8b 44 24 18 64 48 2b 04 25 28 00
[   60.328587] RSP: 002b:00007ffe0f929010 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[   60.328589] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fbe8731aaff
[   60.328590] RDX: 0000000000000000 RSI: 000000000000641f RDI: 000000000000000b
[   60.328591] RBP: 000000000000641f R08: 000055e418227180 R09: 0000000000000000
[   60.328593] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000009
[   60.328594] R13: 000000000000000b R14: 000055e41820a3e0 R15: 000055e418227180
[   60.328596]  </TASK>
  • If I try to go to another virtual terminal with ctrl+alt+Fx I see nothing. The backlight remains off, and I cannot see any dim image by pointing a flashlight at the screen. So it's not just the backlight being off.
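
Because the soft lockup message repeats forever, it can help to boil a saved dmesg dump down to one line per report. The following is a rough sketch that assumes the watchdog line keeps the exact format shown in the log above; the names LOCKUP_RE, find_soft_lockups and lockups_by_process are hypothetical.

```python
import re
from collections import Counter

# Matches e.g. "watchdog: BUG: soft lockup - CPU#7 stuck for 53s! [plymouthd:329]"
LOCKUP_RE = re.compile(
    r"watchdog: BUG: soft lockup - CPU#(?P<cpu>\d+) "
    r"stuck for (?P<secs>\d+)s! \[(?P<comm>.+):(?P<pid>\d+)\]"
)


def find_soft_lockups(log_text):
    """Return one dict per soft lockup report found in the dmesg text."""
    return [
        {
            "cpu": int(m["cpu"]),
            "secs": int(m["secs"]),
            "comm": m["comm"],
            "pid": int(m["pid"]),
        }
        for m in LOCKUP_RE.finditer(log_text)
    ]


def lockups_by_process(log_text):
    """Count soft lockup reports per process name."""
    return Counter(hit["comm"] for hit in find_soft_lockups(log_text))
```

Fed the output of dmesg captured over ssh, lockups_by_process shows at a glance whether plymouthd or Xorg is the stuck process on a given boot.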

Booting with nvidia-drm.modeset=1 Behavior #1:

  • The login screen appears, ui elements work as expected.

  • Virtual Terminals at this stage are different than in the nvidia-drm.modeset=0 case:

    • Hitting ctrl+alt+Fx (where x > 1) goes to a black screen with backlight off
    • I cannot get back to VT1 this time (or if I can, I can't tell that I have, because the backlight remains off).
  • When I attempt a login and enter credentials immediately, the screen goes black with no backlight.

    • After a few moments the fans go crazy. Something seems to be working hard, again.
    • I can ssh to the laptop at this point, and top shows that Xorg is taking up 100% cpu.
    • dmesg shows that Xorg is blocked waiting on nvidia_modeset. This error is repeated forever:
[  140.254729] watchdog: BUG: soft lockup - CPU#2 stuck for 86s! [Xorg:1059]
[  140.254735] Modules linked in: rfcomm ccm snd_ctl_led snd_hda_codec_realtek snd_hda_codec_generic nvidia_uvm(POE) cmac algif_hash algif_skcipher af_alg bnep intel_tcc_cooling x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul polyval_clmulni polyval_generic nvidia_drm(POE) ghash_clmulni_intel sha512_ssse3 aesni_intel crypto_simd nvidia_modeset(POE) cryptd binfmt_misc nls_iso8859_1 rapl snd_soc_avs snd_soc_hda_codec snd_hda_ext_core nvidia(POE) hid_generic mei_hdcp mei_pxp intel_rapl_msr snd_soc_core snd_compress ac97_bus snd_hda_codec_hdmi snd_pcm_dmaengine i915 ath10k_pci snd_hda_intel ath10k_core snd_intel_dspcfg snd_intel_sdw_acpi snd_hda_codec ath snd_hda_core uvcvideo snd_hwdep drm_buddy videobuf2_vmalloc ttm videobuf2_memops snd_pcm videobuf2_v4l2 snd_seq_midi joydev intel_cstate mac80211 snd_seq_midi_event drm_display_helper input_leds snd_rawmidi videodev usbhid btusb dell_wmi btrtl snd_seq cec dell_smbios btbcm dcdbas btintel btmtk
[  140.254841]  videobuf2_common snd_seq_device ledtrig_audio intel_wmi_thunderbolt mxm_wmi wmi_bmof ee1004 dell_wmi_descriptor serio_raw hid mc bluetooth snd_timer processor_thermal_device_pci_legacy cfg80211 rc_core processor_thermal_device snd ecdh_generic processor_thermal_rfim ecc drm_kms_helper libarc4 soundcore processor_thermal_mbox i2c_algo_bit syscopyarea processor_thermal_rapl mei_me intel_rapl_common sysfillrect intel_pch_thermal sysimgblt mei intel_soc_dts_iosf int3403_thermal int340x_thermal_zone intel_hid int3400_thermal mac_hid sparse_keymap acpi_thermal_rel acpi_pad sch_fq_codel msr parport_pc ppdev lp ramoops parport reed_solomon pstore_blk pstore_zone drm efi_pstore ip_tables x_tables autofs4 nvme ahci nvme_core i2c_i801 alx xhci_pci crc32_pclmul psmouse i2c_smbus nvme_common mdio libahci xhci_pci_renesas video wmi
[  140.254885] CPU: 2 PID: 1059 Comm: Xorg Tainted: P           OEL     6.2.0-26-generic #26~22.04.1-Ubuntu
[  140.254887] Hardware name: Alienware Alienware 15 R3/Alienware 15 R3, BIOS 1.10.0 07/21/2020
[  140.254889] RIP: 0010:_nv001596kms+0x0/0x80 [nvidia_modeset]
[  140.254924] Code: 48 48 8b 53 28 e9 e5 fd ff ff 45 31 c0 e9 e6 fc ff ff 49 c7 44 24 48 00 00 00 00 48 8b 53 28 e9 96 fd ff ff 66 0f 1f 44 00 00 <f3> 0f 1e fa 55 48 89 e5 41 55 49 89 fd 41 54 49 89 f4 53 48 8d 5f
[  140.254925] RSP: 0018:ffffab6d43513a00 EFLAGS: 00000282
[  140.254927] RAX: ffffffffc55edce0 RBX: ffff8aeb044a0e08 RCX: ffff8aeb14df7608
[  140.254928] RDX: ffff8aeb0434c808 RSI: ffff8aeb0434c808 RDI: ffff8aeb044a0e08
[  140.254929] RBP: ffffab6d43513a48 R08: 0000000000000000 R09: 0000000000000000
[  140.254930] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8aeb0434c808
[  140.254931] R13: ffff8aeb049c2808 R14: ffff8aeb049c2968 R15: 0000000000000000
[  140.254933] FS:  00007f528f02ba80(0000) GS:ffff8af26ec80000(0000) knlGS:0000000000000000
[  140.254934] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  140.254935] CR2: 00007ffd06139000 CR3: 0000000118a98004 CR4: 00000000003706e0
[  140.254937] Call Trace:
[  140.254938]  <TASK>
[  140.254939]  ? _nv001165kms+0x82/0x3a0 [nvidia_modeset]
[  140.254970]  ? nvkms_call_rm+0x5d/0x90 [nvidia_modeset]
[  140.254985]  _nv002331kms+0x145/0x210 [nvidia_modeset]
[  140.255010]  _nv000529kms+0x160/0x1b0 [nvidia_modeset]
[  140.255030]  _nv002766kms+0x4bf6/0x4cd0 [nvidia_modeset]
[  140.255055]  ? _nv000355kms+0x100/0x100 [nvidia_modeset]
[  140.255070]  nvKmsIoctl+0xf9/0x270 [nvidia_modeset]
[  140.255084]  ? _raw_spin_lock_irqsave+0xe/0x20
[  140.255088]  nvkms_ioctl+0x121/0x190 [nvidia_modeset]
[  140.255103]  nvidia_frontend_unlocked_ioctl+0x55/0xa0 [nvidia]
[  140.255352]  __x64_sys_ioctl+0x9a/0xe0
[  140.255356]  do_syscall_64+0x59/0x90
[  140.255359]  ? handle_mm_fault+0x119/0x330
[  140.255362]  ? lock_mm_and_find_vma+0x44/0x250
[  140.255364]  ? do_user_addr_fault+0x1d0/0x640
[  140.255367]  ? exit_to_user_mode_prepare+0x3b/0xd0
[  140.255370]  ? irqentry_exit_to_user_mode+0x9/0x20
[  140.255372]  ? irqentry_exit+0x43/0x50
[  140.255374]  ? exc_page_fault+0x92/0x1b0
[  140.255376]  entry_SYSCALL_64_after_hwframe+0x72/0xdc
[  140.255379] RIP: 0033:0x7f528f31aaff
[  140.255381] Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <41> 89 c0 3d 00 f0 ff ff 77 1f 48 8b 44 24 18 64 48 2b 04 25 28 00
[  140.255383] RSP: 002b:00007ffd061336e0 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[  140.255385] RAX: ffffffffffffffda RBX: 00000000c0106d00 RCX: 00007f528f31aaff
[  140.255386] RDX: 00007ffd06133740 RSI: 00000000c0106d00 RDI: 0000000000000013
[  140.255387] RBP: 00007ffd06133740 R08: 0000000000000000 R09: 00005654791362c0
[  140.255388] R10: 00007ffd0614aab0 R11: 0000000000000246 R12: 0000000000000013
[  140.255389] R13: 00007f528ea1cbc8 R14: 00007ffd06136048 R15: 0000000000000003
[  140.255392]  </TASK>
  • If I don't log in immediately, but instead wait long enough past the screen dim event on the login screen (screen sleep timeout?), the login screen disappears, the backlight turns off, and the fans go nuts.
    • top shows that Xorg is taking up 100% cpu.
    • dmesg shows that Xorg is blocked waiting on nvidia_modeset, with the same repeated soft lockup call trace as above.
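
To compare the plymouthd and Xorg traces without eyeballing them, the frames can be extracted and lined up. This is only a sketch: it assumes the frame format shown in the logs above and skips frames prefixed with "?", which the kernel prints for addresses the unwinder is not sure belong to the call chain. The names FRAME_RE, reliable_frames and common_prefix are made up for illustration.

```python
import re

# Matches e.g. "[   60.328128]  _nv002331kms+0x145/0x210 [nvidia_modeset]"
FRAME_RE = re.compile(
    r"^\[\s*\d+\.\d+\]\s+(?P<guess>\? )?(?P<sym>[\w.]+)"
    r"\+0x[0-9a-f]+/0x[0-9a-f]+(?: \[(?P<mod>\w+)\])?\s*$"
)


def reliable_frames(trace_text):
    """Return (symbol, module) for each frame not marked with '?'.

    Frames with no module suffix are labelled "kernel".
    """
    frames = []
    for line in trace_text.splitlines():
        m = FRAME_RE.match(line)
        if m and not m["guess"]:
            frames.append((m["sym"], m["mod"] or "kernel"))
    return frames


def common_prefix(frames_a, frames_b):
    """Return the innermost frames that two traces share, in order."""
    shared = []
    for a, b in zip(frames_a, frames_b):
        if a != b:
            break
        shared.append(a)
    return shared
```

Applied to the two traces in this post, the shared innermost frames are _nv002331kms, _nv000529kms, _nv002766kms and nvKmsIoctl, all in nvidia_modeset. The traces only diverge after that: plymouthd arrives via nvkms_ioctl_from_kapi and Xorg via nvkms_ioctl.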

Booting with nvidia-drm.modeset=1 Behavior #2: Super rare; this has only happened once, so far.

  • The login screen appears, ui elements work as expected.
  • Upon logging in, Xorg/gnome starts up and I can use the desktop GUI as if nothing was wrong!
  • From gnome, I can swap back to the login screen with ctrl+alt+F1, and from there can swap back to my logged in gnome session with ctrl+alt+F2.
  • If I try to go to another virtual terminal with ctrl+alt+Fx (where x > 2), I can see the tail end of the kernel log, but it is not a usable terminal. There's no login prompt and nothing shows up on typing.

Things I've Tried

  • adding nomodeset to the kernel args
    • Result: No effect.
  • switching to nvidia-driver 525
    • Result: No effect.
  • uninstalling nvidia packages (apt remove --purge '*nvidia*' and apt autoremove) and attempting to use the nouveau driver
    • Result: On boot, the screen was black with backlight on. loginctl reported that whatever was there was running on wayland, unlike all the experiments above, where it reports x11. I was unable to use the "just pretend there's a login screen there" trick described above. Dunno if that's because the wayland login screen is different or if there simply wasn't one this time.
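
For anyone who wants to repeat the x11-versus-wayland check over ssh, here is a small sketch around loginctl show-session. It assumes the session id is already known (loginctl list-sessions prints them); the helper names parse_properties and session_type are hypothetical.

```python
import subprocess


def parse_properties(output):
    """Parse KEY=VALUE lines, as printed by `loginctl show-session`."""
    props = {}
    for line in output.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def session_type(session_id):
    """Return the Type property of a session, e.g. "x11" or "wayland"."""
    result = subprocess.run(
        ["loginctl", "show-session", session_id, "-p", "Type"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_properties(result.stdout).get("Type")
```

The session to query is the greeter's or the desktop session's, not the ssh session the command is typed into.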

Before noticing the nvidia-drm.modeset pattern, or that sometimes I can log in even if there's no login screen displayed, I also tried all of this:

  • switching to nvidia-driver 470, and 390 (non-server variants)
    • Result: No graphics, backlight or ability to switch to another VT
  • making Intel graphics primary with nvidia-prime
    • Result: No graphics, backlight or ability to switch to another VT
  • Disabled wayland in /etc/gdm3/custom.conf
    • Result: No graphics, backlight or ability to switch to another VT
  • Reinstalling ubuntu completely (this required using safe graphics mode - normal mode yielded the same behavior in the installer)

Pre-upgrade Behavior

I was using the gnome GUI and the proprietary nvidia drivers with this laptop for years. I never had a problem with the plymouth splash screen, login screen, or gnome desktop. But...

Disclaimer: I don't know much about the state before the upgrade. It happened overnight while I was not present. Sadly I don't know the Ubuntu version (probably 20.04?), the nvidia driver version, whether it was wayland or X11, or the kernel version. I wish I had this info for you :(. All that history was trashed when I reinstalled Ubuntu 22.04 on top of itself in a fruitless attempt to get this working :(
