
Debugging slow boot: soft lockup


Recently my boot time has increased significantly. After GRUB, the screen goes black and the monitor turns off. The login screen only appears after a couple of minutes (it used to take under 10 s). I use Ubuntu 22.04.

In dmesg, I can see:

watchdog: BUG: soft lockup - CPU#2 stuck for 22s! [gpu-manager:879]
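
A small filter can pull every soft-lockup report out of a kernel log, including the log from a previous boot. This is a minimal sketch that assumes the message format shown above. The function name `summarize_lockups` is hypothetical.

```bash
#!/usr/bin/env bash
# summarize_lockups (hypothetical helper): read kernel log text on stdin and
# print one line per soft-lockup report with the CPU number, how long it was
# stuck, and the task name:PID. Assumes lines shaped like
#   watchdog: BUG: soft lockup - CPU#2 stuck for 22s! [gpu-manager:879]
summarize_lockups() {
  sed -nE 's/.*soft lockup - CPU#([0-9]+) stuck for ([0-9]+)s! \[([^]]+)\].*/cpu=\1 stuck=\2s task=\3/p'
}
```

For example, use `sudo dmesg | summarize_lockups` for the current boot. Use `journalctl -k -b -1 | summarize_lockups` for the previous boot, which assumes the journal keeps earlier boots.
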
$ systemd-analyze blame | head
2min 51.572s gpu-manager.service
2min 17.629s docker.service
1min 42.960s plymouth-quit-wait.service
1min 42.890s postgresql@10-main.service
 1min 8.731s snapd.service
 1min 8.581s containerd.service
     34.403s avahi-daemon.service
     34.401s bluetooth.service
     34.395s NetworkManager.service
     34.386s power-profiles-daemon.service
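
The times from `systemd-analyze blame` overlap: units start in parallel, and a unit can look slow only because it is waiting on another unit. The numbers therefore do not add up to the total boot time. The sketch below converts each duration to seconds and keeps only the units above a threshold. `slow_units` is a hypothetical helper, and the set of duration suffixes it handles (`h`, `min`, `s`, `ms`, `us`) is an assumption about the output format. To see which units actually sat on the critical path, `systemd-analyze critical-chain gpu-manager.service` is the more direct tool.

```bash
#!/usr/bin/env bash
# slow_units THRESHOLD (hypothetical helper): read `systemd-analyze blame`
# output on stdin and print "<seconds> <unit>" for every unit whose
# activation took at least THRESHOLD seconds (default 30).
slow_units() {
  local threshold="${1:-30}"
  awk -v t="$threshold" '
    {
      secs = 0
      # every field except the last is a duration token: 2min, 51.572s, 345ms ...
      for (i = 1; i < NF; i++) {
        tok = $i
        if (tok ~ /ms$/)       { sub(/ms$/, "", tok);  secs += tok / 1000 }
        else if (tok ~ /us$/)  { sub(/us$/, "", tok);  secs += tok / 1000000 }
        else if (tok ~ /min$/) { sub(/min$/, "", tok); secs += tok * 60 }
        else if (tok ~ /h$/)   { sub(/h$/, "", tok);   secs += tok * 3600 }
        else if (tok ~ /s$/)   { sub(/s$/, "", tok);   secs += tok }
      }
      if (secs >= t + 0) printf "%.1f %s\n", secs, $NF
    }'
}
```

For example, `systemd-analyze blame | slow_units 60` applied to the output above would keep the six units that took over a minute.
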
$ sudo cat /etc/fstab
# /etc/fstab: static file system information.
#
# Use 'blkid' to print the universally unique identifier for a
# device; this may be used with UUID= as a more robust way to name devices
# that works even if disks are added and removed. See fstab(5).
#
# <file system> <mount point>   <type>  <options>       <dump>  <pass>
# / was on /dev/sda5 during installation
UUID=6dd5bb0f-d520-4e28-9162-abbfe26b2cc6 /               ext4    errors=remount-ro 0       1
# /boot/efi was on /dev/sda2 during installation
UUID=2082-E229  /boot/efi       vfat    umask=0077      0       1
# /home was on /dev/sdb2 during installation
UUID=9e0c4e6c-7c6b-483d-af13-c87ebf7b5dd5 /home           ext4    defaults        0       2
/swapfile                                 none            swap    sw              0       0

I confirmed that the proprietary NVIDIA driver is the one selected in Additional Drivers. Following advice I found online, I also added `nouveau.modeset=0` to the GRUB kernel command line. I have tested the memory as well. Shutdown is also very slow.
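
On Ubuntu, a parameter such as `nouveau.modeset=0` is usually added to `GRUB_CMDLINE_LINUX_DEFAULT` in `/etc/default/grub`, followed by `sudo update-grub` and a reboot. Checking `/proc/cmdline` confirms whether the parameter actually reached the running kernel. The `has_kernel_param` helper below is a hypothetical sketch of that check.

```bash
#!/usr/bin/env bash
# has_kernel_param PARAM [CMDLINE] (hypothetical helper): succeed if PARAM
# appears as a whole word on the kernel command line. CMDLINE defaults to
# the running kernel's /proc/cmdline, so a GRUB edit only shows up here
# after `sudo update-grub` and a reboot.
has_kernel_param() {
  local param="$1"
  local cmdline="${2-$(cat /proc/cmdline)}"
  local -a words
  read -ra words <<< "$cmdline"
  local w
  for w in "${words[@]}"; do
    # quoted right-hand side: exact string match, no glob patterns
    [[ "$w" == "$param" ]] && return 0
  done
  return 1
}
```

For example, `has_kernel_param nouveau.modeset=0 && echo active`.
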

Updating the NVIDIA driver from 510 to 515 temporarily resolved it. A couple of days later the issue came back, even though the drivers are now up to date.

Any suggestions for how to debug this further?

David:
Please add additional info to the question, not as a comment where it may get missed.
sygi:
It looks like the comment about WiFi was premature.

The NVIDIA developer forums contain several reports of soft lockups caused by the newer drivers: https://forums.developer.nvidia.com/search?q=linux%20soft%20lockup%20%20order%3Alatest_topic

I used to always run the latest NVIDIA drivers, but I kept running into issues. I am currently using nvidia-driver-470. It gives me the same performance as nvidia-driver-515, without the multi-monitor issues or soft lockups.

You can see a list of available drivers with `apt list 'nvidia-driver-*'`. Quoting the pattern stops the shell from expanding it against files in the current directory.
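
The `apt list` output can be reduced to bare package names, which shows at a glance which driver branches apt knows about and which one is installed. This is a minimal sketch. `driver_packages` is a hypothetical helper, and it assumes apt's usual `name/suite version arch [status]` line format, which apt itself warns is not a stable interface.

```bash
#!/usr/bin/env bash
# driver_packages (hypothetical helper): read `apt list` output on stdin and
# print each nvidia-driver-* package name followed by "installed" or
# "available", in version order. Assumes apt's "name/suite version arch
# [status]" format, which apt does not guarantee to keep stable.
driver_packages() {
  awk -F'/' '/^nvidia-driver-/ {
    name = $1
    status = ($0 ~ /\[installed/) ? "installed" : "available"
    print name, status
  }' | sort -V
}
```

For example, `apt list 'nvidia-driver-*' 2>/dev/null | driver_packages`. The stderr redirect hides apt's warning about its unstable CLI.
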

The 520 drivers are also available at https://www.nvidia.com/Download/driverResults.aspx/193764/en-us/, but they appear to have some of the same issues.

sygi:
Thanks for the suggestion, unfortunately downgrading to 470 didn't solve the issue for me.
ThankYee:
Sorry to hear that. Can you confirm that the nouveau driver is blacklisted? Try adding the `nomodeset` and `modprobe.blacklist=nouveau` kernel command-line options.
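
The sketch below checks both parts of that question: whether a `blacklist nouveau` line exists, and whether the module is loaded anyway. `nouveau_status` is hypothetical. By default it searches the standard modprobe configuration directories and reads `/proc/modules`. It does not inspect `/proc/cmdline`, so a `modprobe.blacklist=nouveau` boot option has to be checked separately.

```bash
#!/usr/bin/env bash
# nouveau_status [DIR...] (hypothetical helper): report whether any modprobe
# config file under DIR... contains a "blacklist nouveau" line, and whether
# the nouveau module is currently loaded according to /proc/modules.
nouveau_status() {
  local -a dirs=("$@")
  (( ${#dirs[@]} )) || dirs=(/etc/modprobe.d /lib/modprobe.d /usr/lib/modprobe.d)
  # -r recurse, -q quiet, -s ignore missing directories
  if grep -rqsE '^[[:space:]]*blacklist[[:space:]]+nouveau([[:space:]]|$)' "${dirs[@]}"; then
    echo "blacklisted: yes"
  else
    echo "blacklisted: no"
  fi
  # each /proc/modules line starts with the module name and a space
  if grep -q '^nouveau ' /proc/modules 2>/dev/null; then
    echo "loaded: yes"
  else
    echo "loaded: no"
  fi
}
```

If the first line says "yes" but the second also says "yes", something loaded nouveau despite the blacklist, for example the initramfs.
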
sygi:
Yes, I have nouveau blacklisted.