Score:0

What causes "soft lockup" errors and how can I fix them?


I've been seeing sporadic messages of the form "BUG: soft lockup - CPU#0 stuck for 22s!" from the System Notifier for several months. I've had at least three or four kernel version updates in that time, and the problem appears to be getting worse. Just last night, two of my eight cores were in this state, resulting in a browser lockup that I couldn't clear by any means I was aware of (I had to use the hard reset button on the tower case). When this occurs, the affected core reads 100% usage in the system monitor.

I'm currently running Kubuntu 20.04, with repository updates almost daily. I'm on kernel 5.4.0-77-generic. Rebooting into an older kernel version won't help, because updates keep only the current and one older kernel version, and I don't recall precisely how long ago I started getting this (though I think I had it in 16.04 before upgrading last December).

I have an MSI mainboard, AMD FX-8350 (8 cores, 8 threads, 4.1 GHz max clock), 16 GB RAM, and an Nvidia GTX 750. My monitoring widgets report no excessive CPU temperature (and I've usually had BOINC software keeping all eight cores at 100% when I'm not using the computer; recent BOINC or project problems have prevented that for some weeks).

Bottom line question: how can I stop this behavior? Is this a kernel bug (if so, persisting since at least version 4.4.* and running to at least 5.7.*), a hardware problem, or something else?

user10489
There are likely multiple possible causes for this, including but not limited to device driver bugs and hardware failures.
Zeiss Ikon
@user10489 So, your answer is that there is no solution and we just have to live with randomly finding hundreds of notifier messages stacked up and potential software lockups? This isn't helping my ongoing argument with my partner about Mac vs. Windows vs. Linux...
user10489
You need to diagnose the problem more deeply. Macs can have failing hardware just as easily as Linux machines. I posted as a comment because I don't have a complete answer for this. It may or may not help to post the system log messages that preceded the soft lockup, or to run hardware diagnostics.
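For anyone wanting to collect the log messages that preceded a soft lockup, here is a minimal Python sketch. It assumes the kernel log has first been saved to a plain text file (for example with `journalctl -k -b -1 > kernel.log` for the previous boot, or by copying `/var/log/kern.log`). The file name `kernel.log` and the function names `find_soft_lockups` and `report` are made up for illustration, and the pattern is based only on the message format quoted in the question.

```python
import re
from collections import deque

# Matches kernel lines such as "BUG: soft lockup - CPU#0 stuck for 22s!"
LOCKUP_RE = re.compile(r"soft lockup - CPU#(\d+) stuck for (\d+)s")


def find_soft_lockups(lines, context=5):
    """Return (cpu, seconds, preceding_lines) for every soft lockup line.

    `lines` is any iterable of log lines; `context` is how many lines
    before each lockup message are kept.
    """
    recent = deque(maxlen=context)  # rolling window of the latest lines
    hits = []
    for line in lines:
        line = line.rstrip("\n")
        match = LOCKUP_RE.search(line)
        if match:
            cpu, seconds = int(match.group(1)), int(match.group(2))
            hits.append((cpu, seconds, list(recent)))
        recent.append(line)
    return hits


def report(path="kernel.log", context=5):
    """Print each lockup found in a saved log file (hypothetical path)."""
    with open(path, errors="replace") as log:
        for cpu, seconds, before in find_soft_lockups(log, context):
            print(f"CPU#{cpu} stuck for {seconds}s; preceding lines:")
            for earlier in before:
                print("    " + earlier)
```

Calling `report("kernel.log")` prints each lockup with the few lines logged just before it, which is usually the part worth pasting into a question. It does not diagnose anything on its own.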
Zeiss Ikon
@user10489 I'm a Windows transplant -- ran Windows from 3.0/1990 until XP/2011 -- and I do everything from the GUI unless I'm entering specific commands I know or copy/pasting commands found in answers, articles, etc. I do *not* know my way around under the hood of Ubuntu -- I can check the oil, so to speak, and that's about it. You're asking the equivalent of bleeding brakes or reporting the heat range of installed spark plugs...
user10489
Actually, running hardware diagnostics is not specific to Linux per se. If available, a vendor hardware diagnostic (perhaps for your motherboard?) would be ideal. I'm hoping someone else can give you a clearer idea of what to look for in the logs.
Zeiss Ikon
MSI really makes it, um, interesting to find a diagnostic to download for a 3-4 year old motherboard. As in, register, then find they don't list it, then find their "live chat" doesn't exist until you register the board, which requires opening up the tower to get the serial number off the board. And we're already headed toward "comments are not for extended discussion"...
user10489
Perhaps you should ask another question asking for a good hardware diagnostic. Or search around, find a couple, and try them. A CPU stress test might be a good start. I'm not surprised MSI doesn't offer one. If they did, it would probably be in the motherboard firmware.
Zeiss Ikon
Running BOINC tasks on both the CPU (8 cores at 100% for weeks on end) and the GPU (all my CUDA cores at maximum for similar periods) isn't a stress test?