"Bug: soft lockup" - how to reboot/resolve with no physical access?

Bee

Our group's server (running Ubuntu 20.04.5 LTS) is currently stuck in an endless stream of "BUG: soft lockup" error messages (2 out of 88 CPUs are unhappy).

However, the soft lockup itself isn't what I'm asking about - I want to know whether there's any way to escape the error messages and restart the server without physical access. I can't get past the error messages to do anything (the first few times the message appeared, I could press Ctrl+C to get back to my bash session, but that no longer works). I can't ssh into the server from a different terminal window (it just hangs), nor can I access it via KVM (just a black window, and it says the status is 'Down').

We do not have physical access to our server - it is kept in a secure building, and if, for example, the power goes out and our server turns off, we have to pester the staff there via email to get it turned back on. None of them are responding to me today, and I would dearly like to start troubleshooting this issue so that we can actually use our computing resources.

Is there something I can do in order to at least temporarily get out of the endless error messages saying "BUG: soft lockup - CPU#X stuck for 22/23s" so that I can restart the server? (FYI, I have zero CS background; I am merely (and frighteningly) the most computer-literate member of our research group, so, uh, be aware of that.) Thanks.

guiverc
Normally, if I wanted to reboot quickly, I'd just send SysRq commands directly to the Linux kernel, but that assumes local access. The features offered by iLO/iDRAC/?? (or whatever your server has) vary, and I'm unsure whether you'll be able to send SysRq through them (*I always expected my local workstation to respond rather than the remote box, so I never tried*).
Bee
Thanks for the tip - alas, neither Alt+SysRq+B nor Alt+SysRq+REISUB does anything except make strange symbols in the window. I doubt it's enabled on our server, but I'll definitely make sure it is in the future...
guiverc
The *strange symbols* in the window are probably the terminal/software trying to interpret the SysRq keys locally; they're not being transmitted through to the remote server at all. I believe Ubuntu 20.04 LTS had SysRq *enabled* by default on all ISOs (*I'm largely a desktop user, and I know it was enabled on all desktop ISOs*). On a keystroke, you'll generally see a text message acknowledging the command and its execution, though you didn't specify which kernel stack you're using; being a server, it may be the GA stack (older kernels gave fewer messages for SysRq).
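
Since the comments above turn on whether SysRq is enabled on the server, here is a minimal Python sketch for checking that once a shell is available again. It reads the `kernel.sysrq` setting from `/proc/sys/kernel/sysrq` and decodes the bitmask using the bit meanings from the Linux kernel's SysRq documentation. The function names `decode_sysrq` and `read_sysrq` are made up for illustration, and the value 176 in the usage note is only an example, not a confirmed setting for this server.

```python
from typing import List

# Bit meanings for the kernel.sysrq bitmask, as listed in the
# Linux kernel's SysRq admin guide.
SYSRQ_BITS = {
    2: "control of console logging level",
    4: "control of keyboard (SAK, unraw)",
    8: "debugging dumps of processes",
    16: "sync command",
    32: "remount read-only",
    64: "signalling of processes (term, kill, oom-kill)",
    128: "reboot/poweroff",
    256: "nicing of all RT tasks",
}


def decode_sysrq(value: int) -> List[str]:
    """Return the SysRq function groups enabled by a kernel.sysrq value."""
    if value == 0:
        # 0 disables SysRq completely.
        return []
    if value == 1:
        # Exactly 1 enables every SysRq function.
        return list(SYSRQ_BITS.values())
    # Any other value is a bitmask of allowed function groups.
    return [name for bit, name in SYSRQ_BITS.items() if value & bit]


def read_sysrq(path: str = "/proc/sys/kernel/sysrq") -> int:
    """Read the current kernel.sysrq value (Linux only)."""
    with open(path) as f:
        return int(f.read().strip())


if __name__ == "__main__":
    current = read_sysrq()
    print(f"kernel.sysrq = {current}")
    for name in decode_sysrq(current):
        print(f"  enabled: {name}")
```

For example, `decode_sysrq(176)` reports sync, remount read-only, and reboot/poweroff as enabled, which would cover the S, U, and B steps of REISUB but not the others. This only helps after the machine is reachable again; it does not get keystrokes through a remote terminal that intercepts them locally.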