Score:2

Ubuntu server 22.04 suspend issues

je flag

I'm running a headless Ubuntu server as a media server. As I only rarely need access to the data, I'm suspending the system when it's idle. The command systemctl suspend is used to suspend the system.

This worked fairly well with Ubuntu 20.04 LTS, but since upgrading to 22.04 LTS suspend is not working at all. Note: the problem I'm describing here also happened before the upgrade, but only rarely.

The problem I'm facing: Once the command is run, the screen turns black (if one is attached) and/or the ssh sessions terminate. The HDDs seem to power down, but the system fans keep rotating. The server then becomes unresponsive and does not react to WOL commands or even the power button. I cannot wake up the system anymore. As a last resort, I have to press the power button for 5 seconds to cut the power and restart the server.

After digging through a lot of documentation I now don't know how to further debug the issue. Is there any debugging procedure I can follow? Any logs I can check for errors? Anyone who experienced similar issues?

Some more details:

  • The system is running Ubuntu 22.04 LTS with the kernel version 5.15.0-76-generic
  • S3 suspension/STR is activated in the BIOS
  • After countless tries, it actually worked once and suspended correctly, so it seems to be possible, but at the moment it is just down to luck
  • Changing to an older kernel (5.4.0-153-generic) as suggested in some threads and done as described here prevented the system from booting so I had to revert

Edit 1: As requested I posted the output of journalctl --grep='suspend|sleep' --no-pager --since="-1hour" here: https://pastebin.com/izdBMWmv

Edit 2: More output of the journalctl command after removing the autosuspend systemd service: https://pastebin.com/T5fwcfpK

Edit 3: Output of the command for s in "0000:03:00" "0000:05:00"; do lspci -nnk -s "$s"; done: https://pastebin.com/eEUtBfcM

Raffa avatar
jp flag
Please [edit] your question to add the output of `journalctl --grep='suspend|sleep' --no-pager --since="-1week"` assuming this happened at least once in the last 7 days.
Fgop avatar
je flag
I edited the question above to provide the requested logs. They contain two successful and one subsequent unsuccessful suspend command.
Raffa avatar
jp flag
You seem to have altered the default system sleep/suspend services as well as installed 3rd party applications ... See `autosuspend[1292]: File "/opt/autosuspend/bin/autosuspend", line 5, in <module>` in your linked logs ... Tell us about all you did and how you did it, please.
Fgop avatar
je flag
It is third party software to monitor network traffic and auto suspend the system if it's not actively used. Internally it issues the `systemctl suspend` command to suspend the system. I'm not aware of having changed any suspend service beside installing this additional service. Could you elaborate on what I could change that is not considered default?
Raffa avatar
jp flag
`/opt/autosuspend/bin/autosuspend` appears to be a Python script in which `from autosuspend import main` tries to import a module that the system Python reports as missing (*not installed*): `ModuleNotFoundError: No module named 'autosuspend'`. This could happen, for example, after upgrading to an Ubuntu release that comes with a newer system Python version.
Score:2
jp flag

How to investigate?

You need to see what's going on behind the scenes in your system while it's suspending/sleeping ... You can start by inspecting related system messages using journalctl like so:

journalctl --grep='suspend|sleep' --no-pager --since="-1week"

You can change --since="-1week", which shows messages from the past seven days, to show only the past day (--since="-1day") or the past hour (--since="-1hour"), etc. The --grep='suspend|sleep' option (case-insensitive if the pattern is all lower-case) shows only messages that contain suspend or sleep, while --no-pager disables the pager and prints the output at once, so you can copy the whole output in a single action.

How to expand your investigation?

The journalctl search can be widened by adding more related search words to the --grep= option, separated by the | (or) regex operator, e.g.:

journalctl --grep='suspend|sleep|acpi' --no-pager --since="-1hour"

The priority of the messages can also be specified. For example, to print messages of priority 4 (warning) and more critical, i.e. "emerg" (0), "alert" (1), "crit" (2), "err" (3) and "warning" (4), you can use it like so:

journalctl --priority=4 --no-pager --since="-1hour"

and so on ...
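Since a failed suspend here ends with a hard power-off, the interesting messages are probably the last ones of the previous boot. The following is only a sketch under a few assumptions: that the journal is persistent (so `journalctl -b -1` can reach the previous boot), that the kernel logged a `PM: suspend entry` line when the attempt started, and that the final messages made it to disk before the hang. The function name `last_suspend_attempt` is made up for illustration.

```shell
# Print everything from the LAST "PM: suspend entry" line onward.
# Reads log text on stdin; prints nothing if no suspend attempt is found.
last_suspend_attempt() {
    awk '
        /PM: suspend entry/ { found = 1; buf = "" }  # restart buffer at each new attempt
        found { buf = buf $0 "\n" }                   # collect lines after the marker
        END { printf "%s", buf }
    '
}

# Possible usage on the affected machine, after the forced reboot:
#   journalctl -k -b -1 --no-pager | last_suspend_attempt
```

The last device or driver mentioned before the log stops is a reasonable candidate to look at next, although it is not proof that it caused the hang.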

What do your logs reveal?

Issue #1

In your logs, /opt/autosuspend/bin/autosuspend appears to be a Python script in which from autosuspend import main tries to import a module that the system Python reports as missing (not installed): ModuleNotFoundError: No module named 'autosuspend' ... The relevant lines are:

Jul 16 23:30:03 homse1 autosuspend[1292]:   File "/opt/autosuspend/bin/autosuspend", line 5, in <module>
Jul 16 23:30:03 homse1 autosuspend[1292]:     from autosuspend import main
Jul 16 23:30:03 homse1 autosuspend[1292]: ModuleNotFoundError: No module named 'autosuspend'

This could happen, for example, after upgrading to an Ubuntu release that comes with a newer system Python version ... Please see for example python3.8 and pip after upgrading to 22.04.2

Therefore, as a fix, you might want to first try:

python3 -m pip install -U autosuspend

Or with sudo if installing globally (depends on how your script is run).
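Before reinstalling, it might help to check which interpreter the script actually runs under, because pip has to install the module for that same interpreter. This is a minimal sketch; the helper name `script_interpreter` is hypothetical, and it only reads the shebang line of the file you give it.

```shell
# Print the interpreter named on the shebang line of the given script.
# For a "#!/usr/bin/env python3" shebang this prints /usr/bin/env.
script_interpreter() {
    head -n 1 "$1" | sed -n 's/^#! *\([^ ]*\).*/\1/p'
}

# Possible usage on the affected machine:
#   interp=$(script_interpreter /opt/autosuspend/bin/autosuspend)
#   "$interp" -c 'import autosuspend' && echo "module found"
```

If the import fails for that interpreter, install with the same one (`"$interp" -m pip install -U autosuspend`) rather than with whatever `python3` happens to be first in your PATH.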

Issue #2

In your logs, two PCI controllers/sockets appear not to fully comply with system suspend/sleep ... The relevant lines are:

Jul 17 22:34:48 homse1 kernel: pci 0000:03:00.0: async suspend disabled to avoid multi-function power-on ordering issue
Jul 17 22:34:48 homse1 kernel: pci 0000:03:00.1: async suspend disabled to avoid multi-function power-on ordering issue
Jul 17 22:34:48 homse1 kernel: pci 0000:05:00.0: async suspend disabled to avoid multi-function power-on ordering issue
Jul 17 22:34:48 homse1 kernel: pci 0000:05:00.1: async suspend disabled to avoid multi-function power-on ordering issue

Therefore, you need to further investigate what these are and what kernel modules/drivers are in use for them by running the command lspci -nnk -s on them one by one or all at once like so:

for s in "0000:03:00" "0000:05:00"; do lspci -nnk -s "$s"; done

and your output:

03:00.0 SATA controller [0106]: JMicron Technology Corp. JMB363 SATA/IDE Controller [197b:2363] (rev 03)
    Subsystem: Gigabyte Technology Co., Ltd Motherboard [1458:b000]
    Kernel driver in use: ahci
    Kernel modules: ahci
03:00.1 IDE interface [0101]: JMicron Technology Corp. JMB363 SATA/IDE Controller [197b:2363] (rev 03)
    Subsystem: Gigabyte Technology Co., Ltd Motherboard [1458:b000]
    Kernel driver in use: pata_jmicron
    Kernel modules: pata_jmicron, pata_acpi
05:00.0 SATA controller [0106]: JMicron Technology Corp. JMB363 SATA/IDE Controller [197b:2363] (rev 02)
    Subsystem: Gigabyte Technology Co., Ltd Motherboard [1458:b000]
    Kernel driver in use: ahci
    Kernel modules: ahci
05:00.1 IDE interface [0101]: JMicron Technology Corp. JMB363 SATA/IDE Controller [197b:2363] (rev 02)
    Subsystem: Gigabyte Technology Co., Ltd Motherboard [1458:b000]
    Kernel driver in use: pata_jmicron
    Kernel modules: pata_jmicron, pata_acpi

reveals that those are JMicron Technology Corp. JMB363 SATA/IDE Controllers, which have been reported to have power management issues under some kernels, for example here and here. The issue has also been isolated to the pata_acpi kernel module (listed among the kernel modules for the IDE functions on your system), for example here ... Therefore, these controllers might be what prevents your system from sleeping. You might want to read the linked resources and others on this matter, then troubleshoot by experimenting with e.g. blacklisting the pata_acpi kernel module and see if that helps.
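For the blacklisting experiment, a modprobe configuration file with one `blacklist` line per module is the usual approach. The sketch below only generates such a file; the function name `write_blacklist` and the file name `blacklist-pata.conf` are hypothetical choices, not something taken from your system.

```shell
# Write one "blacklist <module>" line per module name into the given file.
write_blacklist() {
    conf_file="$1"
    shift
    for mod in "$@"; do
        printf 'blacklist %s\n' "$mod"
    done > "$conf_file"
}

# Possible usage on the affected machine (needs root to install the file):
#   write_blacklist /tmp/blacklist-pata.conf pata_acpi
#   sudo cp /tmp/blacklist-pata.conf /etc/modprobe.d/
#   sudo update-initramfs -u    # so the setting is also known at early boot
#   sudo reboot
#   lsmod | grep pata           # check that the module is no longer loaded
```

Removing the file from /etc/modprobe.d/ and running `sudo update-initramfs -u` again should undo the experiment if it makes no difference.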

Fgop avatar
je flag
Thanks for the suggestion. I deactivated the autosuspend service and subsequently removed the systemd unit files of that service. The system seems to properly suspend now. I will test a bit more tomorrow, but the `journalctl` command was a good hint. Maybe add that to your answer.
Fgop avatar
je flag
Actually, just after I typed my response, it happened again. I updated my initial post with more logs that show the last two successful and one unsuccessful suspend commands.
Raffa avatar
jp flag
@Fgop Please run `for s in "0000:03:00" "0000:05:00"; do lspci -nnk -s "$s"; done` and add the output to your question.
Fgop avatar
je flag
I added the requested output to the question as Edit 3.
Raffa avatar
jp flag
@Fgop I see that the module `pata_acpi` is in use for those two controllers and I suspect that it might be related to preventing your system from sleeping … Those two controllers' addresses appear in your logs as not fully complying with system sleep … See if blacklisting that module and rebooting helps … See the probably related https://wiki.manjaro.org/index.php/Kernel_Fails_to_Load_(pata_acpi_error)
Fgop avatar
je flag
I will give that a shot and will report back here, if it helped. Thanks for the suggestions so far!
Fgop avatar
je flag
I blacklisted both the `pata_acpi` and the `pata_jmicron` modules as I can confirm through `lsmod`, but the error message in `journalctl` keeps showing up and the suspend issue persists.
Fgop avatar
je flag
I ended up deactivating the IDE controller in the BIOS. Now the error messages you were referring to are gone, but unfortunately the suspend issue still persists.
Raffa avatar
jp flag
@Fgop Then, you probably want to expand the investigation.
Fgop avatar
je flag
Yes, I agree, however I'm really not sure how to proceed, hence my initial question. Do you have any further tips where I might continue my investigation?
Raffa avatar
jp flag
@Fgop Basically searching the logs for errors, warnings … etc. and see the time stamp to tie them with the suspend issue until you find something that might be related … I have updated the answer for that under "How to expand your investigation?" so kindly see that.