Score:0

What is causing suspend to fail intermittently?

Suspending and hibernating my PC with Ubuntu 22.10 hangs sometimes. The display and input devices turn off but the PC remains on, requiring a hard shutdown. When looking at logs, I see no error after the system enters the sleep state.

I tried adding "no_console_suspend initcall_debug" to the boot parameters for more information, but there are still no errors reported after the system enters the sleep state.
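
For reference, on a stock Ubuntu install these parameters normally go on the GRUB_CMDLINE_LINUX_DEFAULT line of /etc/default/grub, followed by "sudo update-grub" and a reboot. Below is a minimal Python sketch of that edit, assuming the default GRUB layout. The helper name add_kernel_params is hypothetical, and it only transforms the text of the file; it does not write anything to disk.

```python
import re

def add_kernel_params(grub_text, params):
    """Return grub_text with params appended to GRUB_CMDLINE_LINUX_DEFAULT.

    Parameters that are already present are not added a second time.
    """
    pattern = re.compile(
        r'^(GRUB_CMDLINE_LINUX_DEFAULT=")([^"]*)(")', re.MULTILINE
    )

    def repl(match):
        current = match.group(2).split()
        for p in params:
            if p not in current:
                current.append(p)
        return match.group(1) + " ".join(current) + match.group(3)

    return pattern.sub(repl, grub_text, count=1)

# Hypothetical file contents; a real /etc/default/grub has more lines.
sample = 'GRUB_DEFAULT=0\nGRUB_CMDLINE_LINUX_DEFAULT="quiet splash"\n'
updated = add_kernel_params(sample, ["no_console_suspend", "initcall_debug"])
print(updated)
```

After rebooting, "cat /proc/cmdline" shows whether the parameters were actually picked up.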

*Please note that while I am currently using the Liquorix kernel, this issue also happened with the stock 22.10 kernel and on 22.04. I did not have this issue on 20.04 with the same hardware. The suspend issue began after I installed an NVMe drive with a fresh installation of 22.04, which I eventually upgraded to 22.10.

Extract from dmesg on a suspend that hung:

Nov 17 15:43:21.726542 MBLPC kernel: ------------[ cut here ]------------
Nov 17 15:43:21.726690 MBLPC kernel: WARNING: CPU: 12 PID: 6060 at kernel/sched/alt_core.c:1539 migrate_enable+0xa9/0xb0
Nov 17 15:43:21.726704 MBLPC kernel: Modules linked in: rfcomm snd_hrtimer xt_MASQUERADE xt_CHECKSUM nft_chain_nat nf_nat vboxnetadp(O) vboxnetflt(O) vboxdrv(O) nfsv3 nfs_acl rpcsec_gss_krb5 auth_rpcgss nfsv4 nfs lockd grace fscache netfs nvme_fabrics af_packet bridge stp llc cmac algif_hash algif_skcipher af_alg bnep ip6t_REJECT nf_reject_ipv6 xt_hl ip6_tables ip6t_rt ipt_REJECT nf_reject_ipv4 xt_LOG nf_log_syslog xt_comment xt_multiport nft_limit xt_limit xt_addrtype xt_tcpudp xt_conntrack nf_conntrack sunrpc nf_defrag_ipv6 nf_defrag_ipv4 nft_compat nf_tables nfnetlink nls_utf8 nls_cp437 vfat xfs fat amdgpu snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio iwlmvm snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi intel_rapl_msr intel_rapl_common snd_hda_codec edac_mce_amd snd_oxygen radeon kvm_amd snd_hda_core snd_oxygen_lib mac80211 snd_mpu401_uart snd_hwdep libarc4 kvm gpu_sched snd_pcm drm_buddy drm_ttm_helper ttm snd_seq_dummy iwlwifi btusb snd_seq_oss drm_display_helper crct10dif_pclmul btrtl
Nov 17 15:43:21.726753 MBLPC kernel:  polyval_clmulni polyval_generic ghash_clmulni_intel btbcm snd_seq_midi snd_seq_midi_event aesni_intel btintel crypto_simd btmtk snd_rawmidi cryptd cec mousedev joydev mxm_wmi xpad wmi_bmof snd_seq cfg80211 bluetooth ff_memless rc_core snd_seq_device k10temp snd_timer drm_kms_helper ccp rng_core snd syscopyarea sysfillrect sysimgblt ecdh_generic fb_sys_fops soundcore agpgart rfkill acpi_cpufreq sg squashfs loop vfio_pci vfio_pci_core vfio_virqfd irqbypass vfio_iommu_type1 vfio msr parport_pc drm ppdev lp parport fuse ramoops reed_solomon efi_pstore ip_tables x_tables ext4 crc16 mbcache jbd2 uas usb_storage btrfs blake2b_generic xor raid6_pq usbhid dm_cache_smq dm_cache dm_persistent_data dm_bio_prison dm_bufio libcrc32c crc32c_generic dm_mod crc32_pclmul crc32c_intel i2c_piix4 igb i2c_algo_bit dca xhci_pci xhci_pci_renesas gpio_amdpt wmi gpio_generic
Nov 17 15:43:21.726780 MBLPC kernel: CPU: 12 PID: 6060 Comm: firefox:cs0 Tainted: G           O       6.0.0-9.1-liquorix-amd64 #1  liquorix 6.0-5ubuntu1~kinetic
Nov 17 15:43:21.726797 MBLPC kernel: Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./X370 Taichi, BIOS P7.00 01/15/2022
Nov 17 15:43:21.726808 MBLPC kernel: RIP: 0010:migrate_enable+0xa9/0xb0
Nov 17 15:43:21.726819 MBLPC kernel: Code: e9 cc 71 d2 00 83 ea 01 66 89 90 00 01 00 00 31 c0 31 d2 31 c9 e9 b7 71 d2 00 e8 ec 8e f2 ff 31 c0 31 d2 31 c9 e9 a7 71 d2 00 <0f> 0b eb 94 0f 1f 00 0f 1f 44 00 00 8b 05 15 06 98 01 83 f8 ff 74
Nov 17 15:43:21.726830 MBLPC kernel: RSP: 0018:ffffc90016d2be00 EFLAGS: 00010282
Nov 17 15:43:21.726841 MBLPC kernel: RAX: ffff88824b48b500 RBX: 000000007fff0000 RCX: ffff88824b48b5e8
Nov 17 15:43:21.726851 MBLPC kernel: RDX: 000000000000000c RSI: 00000000c000003e RDI: ffffc90016d2be90
Nov 17 15:43:21.726863 MBLPC kernel: RBP: ffff8881d7571b00 R08: 00000000c0186444 R09: 000000000000004b
Nov 17 15:43:21.726874 MBLPC kernel: R10: 0000000000000000 R11: 0000000000000000 R12: ffffc90016d2be90
Nov 17 15:43:21.726883 MBLPC kernel: R13: 000000007fff0000 R14: 0000000000000000 R15: ffffc9000ff81000
Nov 17 15:43:21.726895 MBLPC kernel: FS:  00007ff42aba1700(0000) GS:ffff888ffeb00000(0000) knlGS:0000000000000000
Nov 17 15:43:21.726907 MBLPC kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Nov 17 15:43:21.726918 MBLPC kernel: CR2: 00007ff43b928000 CR3: 000000024ad04000 CR4: 0000000000350ee0
Nov 17 15:43:21.726927 MBLPC kernel: Call Trace:
Nov 17 15:43:21.726938 MBLPC kernel:  <TASK>
Nov 17 15:43:21.726948 MBLPC kernel:  __seccomp_filter+0xde/0x870
Nov 17 15:43:21.726957 MBLPC kernel:  ? futex_wake+0x7c/0x180
Nov 17 15:43:21.726970 MBLPC kernel:  syscall_trace_enter.constprop.0+0xa3/0x1b0
Nov 17 15:43:21.726982 MBLPC kernel:  do_syscall_64+0x15/0xc0
Nov 17 15:43:21.726992 MBLPC kernel:  entry_SYSCALL_64_after_hwframe+0x63/0xcd
Nov 17 15:43:21.727003 MBLPC kernel: RIP: 0033:0x7ff44e2c23ab
Nov 17 15:43:21.727012 MBLPC kernel: Code: 0f 1e fa 48 8b 05 e5 7a 0d 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d b5 7a 0d 00 f7 d8 64 89 01 48
Nov 17 15:43:21.727022 MBLPC kernel: RSP: 002b:00007ff42aba09e8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
Nov 17 15:43:21.727033 MBLPC kernel: RAX: ffffffffffffffda RBX: 00007ff42aba0a50 RCX: 00007ff44e2c23ab
Nov 17 15:43:21.727051 MBLPC kernel: RDX: 00007ff42aba0a50 RSI: 00000000c0186444 RDI: 000000000000004b
Nov 17 15:43:21.727060 MBLPC kernel: RBP: 00000000c0186444 R08: 00007ff42aba0bb0 R09: 0000000000000020
Nov 17 15:43:21.727071 MBLPC kernel: R10: 00007ff42aba0bb0 R11: 0000000000000246 R12: 00007ff4398dcb00
Nov 17 15:43:21.727081 MBLPC kernel: R13: 000000000000004b R14: 0000000000000000 R15: 00007ff35aa45090
Nov 17 15:43:21.727090 MBLPC kernel:  </TASK>
Nov 17 15:43:21.727102 MBLPC kernel: ---[ end trace 0000000000000000 ]---
Nov 17 16:32:22.215527 MBLPC kernel: usb 1-6.1.2: USB disconnect, device number 19
Nov 17 19:38:14.312541 MBLPC kernel: usb 1-2.3: USB disconnect, device number 20
Nov 17 19:38:14.517527 MBLPC kernel: usb 1-2.3: new high-speed USB device number 21 using xhci_hcd
Nov 17 19:38:14.634528 MBLPC kernel: usb 1-2.3: New USB device found, idVendor=3842, idProduct=2608, bcdDevice=a1.18
Nov 17 19:38:14.634754 MBLPC kernel: usb 1-2.3: New USB device strings: Mfr=1, Product=2, SerialNumber=0
Nov 17 19:38:14.634831 MBLPC kernel: usb 1-2.3: Product: EVGA Z15 RGB Gaming Keyboard
Nov 17 19:38:14.634910 MBLPC kernel: usb 1-2.3: Manufacturer: EVGA Corporation
Nov 17 19:38:14.650524 MBLPC kernel: input: EVGA Corporation EVGA Z15 RGB Gaming Keyboard as /devices/pci0000:00/0000:00:01.3/0000:02:00.0/usb1/1-2/1-2.3/1-2.3:1.0/0003:3842:2608.000D/input/input34
Nov 17 19:38:14.702526 MBLPC kernel: hid-generic 0003:3842:2608.000D: input,hidraw3: USB HID v1.11 Keyboard [EVGA Corporation EVGA Z15 RGB Gaming Keyboard] on usb-0000:02:00.0-2.3/input0
Nov 17 19:38:14.710524 MBLPC kernel: input: EVGA Corporation EVGA Z15 RGB Gaming Keyboard Mouse as /devices/pci0000:00/0000:00:01.3/0000:02:00.0/usb1/1-2/1-2.3/1-2.3:1.1/0003:3842:2608.000E/input/input35
Nov 17 19:38:14.710552 MBLPC kernel: input: EVGA Corporation EVGA Z15 RGB Gaming Keyboard Consumer Control as /devices/pci0000:00/0000:00:01.3/0000:02:00.0/usb1/1-2/1-2.3/1-2.3:1.1/0003:3842:2608.000E/input/input36
Nov 17 19:38:14.762526 MBLPC kernel: input: EVGA Corporation EVGA Z15 RGB Gaming Keyboard System Control as /devices/pci0000:00/0000:00:01.3/0000:02:00.0/usb1/1-2/1-2.3/1-2.3:1.1/0003:3842:2608.000E/input/input37
Nov 17 19:38:14.762575 MBLPC kernel: hid-generic 0003:3842:2608.000E: input,hiddev97,hidraw4: USB HID v1.11 Mouse [EVGA Corporation EVGA Z15 RGB Gaming Keyboard] on usb-0000:02:00.0-2.3/input1
Nov 17 19:38:14.768524 MBLPC kernel: input: EVGA Corporation EVGA Z15 RGB Gaming Keyboard as /devices/pci0000:00/0000:00:01.3/0000:02:00.0/usb1/1-2/1-2.3/1-2.3:1.2/0003:3842:2608.000F/input/input39
Nov 17 19:38:14.820529 MBLPC kernel: hid-generic 0003:3842:2608.000F: input,hiddev98,hidraw5: USB HID v1.11 Keyboard [EVGA Corporation EVGA Z15 RGB Gaming Keyboard] on usb-0000:02:00.0-2.3/input2
Nov 17 21:45:37.987528 MBLPC kernel: PM: suspend entry (deep)
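
The last line above is the final entry before the hang: a "PM: suspend entry" with no "PM: suspend exit" after it. Below is a minimal Python sketch for spotting that pattern in a saved kernel log. The function name is hypothetical, and the first two sample lines are made up for illustration; only the third is taken from the log above.

```python
def last_unfinished_suspend(lines):
    """Return the last 'PM: suspend entry' line that has no later
    'PM: suspend exit', or None if every suspend completed."""
    pending = None
    for line in lines:
        if "PM: suspend entry" in line:
            pending = line
        elif "PM: suspend exit" in line:
            pending = None
    return pending

sample_log = [
    # Illustrative lines, not from the real log:
    "Nov 17 12:00:00.000000 MBLPC kernel: PM: suspend entry (deep)",
    "Nov 17 12:30:00.000000 MBLPC kernel: PM: suspend exit",
    # Taken from the extract above:
    "Nov 17 21:45:37.987528 MBLPC kernel: PM: suspend entry (deep)",
]
hung = last_unfinished_suspend(sample_log)
print(hung)
```

The input could come from something like "journalctl -k -b -1 -o short-precise", which prints the kernel log of the previous boot.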

Update

I came across a post with a similar issue where the cause was the NVMe drive. Looking back, the only hardware difference since the issue came up was the installation of an NVMe drive.

I noticed that the "ignore_loglevel" parameter not only sends output to the logs but also displays it on screen during suspend and hibernate, which is most useful with this particular issue. I decided to monitor for errors this way, keeping an eye out for any related to NVMe upon a suspend or hibernate failure.

The thing is, after setting "ignore_loglevel" the PC has not failed a single suspend or hibernate. I will keep monitoring it, but given how many suspend and hibernate cycles I have performed since, it would normally have failed by now.

Another thing I noticed is that after resuming from suspend, I now get an authentication pop-up asking to update SMART data for a particular drive. So far it has been for a different drive each time it came up.

Score:0

So two things seem to be working as a "resolution". By having ignore_loglevel as a boot parameter the suspend issue never resurfaced. I don't understand why but I don't mind this as it displays output to the screen when performing a hibernate. For times where a hibernate or resume takes longer, instead of looking at a blank screen I can now tell which step the process has reached.

As mentioned in the last update, the only hardware difference since the suspend issue began was the installation of an NVMe drive. After coming across a forum post with a similar issue, I tried disabling the wakeup status of the NVMe device via /proc/acpi/wakeup and performed multiple suspend and hibernate tests. All tests were successful. I want to mention that I removed the ignore_loglevel parameter before testing this. I understand a script is necessary to make this change permanent (https://unix.stackexchange.com/questions/417956/make-changes-to-proc-acpi-wakeup-permanent).
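
As a rough illustration of the /proc/acpi/wakeup step, here is a minimal Python sketch. It assumes the usual four-column layout of that file (Device, S-state, Status, Sysfs node). The device names in the sample are hypothetical and will differ per motherboard; the right entry is the one whose sysfs node matches the PCI address of the NVMe controller or the port it sits behind. Writing a device name to the file toggles its state, so the status should be checked first.

```python
def enabled_wakeup_devices(wakeup_text):
    """Return (device, sysfs_node) pairs whose status is '*enabled'."""
    result = []
    for line in wakeup_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) >= 3 and fields[2] == "*enabled":
            node = fields[3] if len(fields) > 3 else ""
            result.append((fields[0], node))
    return result

def toggle_wakeup(device, path="/proc/acpi/wakeup"):
    """Flip a device between enabled and disabled. Needs root."""
    with open(path, "w") as f:
        f.write(device)

# Hypothetical sample; real content comes from reading /proc/acpi/wakeup.
sample = """Device\tS-state\t  Status   Sysfs node
GPP0\t  S4\t*enabled   pci:0000:00:01.1
GPP8\t  S4\t*disabled  pci:0000:00:03.1
XHC0\t  S4\t*enabled   pci:0000:0b:00.3
"""
enabled = enabled_wakeup_devices(sample)
print(enabled)
```

Because the setting is reset on every boot, toggle_wakeup (or the equivalent echo) would have to be run from a startup script or systemd unit, as the linked answer describes.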

Score:0

Your log already seems to give you a hint about your problem.

WARNING: CPU: 12 PID: 6060 at kernel/sched/alt_core.c:1539 migrate_enable+0xa9/0xb0

Try solving the issue as shown here: "BEST PRACTICE TO DEBUG LINUX* SUSPEND/HIBERNATE ISSUES". Summary:

  1. initcall_debug
  2. no_console_suspend
  3. ignore_loglevel

...

  1. pm_test
  2. ACPI wakeup
  3. acpidump
  4. rtcwake
  5. analyze_suspend

In particular I suggest section

4 DEBUGGING SUSPEND/HIBERNATE ISSUES

to identify and debug your specific issue.
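
As a rough sketch of the pm_test item from the list above: on kernels built with CONFIG_PM_DEBUG, /sys/power/pm_test accepts the levels freezer, devices, platform, processors and core, and a suspend started while one of them is set stops at that stage, waits a few seconds and resumes. The Python below only builds the sequence of root shell commands to step through the levels from shallowest to deepest; the function name is hypothetical and nothing is executed.

```python
# pm_test levels, ordered from the shallowest test to the deepest one.
PM_TEST_LEVELS = ["freezer", "devices", "platform", "processors", "core"]

def pm_test_commands(state="mem"):
    """Return the shell commands (to be run as root) for one pass over
    every pm_test level, ending with pm_test switched back to 'none'."""
    cmds = []
    for level in PM_TEST_LEVELS:
        cmds.append(f"echo {level} > /sys/power/pm_test")
        cmds.append(f"echo {state} > /sys/power/state")
    cmds.append("echo none > /sys/power/pm_test")
    return cmds

commands = pm_test_commands()
for cmd in commands:
    print(cmd)
```

The first level at which the machine hangs narrows down where the failure is. Since your hang is intermittent, each level may need several repetitions before it can be considered clean.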

Cascadoo:
Thanks for your response. I actually referenced that guide initially for the first two boot parameters I used, hoping that I would get some output in the logs past the suspend entry. My next step was to try a serial console, but unfortunately the motherboard does not have a serial breakout connector. I will go back to the guide and try some of the other parameters for additional information. I am also searching for information on the warning at the beginning of the log.