Ubuntu Server hard lockup detected

I've been having an issue with Ubuntu Server 22.04 where the system locks up at seemingly random times, on average about every 12 hours.

I'm running netconsole to collect logs. When the issue occurs I can still ping the server, but not much else: SSH doesn't respond, and neither does netconsole or anything else.

My kernel version is 5.15.0-53. My hardware is:

  • Gigabyte A320M mobo
  • AMD Ryzen 5 1600
  • GeForce GT 710 for accessing the BIOS and whatnot. Not running a desktop environment.
  • 4TB HDD
  • 8GB RAM

I've tested both the RAM and the HDD and they both came back fine. I've replaced the PSU and that didn't do anything either.

The CPU came out of my main PC when I upgraded it. It ran Linux perfectly and never gave me trouble, so if this is a hardware issue I'm thinking it has to be the motherboard. I'm considering removing the GPU to see whether that fixes it, but the issues didn't begin when I added the GPU, so I doubt it's the problem.

Whenever I get a lockup (hard or soft), the RIP is always:

RIP:0010:smp_call_function_many_cond+0x13a/0x360
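The `RIP:` line packs several fields together: `0010` is the code segment selector (0x10 is the kernel code segment on x86-64), followed by the function name, the offset of the instruction within that function, and the function's total size. To confirm that every saved lockup really lands at the same spot, one option is to pull those fields out of each log. This is a minimal Python sketch; `parse_rip` and `rip_symbols` are hypothetical helper names, and the regex assumes the lowercase hex format shown in this post.

```python
import re
from collections import Counter

# Matches "RIP: 0010:symbol+0xOFFSET/0xSIZE". The space after "RIP:" is
# optional because the post shows both "RIP:0010:" and "RIP: 0010:".
RIP_RE = re.compile(
    r"RIP:\s*(?P<cs>[0-9a-f]{4}):(?P<symbol>[\w.]+)"
    r"\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
)


def parse_rip(line):
    """Return (cs, symbol, offset, size) for a RIP line, or None."""
    match = RIP_RE.search(line)
    if match is None:
        return None
    return (
        match["cs"],
        match["symbol"],
        int(match["offset"], 16),  # offset into the function, in bytes
        int(match["size"], 16),    # total size of the function, in bytes
    )


def rip_symbols(lines):
    """Count how often each function shows up as the RIP across log lines."""
    counts = Counter()
    for line in lines:
        parsed = parse_rip(line)
        if parsed is not None:
            counts[parsed[1]] += 1
    return counts


if __name__ == "__main__":
    print(parse_rip("RIP:0010:smp_call_function_many_cond+0x13a/0x360"))
```

For the line above, `parse_rip` returns `("0010", "smp_call_function_many_cond", 314, 864)`, i.e. 0x13a bytes into a 0x360-byte function. Feeding several crash logs through `rip_symbols` would show whether only one function ever appears.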

Logs from netconsole:

Nov 23 23:01:01 192.168.0.100 [26450.434430] NMI watchdog: Watchdog detected hard LOCKUP on cpu 5
Nov 23 23:01:01 192.168.0.100 [26450.434434] Modules linked in: iptable_filter bpfilter xt_nat xt_tcpudp veth xt_conntrack nft_chain_nat xt_MASQUERADE nf_nat nf_conntrack_netlink nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xfrm_user xfrm_algo nft_counter xt_addrtype nft_compat nf_tables nfnetlink br_netfilter bridge stp llc overlay nls_iso8859_1 snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio snd_hda_codec_hdmi intel_rapl_msr intel_rapl_common snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi snd_hda_codec edac_mce_amd snd_hda_core snd_hwdep snd_pcm nvidiafb kvm vgastate fb_ddc cdc_acm snd_timer rapl snd i2c_algo_bit soundcore ccp wmi_bmof gigabyte_wmi k10temp mac_hid nvidia_uvm(POE) sch_fq_codel netconsole hwmon_vid msr parport_pc ppdev dm_multipath lp pstore_blk ramoops parport scsi_dh_rdac scsi_dh_emc scsi_dh_alua efi_pstore pstore_zone reed_solomon ip_tables x_tables autofs4 btrfs blake2b_generic zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c
Nov 23 23:01:01 192.168.0.100 [26450.434525]  raid1 raid0 multipath linear nvidia_drm(POE) nvidia_modeset(POE) nvidia(POE) drm_kms_helper crct10dif_pclmul syscopyarea crc32_pclmul sysfillrect sysimgblt ghash_clmulni_intel fb_sys_fops aesni_intel cec crypto_simd r8169 cryptd rc_core ahci gpio_amdpt xhci_pci drm i2c_piix4 realtek libahci xhci_pci_renesas wmi gpio_generic
Nov 23 23:01:01 192.168.0.100 [26450.434557] CPU: 5 PID: 164 Comm: kworker/5:1 Tainted: P           OE     5.15.0-53-generic #59-Ubuntu
Nov 23 23:01:01 192.168.0.100 [26450.434563] Hardware name: Gigabyte Technology Co., Ltd. A320M-S2H/A320M-S2H-CF, BIOS F5a 07/29/2022
Nov 23 23:01:01 192.168.0.100 [26450.434566] Workqueue: events free_work
Nov 23 23:01:01 192.168.0.100 [26450.434576] RIP: 0010:smp_call_function_many_cond+0x13a/0x360
Nov 23 23:01:01 192.168.0.100 [26450.434585] Code: b0 0a 02 41 89 c4 73 2e 4d 63 ec 48 8b 0b 49 81 fd ff 1f 00 00 0f 87 e4 01 00 00 4a 03 0c ed e0 ca ae ae 8b 41 08 a8 01 74 0a <f3> 90 8b 51 08 83 e2 01 75 f6 eb bb 48 83 c4 40 5b 41 5c 41 5d 41
Nov 23 23:01:01 192.168.0.100 [26450.434588] RSP: 0018:ffffa87f007d7cb0 EFLAGS: 00000202
Nov 23 23:01:01 192.168.0.100 [26450.434592] RAX: 0000000000000011 RBX: ffff91ef76971bc0 RCX: ffff91ef76837a40
Nov 23 23:01:01 192.168.0.100 [26450.434595] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff91ee4006c840
Nov 23 23:01:01 192.168.0.100 [26450.434598] RBP: ffffa87f007d7d18 R08: 0000000000000000 R09: 0000000000000000
Nov 23 23:01:01 192.168.0.100 [26450.434600] R10: 0000000000000000 R11: ffffffffffffffff R12: 0000000000000000
Nov 23 23:01:01 192.168.0.100 [26450.434602] R13: 0000000000000000 R14: 0000000000000001 R15: 0000000000000020
Nov 23 23:01:01 192.168.0.100 [26450.434604] FS:  0000000000000000(0000) GS:ffff91ef76940000(0000) knlGS:0000000000000000
Nov 23 23:01:01 192.168.0.100 [26450.434607] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Nov 23 23:01:01 192.168.0.100 [26450.434610] CR2: 00007f70eaca801c CR3: 0000000026210000 CR4: 00000000003506e0
Nov 23 23:01:01 192.168.0.100 [26450.434613] Call Trace:
Nov 23 23:01:01 192.168.0.100 [26450.434615]  <TASK>
Nov 23 23:01:01 192.168.0.100 [26450.434620]  ? invalidate_user_asid+0x30/0x30
Nov 23 23:01:01 192.168.0.100 [26450.434631]  on_each_cpu_cond_mask+0x1d/0x30
Nov 23 23:01:01 192.168.0.100 [26450.434635]  flush_tlb_kernel_range+0x41/0xa0
Nov 23 23:01:01 192.168.0.100 [26450.434641]  __purge_vmap_area_lazy+0xbd/0x6f0
Nov 23 23:01:01 192.168.0.100 [26450.434646]  ? __update_idle_core+0x93/0x120
Nov 23 23:01:01 192.168.0.100 [26450.434652]  ? __cond_resched+0x1a/0x50
Nov 23 23:01:01 192.168.0.100 [26450.434659]  free_vmap_area_noflush+0x2c7/0x310
Nov 23 23:01:01 192.168.0.100 [26450.434665]  remove_vm_area+0xa5/0xc0
Nov 23 23:01:01 192.168.0.100 [26450.434670]  __vunmap+0x93/0x260
Nov 23 23:01:01 192.168.0.100 [26450.434675]  free_work+0x25/0x40
Nov 23 23:01:01 192.168.0.100 [26450.434680]  process_one_work+0x22b/0x3d0
Nov 23 23:01:01 192.168.0.100 [26450.434685]  worker_thread+0x53/0x420
Nov 23 23:01:01 192.168.0.100 [26450.434688]  ? process_one_work+0x3d0/0x3d0
Nov 23 23:01:01 192.168.0.100 [26450.434692]  kthread+0x12a/0x150
Nov 23 23:01:01 192.168.0.100 [26450.434696]  ? set_kthread_struct+0x50/0x50
Nov 23 23:01:01 192.168.0.100 [26450.434701]  ret_from_fork+0x22/0x30
Nov 23 23:01:01 192.168.0.100 [26450.434710]  </TASK>
Nov 23 23:01:01 192.168.0.100 [26450.434715] perf: interrupt took too long (2634 > 2500), lowering kernel.perf_event_max_sample_rate to 75750

If you need more logs or information, please ask. I've been dealing with this issue for quite a while.


I finally solved my problem! It seems that at some point virtualization got turned off in my BIOS, and that was causing the issue. I re-enabled virtualization in the BIOS settings and the server has been running smoothly for a week.
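To check the setting from the running system instead of the BIOS screen, here is a minimal Python sketch (the helper names are hypothetical). It reports whether `/proc/cpuinfo` lists the `svm` (AMD-V) flag and whether the `kvm_amd` module is loaded according to `/proc/modules`. Treat it as a hint, not proof: the `svm` flag describes what the CPU supports and can still be listed when the firmware has switched virtualization off, while `kvm_amd` typically refuses to load when SVM is disabled by the firmware. Notably, the module list in the lockup log includes `kvm` but not `kvm_amd`, which is consistent with that.

```python
def cpu_flags(cpuinfo_text):
    """Collect every token from the "flags" lines of /proc/cpuinfo."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # Only the plain "flags" key; ignores e.g. "bugs" or "vmx flags".
        if sep and key.strip() == "flags":
            flags.update(value.split())
    return flags


def loaded_modules(modules_text):
    """The module name is the first field of each /proc/modules line."""
    return {line.split()[0] for line in modules_text.splitlines() if line.strip()}


def virtualization_report(cpuinfo_text, modules_text):
    """Summarize the two hints described above as a small dict."""
    return {
        "svm_flag": "svm" in cpu_flags(cpuinfo_text),
        "kvm_amd_loaded": "kvm_amd" in loaded_modules(modules_text),
    }


if __name__ == "__main__":
    with open("/proc/cpuinfo") as f_cpu, open("/proc/modules") as f_mod:
        print(virtualization_report(f_cpu.read(), f_mod.read()))
```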
