
Ubuntu 21.04-21.10 Random shutdowns, no logs


Hardware (hardinfo):

I hope this isn't a hardware issue...

    OS: Ubuntu 21.10
    CPU: Intel(R) Core(TM) i7-8550U CPU @ 1.80GHz, 1 physical processor; 4 cores; 8 threads
    RAM: 7858608 KiB (AKA 8GB)
    Motherboard: Lenovo YOGA 730-13IKB / LNVNB161216 (LENOVO)
    Graphics: 1920x1080 (Unknown) The X.Org foundation
    Storage: (Shows nothing for some reason, but I already opened my computer to clean it while trying to fix this issue, and I confirmed it is an NVMe SSD.)
    Printers: (Irrelevant)
    Audio: USB-Audio - USB Device 0x46d

Symptoms:

What I despise.

Since upgrading from Ubuntu 20.04 LTS to Ubuntu 21.04, I've been experiencing crashes. These crashes:

  • Do not automatically reboot
  • Are spontaneous
  • Happen ONLY when plugged in to AC power
  • Have no signs of logging anywhere

Attempts:

Things that didn't work

I tried reinstalling the system at least two times (I have somehow forgotten, but it was two or more), updating from 21.04 to 21.10 in the process. Also worth noting that I chose what programs to back up, only choosing the ones that were:

  • Not automatically installed
  • Not local (I could reinstall those debs later myself)
  • Not automatically installed development libraries

There is no notable difference in the crashes between 21.04 and 21.10 (IIRC).

Other things I have tried:

  • BIOS Update
  • Reinstalling thermald
  • Disabling c-states (and enabling them again due to not helping)
  • Tried to set up kernel crash logging (couldn't set it up correctly; a manually triggered crash gave no log)
  • Set up persistent journal (found nothing of use in there, I can post it if necessary though)

Extra

Some extra information that may help

The last piece of information I can provide is a text file in which I wrote down a bunch of things that I tried, suspected, and failed at. It is very unorganized (especially towards the end of the file, where I just got irritated and started cursing), but I'll include it nonetheless.

Personal logging:

When I updated to Ubuntu 21.04, things started going wrong.
I assume schedutil is doing something, as the computer crashes sometimes, with no log or anything either.
I checked /var/log/kern.log among others, and I found nothing.

I suspect it's something to do with "P-states" and "C-states".
P-states, which stands for performance states, are used to optimize power consumption during code execution. The OS can switch between them to change the CPU's voltage and frequency.
C-states on the other hand, are used to optimize/reduce power consumption during idle mode (when no code is being executed).
The typical C-states are:
    C0      - CPU is actively running code (P-states)
    C1      - CPU uses HLT instruction when idle, the clock is gated off to parts of the core, but it is relatively quick to wake up
    C1E     - This is actually just C1, except when C1E is enabled, the CPU lowers the CPU's speed & voltage when it is in C1
    C2 & up - The CPU will shut off various parts of the core for greater power savings, at the cost of taking longer to wake up.
Source: "Controlling Processor C-State Usage in Linux, A Dell technical white paper describing the use of C-states with Linux operating systems"

Anyways, all of this is still happening now, even in 21.10, so this has to be a kernel issue.
Although setting "intel_idle.max_cstate=0" does not stop the crashes, so maybe it's a different problem.
I already used "memtest86" and my system is fine.
I'm going to restart my computer and see if there are any c-state settings in the BIOS/UEFI (are the settings still called BIOS?).

Yeah, I checked; I couldn't find anything.

The Dell C-state PDF (same as the source above) has this section right below the C-states one (which is the first one, lol) called "Checking C-State Usage". It says:
    There are several ways to see how much idle time is being spent in the various C-states. 
    First check the kernel messages from boot (“dmesg |grep idle” or “grep idle /var/log/messages”, for instance) to see which idle driver is in use.

This is what I got:
    sudo dmesg |grep idle
    [sudo] password for ws: 
    [    0.028186] clocksource: refined-jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645519600211568 ns
    [    0.076265] clocksource: hpet: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 79635855245 ns
    [    0.100211] clocksource: tsc-early: mask: 0xffffffffffffffff max_cycles: 0x39a8208cdd2, max_idle_ns: 881590748921 ns
    [    0.104538] process: using mwait in idle threads
    [    0.128722] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645041785100000 ns
    [    0.132319] cpuidle: using governor ladder
    [    0.132322] cpuidle: using governor menu
    [    0.389960] clocksource: acpi_pm: mask: 0xffffff max_cycles: 0xffffff, max_idle_ns: 2085701024 ns
    [    1.426615] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x396d4ffc055, max_idle_ns: 881590662783 ns
    [    4.981048] systemd-journald[323]: varlink-22: varlink: setting state idle-server
    [    4.981116] systemd-journald[323]: varlink-22: varlink: changing state idle-server → processing-method
    [    5.336734] systemd-journald[323]: varlink-22: varlink: changing state processed-method → idle-server
    [    5.339242] systemd-journald[323]: varlink-22: varlink: changing state idle-server → pending-disconnect
I don't seem to see anything here, but I remember in the BIOS/UEFI settings something said ACPI instead of RXS or whatever it's called.
I can't open the folder "/proc/acpi/processor/CPU0/power", as it doesn't exist (only reaches "acpi").
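
On recent kernels the per-state idle counters live in sysfs rather than under /proc/acpi. Below is a minimal sketch, assuming the usual cpuidle layout (`/sys/devices/system/cpu/cpu0/cpuidle/stateN/` with `name`, `usage`, and `time` files); the function name `read_cpuidle_states` is made up for illustration, and the layout may differ on other kernels.

```python
from pathlib import Path


def read_cpuidle_states(cpu_dir="/sys/devices/system/cpu/cpu0/cpuidle"):
    """Return [(name, usage_count, time_us), ...] for one CPU's idle states.

    Assumes the standard sysfs cpuidle layout; returns an empty list
    when the directory does not exist (e.g. cpuidle is disabled).
    """
    base = Path(cpu_dir)
    if not base.is_dir():
        return []
    states = []
    # Directories are named state0, state1, ...; sort them numerically.
    for state in sorted(base.glob("state[0-9]*"), key=lambda p: int(p.name[5:])):
        name = (state / "name").read_text().strip()
        usage = int((state / "usage").read_text())    # times the state was entered
        time_us = int((state / "time").read_text())   # total residency, microseconds
        states.append((name, usage, time_us))
    return states
```

Calling `read_cpuidle_states()` twice a few seconds apart and comparing the counters gives a rough picture of which C-states the core actually enters.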

Some time later, I decided to reinstall Ubuntu, and so I did. Things worked well for the first (and second) day, and then it crashed.
After some fiddling around I decided to run "watch sensors", and I discovered something: when playing osu!, my temperature spiked up to ~95ºC, reaching 99ºC!
Just want to mention that the PC's killswitch is triggered at 100ºC, I was 1 degree away from it, and most of the time, 2 away (~96-98ºC most of the time)!
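
Since "watch sensors" leaves nothing behind after a hard power-off, one option is to append the readings to a file and force every line to disk. This is a minimal sketch, assuming the kernel exposes thermal zones under `/sys/class/thermal/thermal_zone*/` with `type` and `temp` (millidegrees Celsius) files; `read_zone_temps`, `log_temps`, and the log file name are hypothetical.

```python
import os
import time
from pathlib import Path


def read_zone_temps(base="/sys/class/thermal"):
    """Return {"thermal_zoneN:type": degrees_C} for every readable zone."""
    temps = {}
    for zone in sorted(Path(base).glob("thermal_zone*")):
        try:
            kind = (zone / "type").read_text().strip()
            milli = int((zone / "temp").read_text())
        except (OSError, ValueError):
            continue  # some zones cannot be read; skip them
        temps[f"{zone.name}:{kind}"] = milli / 1000.0
    return temps


def log_temps(logfile, base="/sys/class/thermal", interval=1.0, samples=None):
    """Append one timestamped line per sample; samples=None means run forever."""
    count = 0
    with open(logfile, "a") as fh:
        while samples is None or count < samples:
            line = " ".join(f"{k}={v:.1f}" for k, v in read_zone_temps(base).items())
            fh.write(f"{time.strftime('%Y-%m-%d %H:%M:%S')} {line}\n")
            fh.flush()
            os.fsync(fh.fileno())  # push the line to disk before the next sample
            count += 1
            if samples is None or count < samples:
                time.sleep(interval)
```

Running `log_temps("temps.log")` in a terminal before starting the game would leave the last recorded temperature in the file if the machine powers off, which helps tell a thermal shutdown from something else.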

Another idea: this may be a PSU problem, as I've never seen it crash unplugged.

"kernelUpdateCrash" was this file's old name, now it is "cleaningComputer", I opened up the computer and holy hell there was so much crud in the fans.
It hasn't crashed since I cleaned it! Not really elaborating because this file is so long and also I've opened another computer which will be in another story (I think I'll call it "firstUbuntuInstallation").


Update: it crashed again.
It didn't while it was sideways, so I think it's some kind of fan airflow problem.
An AskUbuntu question described a computer shutting down due to overheating. I'm not sure if it is overheating in my case, but a BIOS update helped there.
Source: "https://askubuntu.com/questions/1232813/ubuntu-20-04-shutdown-after-overheating"

I did it, I had to boot into a Windows PE USB to run the program, but the program didn't work...
So instead of ticking the "Install" option, I chose the "Unpack" option, and it unpacked another executable with the same name except that all the letters were capital now!
Anyways I ran it and it was this weird sketchy setup that appeared to be using WinAPI to put text where it shouldn't be and it wouldn't run without AC power.
I proceeded to plug it in and re-run it, it had a weird and probably broken image of a mascot that was like a pencil?
I attached 2 images I took with my phone, that's why this story is in a folder.
PC rebooted, it worked, and then the fans started whirring up as if the thing was gonna blow up, never seen it like that, probably a temporary overvoltage of the fans while the computer tried to reboot.
So yeah I changed the title of this again.

It crashed again...

The last time I modified this file was: 2021-10-26 19:55:59.
Now it's:                               2021-11-06 23:10:36
I just reinstalled thermald, seems to work, I'm not sure, throttles well I guess.
The "setPerformanceMode.sh" and "setPowersaveMode.sh" scripts that I created (using cpufreq) no longer seem to change anything.
So let's just hope this works, even if thermald was installed by default...
PS: I have i7z on a terminal set to "Always on Top" so I can monitor frequency, C-states, and the temperature of the CPU cores (4 physical, 8 logical).
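
One way to check whether the governor scripts take effect is to read the cpufreq files back after running them. A minimal sketch, assuming the standard sysfs cpufreq layout (`scaling_driver`, `scaling_governor`, `scaling_cur_freq`, `scaling_max_freq`, with frequencies in kHz); `read_cpufreq` and `khz_to_mhz` are made-up helper names.

```python
from pathlib import Path

# File names assumed from the usual sysfs cpufreq interface.
CPUFREQ_KEYS = ("scaling_driver", "scaling_governor",
                "scaling_cur_freq", "scaling_max_freq")


def read_cpufreq(cpu_dir="/sys/devices/system/cpu/cpu0/cpufreq"):
    """Return {file_name: contents or None} for one CPU's cpufreq settings."""
    base = Path(cpu_dir)
    info = {}
    for key in CPUFREQ_KEYS:
        try:
            info[key] = (base / key).read_text().strip()
        except OSError:
            info[key] = None  # file missing or unreadable on this system
    return info


def khz_to_mhz(khz_text):
    """Convert a sysfs kHz string such as "400000" to MHz, or None."""
    return None if khz_text is None else int(khz_text) / 1000.0
```

If `scaling_governor` reads the same before and after a script runs, the script did not change anything; a `scaling_max_freq` well below the CPU's rated speed would suggest that something (possibly thermald) is capping the frequency.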

Bruh, thermald is throttling down to 400MHz while playing.
I tab out of the program and it goes back to 1GHz, what?
Okay, I made osu! set the FPS cap to V-Sync (60fps) instead of double that (120fps, which it was before) and it seems to be good even when the computer is not on its side (it usually didn't crash when on its side, as the fans were pointing out).

Okay so I was checking the journalctl logs and I got:
"thermald.service: Changed running -> stop-sigterm"
huh...
Oh wait, this is at the end of the journal, it's probably the shutdown (lol).

Keywords checked with "journalctl -g ???":
    thermal
    shutting
    crash
    panic
    spark
It just crashed while searching in journalctl... Let's investigate with "journalctl -b -1". Huh, the last log is 4 minutes before the crash, okay...
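
To measure the gap between the last journal entry and the crash more precisely, the final entry of the previous boot can be read in JSON form. A minimal sketch, assuming `journalctl -o json` emits one JSON object per line with a `__REALTIME_TIMESTAMP` field in microseconds since the epoch; both function names are made up for illustration.

```python
import json
import subprocess
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def parse_journal_time(json_line):
    """Convert one `journalctl -o json` line to a timezone-aware UTC datetime."""
    entry = json.loads(json_line)
    micros = int(entry["__REALTIME_TIMESTAMP"])  # microseconds since the epoch
    return EPOCH + timedelta(microseconds=micros)


def last_entry_of_previous_boot():
    """Return the timestamp of the final journal entry of the previous boot."""
    out = subprocess.run(
        ["journalctl", "-b", "-1", "-n", "1", "-o", "json", "--no-pager"],
        capture_output=True, text=True, check=True,
    ).stdout.strip()
    return parse_journal_time(out.splitlines()[-1])
```

Comparing `last_entry_of_previous_boot()` with the time the machine was seen to die gives the size of the silent window; a gap of several minutes with no entries at all points away from an orderly shutdown.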
Yeah, that's it, I'm asking for help on AskUbuntu, should've done that a long time ago!
Alright now I just have to copy this into the question.

Footnote

If there's any other information I could provide, please leave a comment. I'll check them regularly and update the post accordingly. Again, this could be a hardware issue, but it happened when I updated my system, and currently due to Wayland and several other things I can't downgrade to 20.04 LTS and work on it.

Doug Smythies
See if [here](https://askubuntu.com/questions/1373633/how-to-troubleshoot-cpu-hw-crash-in-ubuntu-18-04) and/or [here](https://askubuntu.com/questions/1370731/cpu-package-badly-configured-on-my-msi-laptop-how-to-reconfigure) helps.
CattoByte
@DougSmythies Sorry for not contacting back, it's exam season after all... Anyways, I'll try those things and then do a lot of resource intensive tasks (which usually shut it down), if they work, you can post an answer with those and I'll mark it as the correct one.