Score:-1

How would you debug the very early boot process?


I have a problem whose cause I do not know how to find. After I select Linux from the GRUB menu, the system seems to "shut down" almost instantly at some point during boot, after a few messages have been logged, as if it had lost power or halted immediately.

This happens so early in the boot that the file system is not yet mounted, so no logs are written anywhere that I could use to diagnose the problem. I can read some of the verbose boot messages, but none of them is an unusual error. The shutdown mostly happens shortly before the VGA resolution is switched to a higher resolution, but this is not 100% consistent.

Strangely, the system sometimes boots successfully. Everything seems to be random. I am fairly certain that it is not a hardware issue such as the power supply: the system never shuts off when I boot Windows, and I have already tried swapping components such as the power supply, mainboard and RAM.

Does anyone know how one could diagnose the problem and find the cause? Or even better, what the problem could be?

oldfred
What video card/chip? What version of Ubuntu? If you want lots of detail on your hardware, you can run this new script. Do not copy data from the screen, as it may include data you do not want to share; the upload to the pastebin site removes that type of data. Press the spacebar to page through the screens and q to exit them. `wget -N -t 5 -T 10 https://github.com/UbuntuForums/system-info/raw/main/system-info && chmod +x system-info && ./system-info`
Joseph Dalton
I used Ubuntu 20.04 and the graphics are Intel Integrated Graphics P630. https://paste.ubuntu.com/p/VvXqp8SYm9/
oldfred
Do you have the latest UEFI firmware from Gigabyte for your motherboard? The support page shows several newer versions: https://www.gigabyte.com/Motherboard/C246N-WU2-rev-10/support#support-dl-bios I do not know ZFS or what differences it may make.
Joseph Dalton
The install probably does not make a difference, because the same problem can occur when booting from a live USB stick without a full installation. So ZFS probably does not matter either. You are right that there is a very new BIOS from 2021/11/29 that I have not yet tried.
Joseph Dalton
The new BIOS did not change anything.
oldfred
Is there anything in the log files? You can review them with `sudo egrep -i 'warn|error' /var/log/*g`. I get several pages of warnings, most of them related to ACPI, which do not matter, plus a new bootstrap error that I did not have before. I may have to look into that myself, but my system does work OK. Google found a lot of old entries on the bootstrap error; it may be related to the install and may not matter.
Joseph Dalton
I have removed quiet/splash and tried to film the boot process. I did not see any suspicious messages or warnings. It all happens very fast, and the system shuts down at various places in the process, not always at the same point. I think this part of the boot is not in the logs yet, because the file system is only mounted later and nothing can be written to disk before that. This is why I asked whether I can do something before that point. I have installed different Linux distros on a lot of different PCs before, but this one always shuts off and I have no clue why. Windows runs flawlessly. So does memtest.
oldfred
From the live installer you can run fsck on ext4 partitions. But for ZFS, I do not know what the equivalent file system repair tool is. https://www.phoronix.com/scan.php?page=news_item&px=Linus-Says-No-To-ZFS-Linux I might try using ext4 for / and /home and then, if you want ZFS, create a separate data partition.
Joseph Dalton
When you say "my system does work ok", do you mean that you have a similar Gigabyte system? If I boot from a live USB, there are no logs. When I checked the logs after a successful boot, I did not find anything meaningful, but I can try again with your suggestion. My main problem is that the spontaneous shutdowns happen very early, so I cannot get logs. I also suspect that there may be no logs at all if the system shuts off before anything is logged.
oldfred
If you get the GRUB menu, the system has started to boot. Have you removed quiet splash to watch the boot process? That shows what will end up in the log file. Sometimes you can then spot the issue, but with a new SSD it scrolls by so quickly that I cannot see much. I always remove quiet splash just to see the boot process. I am at my Asus Z97 system in Illinois, but hope to be at my Gigabyte Z170 system in Florida shortly. I am retired and a snowbird, that is, one who heads south in winter. Both systems run Kubuntu 20.04, but I have installed just about every six-month release to test partitions.
oldfred
I just saw a thread in the forums where one of the users said there was an issue with one version of the kernel, and they had to mount ZFS to make repairs. I do not know the details. https://ubuntuforums.org/showthread.php?t=2469521
Joseph Dalton
The live installer file system is OK. I also tried multiple devices, and I also tried 18.04. I don't think ZFS has anything to do with the problem; the shutdown often occurs before the file system is mounted, and it also happens on the live installer. I am trying to figure out whether something is wrong with my hardware, but it works with Windows without problems. I also have many Linux installs with no problems, but this PC just randomly shuts down, and I have no idea how to trace the root cause without additional diagnostic information. The problem does not appear to be as simple as a wrong kernel, BIOS, etc.
Score:0

From the description of the problem I'm not sure how this will help. However, these are my general tips for debugging the very early boot process.

  • Remove kernel arguments like `quiet` and `splash`. From the comments, it sounds like you are already doing this.
  • Add the kernel argument `debug=`. This configures casper to write a debug log to /casper.log inside the initrd temporary file system, which is then copied to /var/log/casper.log in the real file system.
  • Add the kernel argument `debug`. This is different from the `debug=` argument: it configures the initramfs to write a debug log to /run/initramfs/initramfs.debug.
  • Add a `break=` kernel argument to pause the boot process in the initramfs and open a debug shell. For example, `break=top` pauses the boot process at the beginning.
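
As a minimal sketch of the argument changes listed above, the following POSIX shell function rewrites a kernel command line: it drops `quiet` and `splash` and appends whatever debug arguments you pass in. The function name `debug_cmdline` is hypothetical and not part of Ubuntu or GRUB; it only prints the resulting line and does not change any boot configuration.

```shell
# Hypothetical helper (not an Ubuntu or GRUB tool): print a kernel command
# line with "quiet" and "splash" removed and extra arguments appended.
# Usage: debug_cmdline "<current command line>" [extra args...]
debug_cmdline() {
    line=$1
    shift
    out=""
    # Split the command line on whitespace and keep everything
    # except the arguments that hide boot messages.
    for arg in $line; do
        case "$arg" in
            quiet|splash) ;;
            *) out="${out:+$out }$arg" ;;
        esac
    done
    # Append the requested debug arguments, e.g. debug or break=top.
    for extra in "$@"; do
        out="${out:+$out }$extra"
    done
    printf '%s\n' "$out"
}
```

For example, `debug_cmdline "$(cat /proc/cmdline)" debug break=top` prints the line you could enter for a one-off test: press `e` on the GRUB menu entry, edit the line starting with `linux`, and boot the edited entry.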


Joseph Dalton
This seems to be extremely useful information! I will look into it.
Joseph Dalton
After investigating, I have now disabled several CPU features such as EIST and RTH, and so far I have not had the problem again. Thanks, this answer directly addressed what I wanted to know. I am a bit sad that my question was downvoted, as I had done a lot of research and hoped for an excellent answer such as this.