Score:0

How to debug Linux server reboots?


I have a Debian 10 server that keeps rebooting. journalctl can list the recent boots:

journalctl --list-boots
-6 1ee519dc5bc24e88af75cc609ee32093 Mon 2023-02-06 21:02:02 UTC—Sun 2023-02-12 17:23:28 UTC
-5 bb25fc752ac1428abb87bab15a3cea8b Sun 2023-02-12 17:26:04 UTC—Sun 2023-02-12 17:34:59 UTC
-4 91245b74acdc4c7086ebc4a626d55dcc Sun 2023-02-12 17:37:39 UTC—Sun 2023-02-12 21:48:10 UTC
-3 e3978f5222164454be6ebcd12a1ea65b Sun 2023-02-12 21:50:48 UTC—Sun 2023-02-12 22:38:56 UTC
-2 b3bc3015a73a4661af9f2c277e9bc03d Sun 2023-02-12 22:42:02 UTC—Mon 2023-02-13 02:02:07 UTC
-1 57f4a16489904888acc285ed090afaa7 Mon 2023-02-13 02:04:40 UTC—Mon 2023-02-13 04:04:46 UTC
 0 28efdbf5275f4320ad11f3075b66aa95 Mon 2023-02-13 04:07:21 UTC—Mon 2023-02-13 08:33:09 UTC

However, it's not clear whether the system was rebooted by a user, the kernel crashed, or the power was cut. Is there any tool that would provide that information?

Zareh Kasparian
What infrastructure is the OS running on: a virtual machine or a dedicated server?
It's a dedicated server running Debian with kernel `4.19.0-23-amd64`.
A.B
If the server crashes, the last log entries could be lost anyway. Sending kernel messages over UDP (using the netconsole kernel module) to a remote system for storage might preserve more of them.
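A minimal sketch of the netconsole setup suggested in this comment. The helper name `netconsole_param`, the IP addresses, the interface name, and the MAC address are all placeholders chosen for illustration; the parameter layout follows the kernel's netconsole documentation, with 6665 and 6666 being the documented default source and target ports.

```shell
#!/bin/sh
# Build the value of the netconsole= module parameter.
# Layout: [src-port]@[src-ip]/[dev],[tgt-port]@<tgt-ip>/[tgt-mac]
netconsole_param() {
    src_ip=$1
    dev=$2
    tgt_ip=$3
    tgt_mac=$4
    printf '%s@%s/%s,%s@%s/%s\n' 6665 "$src_ip" "$dev" 6666 "$tgt_ip" "$tgt_mac"
}

# Placeholder values (assumptions); replace them with the real ones.
param=$(netconsole_param 192.0.2.10 eth0 192.0.2.20 00:11:22:33:44:55)

# Only print the commands; loading the module requires root.
echo "On the crashing server: modprobe netconsole netconsole=$param"
echo "On the log receiver:    nc -u -l -p 6666   # flags differ between netcat variants"
```

The receiver only needs to listen on the UDP port and write what arrives to a file, so the last kernel messages before a crash may survive even if the local disk never gets them.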
Score:1

I wrote a simple tool in bash that automatically collects additional information about reboots. The script uses journalctl internally, so it should work on any Linux distribution that uses systemd.

The idea is simple: for each boot session, check the logs for known entries that explain how the session ended:

  • system received SIGTERM
  • asked to shut down
  • SEGFAULT
  • kernel BUG

Confirming a crash is complicated. That's why some lines are marked as CRASH?, which means that the log ends suddenly without a recognized error message. In some cases a SEGFAULT gets logged, in others it does not.

This might help the operator to focus on boot sessions with suspicious entries.

$ crashctl
Distribution        : Debian GNU/Linux 10 (buster)
Kernel              : 4.19.0-23-amd64 #1 SMP Debian 4.19.269-1 (2022-12-20)
Current boot        : 606aaecb-b14d-4bbc-9598-b6c60233a888
Scaled load         : 0.04 0.01 0.00 
System installed    : Tue Jan  3 09:26:13 UTC 2023
System started      : Mon Feb  6 03:11:44 CET 2023
Uptime              : up 7 days
Running processes   : 384
kdump               : current state   : ready to kdump
Boot First message             Last message             Uptime       Reboot/Crash
---------------------------------------------
-11  2022-12-05 20:43:53 UTC   2022-12-05 20:52:00 UTC  0d 00:08:07  reboot (SIGTERM)
-10  2022-12-06 07:56:01 UTC   2022-12-06 15:14:36 UTC  0d 07:18:35  CRASH?
-9   2022-12-07 12:28:07 UTC   2022-12-10 16:33:43 UTC  3d 04:05:36  reboot (SIGTERM)
-8   2022-12-12 08:56:05 UTC   2022-12-18 08:18:40 UTC  5d 23:22:35  CRASH?
-7   2022-12-18 08:32:27 UTC   2022-12-25 10:54:03 UTC  7d 02:21:36  reboot (SIGTERM)
-6   2022-12-28 10:51:54 UTC   2022-12-29 12:12:32 UTC  1d 01:20:38  Power key pressed, but ignored
-5   2023-01-02 08:45:54 UTC   2023-01-06 08:05:01 UTC  3d 23:19:07  CRASH?
-4   2023-01-06 10:07:00 UTC   2023-01-12 10:01:25 UTC  5d 23:54:25  Power key pressed, but ignored
-3   2023-01-12 10:04:36 UTC   2023-01-28 14:07:19 UTC  16d 04:02:43 reboot (SIGTERM)
-2   2023-01-30 08:43:42 UTC   2023-01-31 07:27:26 UTC  0d 22:43:44  reboot (SIGTERM)
-1   2023-02-02 12:41:51 UTC   2023-02-04 13:16:19 UTC  2d 00:34:28  reboot (SIGTERM)
0    2023-02-06 03:12:01 UTC   2023-02-13 18:17:52 UTC  7d 15:05:51  running
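
The classification idea described in this answer could be sketched roughly as follows. This is not the actual crashctl code: the function name `classify_tail` and the grep patterns are assumptions chosen for illustration, and real logs will likely need more patterns.

```shell
#!/bin/sh
# Classify how a boot session ended, based on the tail of its log (stdin).
# The patterns below are illustrative assumptions, not crashctl's real rules.
classify_tail() {
    tail_text=$(cat)
    if printf '%s\n' "$tail_text" | grep -q 'kernel BUG'; then
        echo 'kernel BUG'
    elif printf '%s\n' "$tail_text" | grep -qi 'segfault'; then
        echo 'SEGFAULT'
    elif printf '%s\n' "$tail_text" | grep -q 'Received SIGTERM'; then
        echo 'reboot (SIGTERM)'
    else
        # The log ends without any recognized message: possibly a crash.
        echo 'CRASH?'
    fi
}

# Usage on a real system (previous boot, last 200 lines of its journal):
#   journalctl -b -1 -n 200 --no-pager -o cat | classify_tail
```

Looking only at the tail of each session keeps the check fast, at the cost of missing errors that were logged long before the session ended.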
Score:0

Please try the sequence below.

  1. Check the output of `last` or `journalctl --list-boots` and note the date and time of each reboot (keep the same date and time format when searching).
  2. Open the /var/log/messages file and search for that date and time. If the logs have been rotated, check the old logs.
  3. Check what happened just before the reboot.
  4. If you see messages about services being stopped, the server was rebooted normally, either by a user, by a scheduled job, or via the console (in the case of a cloud instance).
  5. In the case of a crash, you will see a crash trace.
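
Steps 2 and 3 above can be sketched with GNU grep. The helper name `context_before`, the 20-line window, and the example timestamp are assumptions for illustration; the timestamp must be written in the same format the log file uses.

```shell
#!/bin/sh
# Print the first line matching a timestamp prefix, together with up to
# 20 lines that precede it, so the events leading up to a reboot are visible.
# Relies on GNU grep options: -m (stop after N matches), -B (lines before).
context_before() {
    stamp=$1
    logfile=$2
    grep -m 1 -B 20 -e "$stamp" "$logfile"
}

# Usage (timestamp taken from the first message of the boot that followed):
#   context_before 'Feb 12 17:26' /var/log/messages
# For a rotated, uncompressed log:
#   context_before 'Feb 12 17:26' /var/log/messages.1
# A journal-based alternative, showing the end of the previous boot:
#   journalctl -b -1 -n 50 --no-pager
```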
I don't find `last` very useful. For example, it shows an entry from a month ago, `reboot system boot 4.19.0-23-amd64 Tue Jan 3 10:26 still running`, as "still running", while the system has crashed at least 5 times since then.
Zareh Kasparian
Are you sure there is nothing wrong with the physical server itself? Check the logs through iLO if it is running on an HP server.
asktyagi
You can use the `journalctl --list-boots` output to get the last reboot time; the rest of the steps stay the same. Did you find anything interesting in the message logs?
Yes, there's definitely something wrong with the server. I know about `journalctl --list-boots`; its output is in the question. The information it provides is just not sufficient.