See the Event Logs heading farther down.
I'm on Ubuntu Server 21.04 running kernel 5.11.0-1015-raspi on aarch64.
What are the most effective things I can prepare now so that I can diagnose this the next time it happens?
Occasionally after heavy use I start getting strange issues such as these:
- some processes that should be doing nothing display 100% usage of a single core on top (this happened recently with bash scripts looping on inotifywait on dev event files)
- these and a handful of other processes do not terminate with kill -9 (I would have assumed inotifywait was simply terminating immediately except for this)
- the system may keep services running but ttys may halt processing input or output, including the serial tty
- swapoff /path/to/swap may hang indefinitely even when no swap space is used anymore
- systemctl shutdown may hang indefinitely, or the system may partly shut down and then hang
- usb keyboard lights may stop responding
- login prompts may wait a very long time after a user is entered, and then hang after displaying only part of the password prompt
- keystrokes may be dropped
- sometimes repeated kernel messages on a tty indicating the same hung task
- When indefinitely nonresponsive, I don't see any kernel panic on an open dmesg --follow, journalctl --follow, or tty
- The caps lock light specifically appears generally nonfunctional on this machine. The caps lock light also appears nonfunctional on my aarch64 olimex teres.
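For the processes that ignore kill -9, one thing that might be worth capturing next time is each task's scheduler state and where it is waiting in the kernel. The following is only a minimal sketch, assuming /proc is mounted. The function names list_tasks_in_state and task_snapshot are made up for illustration, and /proc/PID/stack is normally readable only by root, and only on kernels built with stack tracing.

```shell
#!/bin/sh
# Hypothetical helpers (names invented for this sketch).

# Print "PID STATE COMMAND" for every task whose state letter matches $1,
# e.g. D (uninterruptible sleep) or R (running).
list_tasks_in_state() {
    want=$1
    for d in /proc/[0-9]*; do
        # Tasks can exit while we scan, so read errors are ignored.
        state=$(awk '/^State:/ { print $2 }' "$d/status" 2>/dev/null)
        [ "$state" = "$want" ] || continue
        printf '%s %s %s\n' "${d#/proc/}" "$state" "$(cat "$d/comm" 2>/dev/null)"
    done
}

# Print the name, state, kernel wait channel and (if permitted) the kernel
# stack of a single PID.
task_snapshot() {
    pid=$1
    [ -r "/proc/$pid/status" ] || { echo "no such pid: $pid" >&2; return 1; }
    grep -E '^(Name|State):' "/proc/$pid/status"
    printf 'wchan: %s\n' "$(cat "/proc/$pid/wchan" 2>/dev/null)"
    # Usually needs root; falls back to a note when unreadable.
    cat "/proc/$pid/stack" 2>/dev/null || echo 'stack: not readable (try as root)'
}
```

Running list_tasks_in_state D and then task_snapshot on each reported PID, ideally from the serial console while the problem is happening, might show whether the stuck tasks are all waiting in the same kernel function.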
I have recently updated the system and hope these issues may decrease, but I'd like to know what more I can do that may help in diagnosing or handling them. I took the effort to plug a serial cable in and was very surprised that the serial terminal itself could hang indefinitely mid-output.
This is usually associated with excessive swap allocation, in excess of available RAM, but some of the issues, like the strange processes that kill -9 won't terminate, imply more than just memory thrashing to me, and the issues don't go away when memory is freed, although I'm not experienced with the Linux kernel.
Ideally I'd like to eventually narrow down the issue to a bug in the kernel, a problem with my hardware, or a compromised system.
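Since the problem seems tied to swap use, it may help to have a running record of memory state from before each hang. Below is a minimal sketch of a sampler, assuming /proc/meminfo is available. The function name log_memory_sample is hypothetical, and /proc/pressure/memory only exists on kernels with pressure stall information (PSI) enabled, so the sketch falls back to a note when it is missing.

```shell
#!/bin/sh
# Hypothetical helper (name invented for this sketch).
# Append one timestamped memory sample to the file given as $1.
log_memory_sample() {
    out=$1
    {
        date -u '+%Y-%m-%dT%H:%M:%SZ'
        grep -E '^(MemAvailable|SwapTotal|SwapFree|Dirty|Writeback):' /proc/meminfo
        # PSI is only present when the kernel was built with CONFIG_PSI.
        cat /proc/pressure/memory 2>/dev/null || echo 'psi: unavailable'
    } >> "$out"
}
```

It could be run in a loop such as `while sleep 10; do log_memory_sample /path/to/memwatch.log; done`, where the path is a placeholder. Writing the log to another machine or over the serial line may be safer than local disk, because a local write can itself stall while the system is thrashing.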
Event logs:
2021-08-09
After systemctl isolate graphical and systemctl isolate multi-user, systemd-journal is using 99% cpu, flooding the journal with messages that org.gnome.Shell@x11 is pending stop. systemctl status says there is no such service.
I attempted journalctl | pastebinit. The interface stopped responding before I got the URL, I'm afraid.
This doesn't appear to be a virtual memory issue this time, but here are the memory outputs I got before it froze:
free -h: https://paste.ubuntu.com/p/3c5tSTgGc4 (this was taken while it was unswapping; it did finish unswapping)
sysctl vm.swappiness: https://paste.ubuntu.com/p/cpvJw4Nd8f
At 10:29 UTC my tmux session froze. I switched to tty3 and tried to log in. The tty hung displaying the password.
At 10:32 UTC the fan spun up high for about 1 minute.
I have an offline system connected to the serial terminal with dmesg open. The last lines are in regard to rfkill, handcopied onto my mobile phone below:
[225366.651144] md: data-check of RAID array md4
[225724.680213] rfkill: input handler enabled
[225745.716506] rfkill: input handler disabled
[225751.439369] rfkill: input handler enabled
At 10:33 tty3 displayed "Login timed out after 60 seconds." without ever displaying a password prompt. It hangs without displaying another login prompt.
I sent a ^C to the serial tty around 10:35 and it was echoed back to me, but no terminal prompt was output to indicate that dmesg was interrupted.
10:36 or 10:37 serial tty outputs/echoes a carriage return. No new input. Fan spins up again.
10:39 serial tty shows a prompt, which processes the pending return key, and hangs again.
10:42 I have a serial prompt!
11:00 but I am still trying to get any command to execute at the prompt. It is incredibly slow but is not losing keystrokes from its buffer (which sometimes happens for me).
11:01 the system responds on serial and tty3. It killed pastebinit due to oom.
lshw -C memory: https://paste.ubuntu.com/p/x5GMkHRktS
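Because the 11:01 entry shows the OOM killer acted, and I sometimes see hung-task messages on a tty, a small filter for those kernel log lines might make later incidents easier to compare. This is only a sketch: the function name filter_kernel_events is hypothetical, and the patterns are based on the usual wording of kernel OOM and hung-task messages, which may differ between kernel versions.

```shell
#!/bin/sh
# Hypothetical helper (name invented for this sketch).
# Read kernel log lines on stdin and keep only OOM-killer and hung-task
# reports. The patterns are assumptions about typical message wording.
filter_kernel_events() {
    grep -E -i 'out of memory|oom-kill|oom_reaper|blocked for more than [0-9]+ seconds'
}
```

For example, `journalctl -k -b | filter_kernel_events` or `dmesg | filter_kernel_events` would show only the matching lines from the current boot.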