Score:1

Interpreting OOM killer logs when triggered on a Java thread

ru flag

When the Linux OOM killer is invoked by a process, the kernel log usually provides enough information about the culprit's memory consumption (even if it is not the one killed eventually). For example, when the snmpd process triggers the OOM killer, its memory state can be found a bit further down in the log under PID=1190:

Jul 18 02:21:26 inm-agg kernel: snmpd invoked oom-killer: gfp_mask=0x100cca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0
Jul 18 02:21:26 inm-agg kernel: CPU: 3 PID: 1190 Comm: snmpd Kdump: loaded Not tainted 5.4.17-2102.201.3.el8uek.x86_64 #2
...
Jul 18 02:21:26 inm-agg kernel: Tasks state (memory values in pages):
Jul 18 02:21:26 inm-agg kernel: [  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
...
Jul 18 02:21:26 inm-agg kernel: [   1190]     0  1190    78491     1761   217088        0             0 snmpd

However, when the same happens on a thread of a Java application (OpenJDK 64-Bit Server VM (build 25.372-b07, mixed mode) in my case), the log contains a PID that doesn't correspond to any process. For example, in the following log, an Apache Cassandra input handling thread, ReadStage-150, has triggered the OOM killer:

Jul 16 22:01:45 inm-agg kernel: ReadStage-150 invoked oom-killer: gfp_mask=0x100dca(GFP_HIGHUSER_MOVABLE|__GFP_ZERO), order=0, oom_score_adj=0
Jul 16 22:01:45 inm-agg kernel: CPU: 11 PID: 1653163 Comm: ReadStage-150 Kdump: loaded Not tainted 5.4.17-2102.201.3.el8uek.x86_64 #2

But the PID=1653163 specified in the message is not mentioned anywhere else:

$ journalctl -k -b -e | grep "1653163" | wc -l
1

and it has nothing in common with the Java process PID itself (1652432):

Jul 16 22:01:45 inm-agg kernel: Tasks state (memory values in pages):
Jul 16 22:01:45 inm-agg kernel: [  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
…
Jul 16 22:01:45 inm-agg kernel: [1652432]     0 1652432  7256008  5839621 49709056        0             0 java
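
If the thread were still alive, the mapping from that thread ID to its owning process could be checked via /proc or ps; the commands below are only a sketch, since both IDs from the log above are long gone by now:

$ grep -E '^(Name|Tgid|Pid)' /proc/1653163/status    # Tgid would point at the java process, 1652432
$ ps -eLo pid,lwp,comm | awk '$2 == 1653163'         # list every thread (LWP) system-wide and filter by the thread ID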

So I wonder:

  1. Where does the PID in the oom-killer message come from?
  2. Why is the thread treated separately from its hosting JVM process in this case?
  3. If the OOM killer were configured to kill the OOM initiator, would it be possible (in theory at least) to kill only the culprit thread rather than the JVM as a whole?
Jaromanda X avatar
ru flag
`the log contains a PID that doesn't correspond to any process` - perhaps your java thing uses more than one process, and the process that is gobbling up memory isn't the one mentioned in the Apache log ... ¯\_(ツ)_/¯
Toparvion avatar
ru flag
I don't think so because JVM threads (which I certainly observe here) do not produce new processes in the OS. Anyway, thanks for the guess.
Score:1
cn flag

What the Linux kernel calls a PID, also known as a task, is not strictly what ps or top call a PID. Each kernel PID carries a thread group ID (TGID) identifying the "heavyweight" process, heavy in the sense that in multi-threaded programs many kernel PIDs share one TGID and its memory. That is also why a java process can appear to use more than 100% of a CPU in certain performance monitoring tools.
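
A minimal way to see this mapping, assuming the java process from the log above is still running (substitute your own PID), is to list its kernel tasks and their shared TGID:

$ ps -L -o tid,tgid,comm -p 1652432    # every row is a kernel task (thread); they all share TGID 1652432

The TID column holds exactly the kind of number that appeared as "PID: 1653163" in the oom-killer header.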

"invoked oom-killer" header lines at the start show the unlucky task on CPU, and the stack up to that point. This may not be the task to "blame" for the OOM, and it also might not be killed if sysctl oom_kill_allocating_task is not set. But it probably just did a memory allocation.

"Tasks state" list, if enabled via sysctl:

Dumps the current memory state of all eligible tasks. Tasks not in the same memcg, not in the same cpuset, or bound to a disjoint set of mempolicy nodes are not shown.

In other words, this is a best-effort list of the processes on the system that could be killed. Note that "tgid" is one of the columns, to assist with tracking down multi-threaded thread groups. With cgroups enabled, such as when using systemd to contain units, this can be a much shorter list than the entire system.
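
The sysctl in question is vm.oom_dump_tasks; a quick check (1 means the dump is produced, which is the default):

$ sysctl vm.oom_dump_tasks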

The kernel takes a very basic guess at task "badness", primarily based on the ratio of the task's pages to total system memory pages. Any "Killed process" messages show the details of the victim task, forcibly terminated via SIGKILL. This signal means the entire thread group is terminated.
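
For any live task you can inspect the kernel's current estimate yourself; a small sketch, again reusing the java PID from the logs above:

$ cat /proc/1652432/oom_score        # the badness score the kernel would use right now
$ cat /proc/1652432/oom_score_adj    # adjustment added to the score; -1000 exempts the task entirely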

None of these tasks are proven to be a "culprit". This is merely what the kernel can easily show you: what was just on CPU, some more tasks with their TGIDs for your convenience, and that killing something with a relatively large number of pages might save the system.


Realize that out of memory is a dire situation. The system is considering crashing programs and possibly causing data loss. There is not much room to be clever.

If anything, your effort and cleverness should be spent on improving your capacity planning. Find out how these services are memory contained, whether in a service manager or in a container. Observe memory consumption, both by cgroup and system-wide. Come up with a memory sizing formula: however many GB for the services, a bit for the kernel and administrative overhead, and a few percent margin for safety. Adjust until you no longer get OOM killed.
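
As a rough sketch of that containment, assuming the database runs under systemd as a unit named cassandra.service (the unit name and the limits here are only examples) and the host uses the cgroup v2 unified hierarchy:

$ systemd-cgtop                                   # memory and CPU per cgroup, system-wide
$ systemctl set-property cassandra.service MemoryHigh=28G MemoryMax=30G
$ systemctl show cassandra.service -p MemoryCurrent

MemoryHigh starts reclaiming and throttling before the hard MemoryMax limit is reached, which keeps a misbehaving service from dragging the whole host into a global OOM.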

Toparvion avatar
ru flag
Yes, I do realize that we need to prevent OOM rather than tackle it after the fact. Still, it is important to understand how the problem is reported in the logs. Thank you for the detailed answer!