I am the system administrator of an Arch Linux-based workstation cluster. It uses Slurm as the workload manager and consists of one master machine and four computation nodes. Over the past few months, we have observed that processes on some nodes get stuck from time to time, and rebooting the node solves the problem. The stuck processes are in state D (uninterruptible "disk sleep"), but when we check the node's I/O with top or other tools, the I/O is in fact quite low.
When some processes on a node are in state D, everything on that node is slow, but only for normal users. When we run commands (including python) as root on the stuck node, everything works just fine; but as soon as we switch to a normal user with su NORMAL_USER, the shell gets stuck again. ps aux shows that the -bash process started for NORMAL_USER is in state D. We have tried to strace the stuck process and have dug into /proc/PID, but found nothing useful. We also could not find any relevant messages in journalctl. Maybe we are missing something.
We would appreciate any advice or comments.
Our kernel version is 5.10.47-1-lts.
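For reference, this is roughly how we enumerate the D-state processes on a node and dump their kernel stacks (a minimal sketch, run as root; the PIDs are of course node-specific):

# List D-state tasks with their kernel wait channel (wchan), then dump
# the kernel stack of each one. Reading /proc/PID/stack requires root.
ps -eo pid,user,stat,wchan:32,comm | awk 'NR==1 || $3 ~ /^D/'
for pid in $(ps -eo pid=,stat= | awk '$2 ~ /^D/ {print $1}'); do
    echo "=== PID $pid ==="
    cat /proc/"$pid"/stack
done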
Here is /proc/PID/status for the process in state D. It is the bash process spawned by su NORMAL_USER, and it is single-threaded.
Name: bash
Umask: 0022
State: D (disk sleep)
Tgid: 3136723
Ngid: 0
Pid: 3136723
PPid: 3136722
TracerPid: 0
Uid: 1000093 1000093 1000093 1000093
Gid: 1000000 1000000 1000000 1000000
FDSize: 256
Groups: 1000000 1000083
NStgid: 3136723
NSpid: 3136723
NSpgid: 3136723
NSsid: 3110369
VmPeak: 16904 kB
VmSize: 16904 kB
VmLck: 0 kB
VmPin: 0 kB
VmHWM: 3788 kB
VmRSS: 3744 kB
RssAnon: 412 kB
RssFile: 3332 kB
RssShmem: 0 kB
VmData: 608 kB
VmStk: 132 kB
VmExe: 588 kB
VmLib: 1948 kB
VmPTE: 52 kB
VmSwap: 0 kB
HugetlbPages: 0 kB
CoreDumping: 0
THP_enabled: 1
Threads: 1
SigQ: 12/772094
SigPnd: 0000000000000000
ShdPnd: 0000000008000002
SigBlk: 0000000000000000
SigIgn: 0000000000384004
SigCgt: 000000004b813efb
CapInh: 0000000000000000
CapPrm: 0000000000000000
CapEff: 0000000000000000
CapBnd: 000001ffffffffff
CapAmb: 0000000000000000
NoNewPrivs: 0
Seccomp: 0
Seccomp_filters: 0
Speculation_Store_Bypass: thread vulnerable
Cpus_allowed: ffff,ffffffff
Cpus_allowed_list: 0-47
Mems_allowed: 00000003
Mems_allowed_list: 0-1
voluntary_ctxt_switches: 4
nonvoluntary_ctxt_switches: 1
Here is /proc/PID/stack for the same process.
[<0>] nfs_wait_bit_killable+0x1e/0x90 [nfs]
[<0>] nfs4_wait_clnt_recover+0x60/0x90 [nfsv4]
[<0>] nfs4_client_recover_expired_lease+0x17/0x50 [nfsv4]
[<0>] nfs4_do_open+0x2f4/0xbe0 [nfsv4]
[<0>] nfs4_atomic_open+0xe7/0x100 [nfsv4]
[<0>] nfs_atomic_open+0x1e1/0x520 [nfs]
[<0>] path_openat+0x5f5/0xfc0
[<0>] do_filp_open+0x91/0x130
[<0>] do_sys_openat2+0x96/0x150
[<0>] __x64_sys_openat+0x53/0x90
[<0>] do_syscall_64+0x33/0x40
[<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
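The stack appears to be waiting in NFSv4 client state recovery (nfs4_client_recover_expired_lease), so this is roughly what we check next on a stuck node; a minimal sketch, where nfs-server is a placeholder for our real NFS server:

# Which NFS mounts exist on the node, and with what options
grep ' nfs' /proc/mounts
nfsstat -m
# Basic reachability of the NFS server from the node
ping -c 3 nfs-server
# Recent NFS client messages in the kernel log
dmesg | grep -i nfs | tail -n 20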