Score:0

umount of logical volume failing during shutdown

us flag

In my system, multiple logical volumes are created, mounted and /etc/fstab entries are added after systemd starts up.

When system is rebooted for the first time, during shutdown many umount failures are observed.

Interestingly, in subsequent reboots, this problem is not observed.

The difference in behavior between first and subsequent reboots might be caused by the fact that during first reboot alone, the *.mount fragments are not generated by systemd-fstab-generator. I am particularly interested in fixing the issue for /var/log as I'm facing data loss here.

Following the advice from here, I added override conf for systemd-journal-flush.service as:

[Unit]
PartOf=var.mount var-log.mount

[Service]
ExecStop=/usr/local/bin/debugger.sh

Also, a debugging script was added in ExecStop to check for any open fds.

#!/bin/sh
#
date > /root/out
echo "=============" >> /root/out
fuser -m /var/log >> /root/out
echo "=============" >> /root/out
lsof | grep /var/log >> /root/out
echo "===============" >> /root/out
ps aux  >> /root/out

With this change as well, umount fails in first reboot and works in subsequent reboots.

[FAILED] Failed unmounting /var/crash/kernel.
[FAILED] Failed unmounting /var/lib/systemd/coredump.
[FAILED] Failed unmounting /var/log/audit.
[FAILED] Failed unmounting /var/sonic.
[FAILED] Failed unmounting /var/tmp.
[  OK  ] Stopped target Swap.
         Unmounting /mnt/sftp...
         Unmounting /var/crash...
[FAILED] Failed unmounting /var/crash.
[  OK  ] Stopped Flush Journal to Persistent Storage.
         Unmounting /var/log...
[FAILED] Failed unmounting /var/log.

From the debugging script output, both lsof and fuser outputs are empty. I don't find anything suspicious in ps output as well. (There is one umount in D state. Not sure if it is something that needs to be checked.)

USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root           1 27.4  0.2 188048 37548 ?        Ss   15:43   5:26 /usr/lib/systemd/systemd --show-status=1 --log-target=journal-or-kmsg --log-level=notice
root           2  0.0  0.0      0     0 ?        S    15:43   0:00 [kthreadd]
root           3  0.0  0.0      0     0 ?        I<   15:43   0:00 [rcu_gp]
root           4  0.0  0.0      0     0 ?        I<   15:43   0:00 [rcu_par_gp]
root           5  0.0  0.0      0     0 ?        I    15:43   0:00 [kworker/0:0-cgroup_destroy]
root           6  0.0  0.0      0     0 ?        I<   15:43   0:00 [kworker/0:0H-kblockd]
root           8  0.0  0.0      0     0 ?        I<   15:43   0:00 [mm_percpu_wq]
root           9  0.0  0.0      0     0 ?        S    15:43   0:00 [ksoftirqd/0]
root          10  0.2  0.0      0     0 ?        I    15:43   0:02 [rcu_sched]
root          11  0.0  0.0      0     0 ?        I    15:43   0:00 [rcu_bh]
root          12  0.0  0.0      0     0 ?        S    15:43   0:00 [migration/0]
root          13  0.0  0.0      0     0 ?        S    15:43   0:00 [cpuhp/0]
root          14  0.0  0.0      0     0 ?        S    15:43   0:00 [cpuhp/1]
root          15  0.0  0.0      0     0 ?        S    15:43   0:00 [migration/1]
root          16  0.0  0.0      0     0 ?        S    15:43   0:00 [ksoftirqd/1]
root          18  0.0  0.0      0     0 ?        I<   15:43   0:00 [kworker/1:0H-kblockd]
root          19  0.0  0.0      0     0 ?        S    15:43   0:00 [cpuhp/2]
root          20  0.0  0.0      0     0 ?        S    15:43   0:00 [migration/2]
root          21  0.0  0.0      0     0 ?        S    15:43   0:00 [ksoftirqd/2]
root          23  0.0  0.0      0     0 ?        I<   15:43   0:00 [kworker/2:0H-kblockd]
root          24  0.0  0.0      0     0 ?        S    15:43   0:00 [cpuhp/3]
root          25  0.0  0.0      0     0 ?        S    15:43   0:00 [migration/3]
root          26  0.0  0.0      0     0 ?        S    15:43   0:00 [ksoftirqd/3]
root          28  0.0  0.0      0     0 ?        I<   15:43   0:00 [kworker/3:0H-kblockd]
root          29  0.0  0.0      0     0 ?        S    15:43   0:00 [cpuhp/4]
root          30  0.0  0.0      0     0 ?        S    15:43   0:00 [migration/4]
root          31  0.0  0.0      0     0 ?        S    15:43   0:00 [ksoftirqd/4]
root          33  0.0  0.0      0     0 ?        I<   15:43   0:00 [kworker/4:0H-kblockd]
root          34  0.0  0.0      0     0 ?        S    15:43   0:00 [cpuhp/5]
root          35  0.0  0.0      0     0 ?        S    15:43   0:00 [migration/5]
root          36  0.0  0.0      0     0 ?        S    15:43   0:00 [ksoftirqd/5]
root          38  0.0  0.0      0     0 ?        I<   15:43   0:00 [kworker/5:0H-kblockd]
root          39  0.0  0.0      0     0 ?        S    15:43   0:00 [cpuhp/6]
root          40  0.0  0.0      0     0 ?        S    15:43   0:00 [migration/6]
root          41  0.0  0.0      0     0 ?        S    15:43   0:00 [ksoftirqd/6]
root          43  0.0  0.0      0     0 ?        I<   15:43   0:00 [kworker/6:0H-kblockd]
root          44  0.0  0.0      0     0 ?        S    15:43   0:00 [cpuhp/7]
root          45  0.0  0.0      0     0 ?        S    15:43   0:00 [migration/7]
root          46  0.0  0.0      0     0 ?        S    15:43   0:00 [ksoftirqd/7]
root          48  0.0  0.0      0     0 ?        I<   15:43   0:00 [kworker/7:0H-kblockd]
root          49  0.0  0.0      0     0 ?        S    15:43   0:00 [kdevtmpfs]
root          50  0.0  0.0      0     0 ?        I<   15:43   0:00 [netns]
root          51  0.0  0.0      0     0 ?        S    15:43   0:00 [kauditd]
root          53  0.0  0.0      0     0 ?        S    15:43   0:00 [khungtaskd]
root          54  0.0  0.0      0     0 ?        S    15:43   0:00 [oom_reaper]
root          55  0.0  0.0      0     0 ?        I<   15:43   0:00 [writeback]
root          56  0.0  0.0      0     0 ?        S    15:43   0:00 [kcompactd0]
root          57  0.0  0.0      0     0 ?        SN   15:43   0:00 [ksmd]
root          58  0.0  0.0      0     0 ?        I<   15:43   0:00 [crypto]
root          59  0.0  0.0      0     0 ?        I<   15:43   0:00 [kblockd]
root          60  0.0  0.0      0     0 ?        I<   15:43   0:00 [ata_sff]
root          61  0.0  0.0      0     0 ?        I<   15:43   0:00 [md]
root          62  0.0  0.0      0     0 ?        I<   15:43   0:00 [devfreq_wq]
root          63  0.0  0.0      0     0 ?        S    15:43   0:00 [watchdogd]
root          65  0.0  0.0      0     0 ?        S    15:43   0:00 [kswapd0]
root          73  0.0  0.0      0     0 ?        I<   15:43   0:00 [kthrotld]
root          74  0.0  0.0      0     0 ?        S    15:43   0:00 [irq/30-pciehp]
root          76  0.0  0.0      0     0 ?        I    15:43   0:00 [kworker/1:1-memcg_kmem_cache]
root          77  0.0  0.0      0     0 ?        I    15:43   0:01 [kworker/3:1-events]
root          78  0.0  0.0      0     0 ?        I    15:43   0:00 [kworker/4:1-cgroup_destroy]
root          79  0.0  0.0      0     0 ?        I    15:43   0:00 [kworker/5:1-events]
root          80  0.0  0.0      0     0 ?        I    15:43   0:01 [kworker/6:1-memcg_kmem_cache]
root          82  0.0  0.0      0     0 ?        I<   15:43   0:00 [acpi_thermal_pm]
root          84  0.0  0.0      0     0 ?        I<   15:43   0:00 [tpm-vtpm]
root          85  0.0  0.0      0     0 ?        S    15:43   0:00 [scsi_eh_0]
root          86  0.0  0.0      0     0 ?        I<   15:43   0:00 [scsi_tmf_0]
root          87  0.0  0.0      0     0 ?        S    15:43   0:00 [scsi_eh_1]
root          88  0.0  0.0      0     0 ?        I<   15:43   0:00 [scsi_tmf_1]
root          89  0.0  0.0      0     0 ?        S    15:43   0:00 [scsi_eh_2]
root          90  0.0  0.0      0     0 ?        I<   15:43   0:00 [scsi_tmf_2]
root          91  0.0  0.0      0     0 ?        S    15:43   0:00 [scsi_eh_3]
root          92  0.0  0.0      0     0 ?        I<   15:43   0:00 [scsi_tmf_3]
root          94  0.0  0.0      0     0 ?        I    15:43   0:00 [kworker/u16:3-flush-252:18]
root          97  0.0  0.0      0     0 ?        I<   15:43   0:00 [ipv6_addrconf]
root          99  0.0  0.0      0     0 ?        I<   15:43   0:00 [kworker/1:1H-kblockd]
root         118  0.0  0.0      0     0 ?        I<   15:43   0:00 [ixgbe]
root         126  0.0  0.0      0     0 ?        I    15:43   0:00 [kworker/6:2-events]
root         133  0.0  0.0      0     0 ?        I<   15:43   0:00 [kworker/2:1H-kblockd]
root         135  0.0  0.0      0     0 ?        I<   15:43   0:00 [kdmflush]
root         138  0.0  0.0      0     0 ?        I<   15:43   0:00 [kworker/4:1H-kblockd]
root         139  0.0  0.0      0     0 ?        S    15:43   0:00 [jbd2/dm-0-8]
root         140  0.0  0.0      0     0 ?        I<   15:43   0:00 [ext4-rsv-conver]
root         142  0.0  0.0      0     0 ?        I<   15:43   0:00 [kworker/0:1H-kblockd]
root         145  0.0  0.0      0     0 ?        I<   15:43   0:00 [kworker/6:1H-kblockd]
root         147  0.0  0.0      0     0 ?        I<   15:43   0:00 [kworker/7:1H-events_highpri]
root         158  0.0  0.0      0     0 ?        I<   15:43   0:00 [kworker/3:1H-kblockd]
root         166  0.2  0.3  80384 57352 ?        Ss   15:43   0:02 /usr/lib/systemd/systemd-journald
root         171  0.0  0.0      0     0 ?        I<   15:43   0:00 [kworker/5:1H-kblockd]
root         172  0.0  0.0      0     0 ?        I<   15:43   0:00 [ipmi-msghandler]
root         173  0.0  0.0      0     0 ?        I    15:43   0:00 [kworker/2:2-events_power_efficient]
root         183  0.2  0.0   7164  5260 ?        Ss   15:43   0:02 /usr/lib/systemd/systemd-udevd
root         227  0.0  0.0      0     0 ?        S    15:43   0:00 [scsi_eh_4]
root         228  0.0  0.0      0     0 ?        I<   15:43   0:00 [scsi_tmf_4]
root         231  0.0  0.0      0     0 ?        S    15:43   0:00 [usb-storage]
root         237  0.0  0.0      0     0 ?        I<   15:43   0:00 [ttm_swap]
root         272  0.0  0.0      0     0 ?        SN   15:43   0:00 [kipmi0]
root       68631  0.0  0.0      0     0 ?        I    15:48   0:00 [kworker/3:0-cgroup_destroy]
root       68632  0.0  0.0      0     0 ?        I    15:48   0:00 [kworker/1:2-mm_percpu_wq]
root       95828  0.1  0.0      0     0 ?        I    15:51   0:00 [kworker/2:0-cgroup_destroy]
root       98121  0.0  0.0      0     0 ?        I<   15:51   0:00 [kdmflush]
root       98348  0.0  0.0      0     0 ?        I<   15:51   0:00 [kdmflush]
root       98522  0.0  0.0      0     0 ?        I<   15:51   0:00 [kdmflush]
root       98540  0.0  0.0      0     0 ?        I<   15:51   0:00 [kdmflush]
root       98644  0.0  0.0      0     0 ?        I<   15:51   0:00 [kdmflush]
root       98816  0.0  0.0      0     0 ?        I<   15:51   0:00 [kdmflush]
root       98879  0.0  0.0      0     0 ?        I<   15:51   0:00 [kdmflush]
root       98912  0.0  0.0      0     0 ?        I<   15:51   0:00 [kdmflush]
root       99063  0.0  0.0      0     0 ?        I<   15:51   0:00 [kdmflush]
root       99098  0.0  0.0      0     0 ?        I<   15:51   0:00 [kdmflush]
root       99132  0.0  0.0      0     0 ?        I<   15:51   0:00 [kdmflush]
root       99450  0.0  0.0      0     0 ?        I<   15:51   0:00 [kdmflush]
root       99503  0.0  0.0      0     0 ?        I<   15:51   0:00 [kdmflush]
root       99536  0.0  0.0      0     0 ?        I<   15:51   0:00 [kdmflush]
root       99856  0.0  0.0      0     0 ?        I<   15:51   0:00 [kdmflush]
root      100023  0.0  0.0      0     0 ?        I<   15:51   0:00 [kdmflush]
root      100049  0.0  0.0      0     0 ?        I<   15:51   0:00 [kdmflush]
root      100153  0.0  0.0      0     0 ?        I<   15:51   0:00 [kdmflush]
root      100189  0.0  0.0      0     0 ?        I<   15:51   0:00 [kdmflush]
root      100244  0.0  0.0      0     0 ?        I<   15:51   0:00 [kdmflush]
root      100279  0.0  0.0      0     0 ?        I<   15:51   0:00 [kdmflush]
root      100316  0.0  0.0      0     0 ?        I<   15:51   0:00 [kdmflush]
root      101223  0.2  0.0      0     0 ?        I    15:51   0:01 [kworker/u16:1-events_unbound]
root      101857  0.0  0.0      0     0 ?        S    15:51   0:00 [jbd2/dm-2-8]
root      101858  0.0  0.0      0     0 ?        I<   15:51   0:00 [ext4-rsv-conver]
root      101864  0.0  0.0      0     0 ?        S    15:51   0:00 [jbd2/dm-3-8]
root      101865  0.0  0.0      0     0 ?        I<   15:51   0:00 [ext4-rsv-conver]
root      101872  0.0  0.0      0     0 ?        S    15:51   0:00 [jbd2/dm-4-8]
root      101873  0.0  0.0      0     0 ?        I<   15:51   0:00 [ext4-rsv-conver]
root      101946  0.0  0.0      0     0 ?        S    15:51   0:00 [jbd2/dm-14-8]
root      101947  0.0  0.0      0     0 ?        I<   15:51   0:00 [ext4-rsv-conver]
root      101953  0.0  0.0      0     0 ?        S    15:51   0:00 [jbd2/dm-15-8]
root      101954  0.0  0.0      0     0 ?        I<   15:51   0:00 [ext4-rsv-conver]
root      101960  0.0  0.0      0     0 ?        S    15:51   0:00 [jbd2/dm-16-8]
root      101961  0.0  0.0      0     0 ?        I<   15:51   0:00 [ext4-rsv-conver]
root      101981  0.0  0.0      0     0 ?        S    15:51   0:00 [jbd2/dm-20-8]
root      101982  0.0  0.0      0     0 ?        I<   15:51   0:00 [ext4-rsv-conver]
root      101988  0.0  0.0      0     0 ?        S    15:51   0:00 [jbd2/dm-22-8]
root      101989  0.0  0.0      0     0 ?        I<   15:51   0:00 [ext4-rsv-conver]
root      101995  0.0  0.0      0     0 ?        S    15:51   0:00 [jbd2/sda3-8]
root      101996  0.0  0.0      0     0 ?        I<   15:51   0:00 [ext4-rsv-conver]
root      102071  0.1  0.0      0     0 ?        I    15:51   0:00 [kworker/7:0-events]
root      102103  0.0  0.0      0     0 ?        I    15:51   0:00 [kworker/5:0-rcu_gp]
root      102283  0.1  0.0      0     0 ?        I    15:51   0:00 [kworker/4:0-memcg_kmem_cache]
root      102829  0.0  0.0      0     0 ?        I    15:52   0:00 [kworker/7:3-cgroup_pidlist_destroy]
root      102941  0.0  0.0      0     0 ?        I    15:52   0:00 [kworker/0:1-rcu_gp]
root      116853  0.0  0.0      0     0 ?        I    15:59   0:00 [kworker/u16:0-events_unbound]
root      116965  0.0  0.0      0     0 ?        I    15:59   0:00 [kworker/1:0-mm_percpu_wq]
root      118574  0.0  0.0      0     0 ?        I    16:02   0:00 [kworker/u16:2-flush-252:22]
root      118585  0.0  0.0   7164  3708 ?        S    16:02   0:00 /usr/lib/systemd/systemd-udevd
root      118586  0.0  0.0   7164  3644 ?        S    16:02   0:00 /usr/lib/systemd/systemd-udevd
root      118587  0.0  0.0   7164  3640 ?        S    16:02   0:00 /usr/lib/systemd/systemd-udevd
root      118588  0.0  0.0   7164  3640 ?        S    16:02   0:00 /usr/lib/systemd/systemd-udevd
root      118589  0.5  0.0   7164  3640 ?        S    16:02   0:00 /usr/lib/systemd/systemd-udevd
root      118590  0.0  0.0   7164  3640 ?        S    16:02   0:00 /usr/lib/systemd/systemd-udevd
root      118591  0.0  0.0   7164  3640 ?        S    16:02   0:00 /usr/lib/systemd/systemd-udevd
root      118592  0.0  0.0   7164  3640 ?        S    16:02   0:00 /usr/lib/systemd/systemd-udevd
root      118593  0.0  0.0   7164  3640 ?        S    16:02   0:00 /usr/lib/systemd/systemd-udevd
root      118594  0.0  0.0   7164  3640 ?        S    16:02   0:00 /usr/lib/systemd/systemd-udevd
root      118595  0.0  0.0   7164  3640 ?        S    16:02   0:00 /usr/lib/systemd/systemd-udevd
root      118596  0.0  0.0   7164  3620 ?        S    16:02   0:00 /usr/lib/systemd/systemd-udevd
root      118597  0.0  0.0   7164  3640 ?        S    16:02   0:00 /usr/lib/systemd/systemd-udevd
root      118598  0.0  0.0   7164  3640 ?        S    16:02   0:00 /usr/lib/systemd/systemd-udevd
root      118599  0.0  0.0   7164  3640 ?        S    16:02   0:00 /usr/lib/systemd/systemd-udevd
root      118600  0.0  0.0   7164  3640 ?        S    16:02   0:00 /usr/lib/systemd/systemd-udevd
root      118601  1.6  0.0   7164  3780 ?        S    16:02   0:00 /usr/lib/systemd/systemd-udevd
root      118603  0.0  0.0   7164  3640 ?        S    16:02   0:00 /usr/lib/systemd/systemd-udevd
root      118604  0.0  0.0   7164  3528 ?        S    16:02   0:00 /usr/lib/systemd/systemd-udevd
root      118605  0.0  0.0   7164  3528 ?        S    16:02   0:00 /usr/lib/systemd/systemd-udevd
root      118606  0.0  0.0   7164  3528 ?        S    16:02   0:00 /usr/lib/systemd/systemd-udevd
root      118607  0.0  0.0   7164  3528 ?        S    16:02   0:00 /usr/lib/systemd/systemd-udevd
root      118608  0.0  0.0   7164  3528 ?        S    16:02   0:00 /usr/lib/systemd/systemd-udevd
root      118609  0.0  0.0   7164  3528 ?        S    16:02   0:00 /usr/lib/systemd/systemd-udevd
root      118610  0.0  0.0   7164  3528 ?        S    16:02   0:00 /usr/lib/systemd/systemd-udevd
root      118611  0.0  0.0   7164  3528 ?        S    16:02   0:00 /usr/lib/systemd/systemd-udevd
root      118612  0.0  0.0   7164  3528 ?        S    16:02   0:00 /usr/lib/systemd/systemd-udevd
root      118640  0.0  0.0      0     0 ?        I    16:02   0:00 [kworker/3:2-cgroup_destroy]
root      118731  0.0  0.0      0     0 ?        I    16:02   0:00 [kworker/3:3-events]
root      118745  0.0  0.0      0     0 ?        I    16:02   0:00 [kworker/4:2-cgroup_destroy]
root      118823  0.0  0.0      0     0 ?        I    16:02   0:00 [kworker/7:1-cgroup_destroy]
root      118898  0.0  0.0      0     0 ?        I    16:02   0:00 [kworker/0:2]
root      118899  0.2  0.0      0     0 ?        I    16:02   0:00 [kworker/0:3-events]
root      119014  0.0  0.0      0     0 ?        I    16:02   0:00 [kworker/6:0-mm_percpu_wq]
root      119052  0.0  0.0      0     0 ?        I    16:02   0:00 [kworker/2:1-events]
root      119111  0.0  0.0   6136   932 ?        R    16:02   0:00 /bin/umount /var/lib/docker -c
root      119117  0.0  0.0   6808  2936 ?        Ss   16:02   0:00 /bin/sh /usr/local/bin/debugger.sh
root      119120  0.0  0.0   6136   876 ?        D    16:02   0:00 /bin/umount /mnt/sftp -c
root      119124  0.0  0.0   6132  2036 ?        R    16:02   0:00 ps aux

So my questions are:

  1. Am i missing any graceful handling that is needed when /etc/fstab entries are added at runtime. Any specific way to inform systemd to create *.mount and systemd-fsck@* fragments?
  2. If no processes are using the mount points, what else can cause the umount failure? How to debug this further?
  3. Any options to solve data loss? Adding another service to do lazy/force umount could help?
Prem avatar
cn flag
(1) It looks like you have Mount Point inside Mount Point. Eg "/var/crash/kernel" inside "/var/crash" , which should be unmounted in order. You could avoid that Issue by not having that type of Structure. Eg you could make it "/var/crash-kernel" & "/var/crash" Etc. **(2) Ensure that you are unmounting in reverse order of mounting.** (3) Issue occurs when unmounting during shutdown , when it is somewhat hard to Debug. You can Debug it Proactively by unmounting manually & checking what errors occur. (4) The D Stat may be a Problem , you should check that Mount Point in Detail & try it manually.
Score:0
us flag

Thanks to @Prem for pointing out that reverse order has to be maintained during umount.

This issue could be fixed by delegating responsibility to systemd. systemd-fstab-generator is capable of handling the hierarchy of sub-directories and mounting in needed order.

From systemd-generator man page:

After editing /etc/fstab, the user should invoke systemctl daemon-reload. This will re-run all generators and cause systemd to reload units from disk. To actually mount new directories added to fstab, systemctl start /path/to/mountpoint or systemctl start local-fs.target may be used.

I did exactly the same, issue is resolved!

I sit in a Tesla and translated this thread with Ai:

mangohost

Post an answer

Most people don’t grasp that asking a lot of questions unlocks learning and improves interpersonal bonding. In Alison’s studies, for example, though people could accurately recall how many questions had been asked in their conversations, they didn’t intuit the link between questions and liking. Across four studies, in which participants were engaged in conversations themselves or read transcripts of others’ conversations, people tended not to realize that question asking would influence—or had influenced—the level of amity between the conversationalists.