Score:1

How to avoid doing a fsck when upgrading the kernel on a Debian server?

il flag

I have had a headless Debian server for three years. It ran Debian 11 Bullseye amd64 at first and currently runs Debian 12 Bookworm. I do infrequent kernel upgrades with sudo apt-get dist-upgrade. On three occasions there were disk errors on the root filesystem at boot, and I had to run fsck at the console on /dev/mapper/foobar--vg-root. It's a bit of a hassle to attach a keyboard and a screen to the box. The most recent problem happened while upgrading from (2023-07-??) to Linux foobar 6.1.0-11-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.38-4 (2023-08-08) x86_64 GNU/Linux, which came from deb http://security.debian.org/debian-security bookworm-security main contrib non-free rather than from the main repository.

Question: Is there anything I can do before rebooting to avoid disk errors on the root filesystem?

vidarlo avatar
ar flag
The problem is disk errors. It's a good thing that you discover those. Get proper server hardware with serial console accessible via IPMI.
anx avatar
fr flag
anx
The check on bootup only happens after an unclean shutdown or when you have exceeded the configured check intervals (e.g. 40 mounts or 40 days, whichever comes first) without having the filesystem checked. If that fallback mechanism gives you indications of trouble, you should do the exact opposite of *avoiding* checks. You should check more, possibly in a way where you get a notification if one out of multiple redundant disks fails, long before booting a system from storage of, at the very least, *questionable* integrity.
Chan Tai Man avatar
il flag
Thanks, vidarlo and @anx. (1) I will not reboot a server again unless I have physical access to it and time at hand. (2) For non-root filesystems, I should umount and fsck them before rebooting. (3) More fundamentally, I'll leave an old kernel running until I have to upgrade it.
A.B avatar
cl flag
A.B
The corruption doesn't happen on reboot. It happens while the system is running. When the kernel detects such an error and the filesystem (when it's ext4) is mounted with the option errors=continue, it just ... continues. But it marks the FS as having errors to force a check the next time.
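Since the error flag is recorded in the superblock while the system is running, it can be read before rebooting. Below is a minimal Python sketch, assuming an ext4 filesystem and that `tune2fs -l` prints the "Filesystem state", "Mount count" and "Maximum mount count" fields. The function names are hypothetical, and the string tests on the state field are an assumption about how e2fsprogs words it, so treat this as a starting point rather than a definitive check.

```python
import subprocess


def parse_tune2fs(text):
    """Turn `tune2fs -l` output ("Key:   value" lines) into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def fsck_expected(fields):
    """Return a list of reasons why a check looks likely on the next boot."""
    reasons = []
    state = fields.get("Filesystem state", "")
    # Assumption: an error-flagged filesystem reports "... with errors",
    # and an unclean one reports "not clean".
    if "with errors" in state or state.startswith("not clean"):
        reasons.append("filesystem state is '%s'" % state)
    try:
        mounts = int(fields.get("Mount count", "0"))
        max_mounts = int(fields.get("Maximum mount count", "-1"))
    except ValueError:
        return reasons
    # A maximum of -1 or 0 means mount-count based checking is disabled.
    if max_mounts > 0 and mounts >= max_mounts:
        reasons.append("mount count %d reached maximum %d" % (mounts, max_mounts))
    return reasons


def read_superblock(device):
    """Run `tune2fs -l` on the device (needs root) and parse the result."""
    result = subprocess.run(
        ["tune2fs", "-l", device],
        check=True, capture_output=True, text=True,
    )
    return parse_tune2fs(result.stdout)
```

Run as root, `fsck_expected(read_superblock("/dev/mapper/foobar--vg-root"))` returns an empty list when nothing in the superblock suggests a forced check. A non-empty list is a hint to arrange console access before rebooting.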
Score:1
cn flag

Linux ext file systems must be fsck'd offline for the check to be valid, which for the root file system means the primary copy cannot be checked while the system is running.

As you are running LVM, you could snapshot the LVs, fsck the copies, and know without unmounting or rebooting whether to expect problems. See the e2scrub program in e2fsprogs for scripting ideas. This does not repair the source volumes, which would still need a fsck while they are unmounted. So keep your crash cart or out of band access around.
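The snapshot-and-check idea can be sketched in Python roughly as follows. This is an illustration under stated assumptions, not what e2scrub itself does: the volume group is assumed to be foobar-vg with a logical volume root (inferred from /dev/mapper/foobar--vg-root), the volume group needs free extents for the snapshot, and the snapshot name fsck_probe, the 1G size and the function names are all hypothetical. The check uses e2fsck -f -y so that any journal replay or fix is written to the throwaway snapshot only; the origin volume is not modified.

```python
import subprocess


def snapshot_check_commands(vg, lv, snap_name="fsck_probe", snap_size="1G"):
    """Build the three commands: create snapshot, check it, remove it."""
    origin = f"/dev/{vg}/{lv}"
    snap = f"/dev/{vg}/{snap_name}"
    return [
        ["lvcreate", "--snapshot", "--size", snap_size, "--name", snap_name, origin],
        # -f forces a full check; -y answers yes, changing only the snapshot.
        ["e2fsck", "-f", "-y", snap],
        ["lvremove", "--force", snap],
    ]


def run_snapshot_check(vg, lv, dry_run=True):
    """Print the commands (dry run) or execute them as root.

    Returns the e2fsck exit status, or None in dry-run mode.
    """
    create, check, remove = snapshot_check_commands(vg, lv)
    if dry_run:
        for cmd in (create, check, remove):
            print(" ".join(cmd))
        return None
    subprocess.run(create, check=True)
    try:
        # Exit status 0 means e2fsck found no errors on the copy.
        return subprocess.run(check).returncode
    finally:
        # Always remove the snapshot so it cannot fill up and go stale.
        subprocess.run(remove, check=True)
```

Calling `run_snapshot_check("foobar-vg", "root")` only prints the commands. With `dry_run=False` and root privileges, a non-zero return value suggests the real volume deserves an offline fsck, so plan for console access before the next reboot.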

More exactly, fsck attempts to get your file system metadata consistent. It depends on a reliable block device, which if this is the third disk error you might not have. Monitor the disks, consider replacement, and consider redundancy like RAID 1.
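For the disk monitoring part, one option is to poll smartctl from smartmontools and decode its exit status, which is a bitmask. The sketch below is a hedged example: the bit meanings are paraphrased from the smartctl man page as I recall it, so verify them against your installed version, and the function names are hypothetical.

```python
import subprocess

# Bit meanings of the smartctl exit status, paraphrased from its man page.
SMARTCTL_BITS = {
    0: "command line did not parse",
    1: "device open failed",
    2: "a SMART command failed or returned a checksum error",
    3: "SMART status check returned DISK FAILING",
    4: "prefail attributes are at or below threshold",
    5: "some attributes were at or below threshold in the past",
    6: "device error log contains records of errors",
    7: "self-test log contains records of errors",
}


def decode_smartctl_status(status):
    """Return the descriptions of all bits set in a smartctl exit status."""
    return [text for bit, text in SMARTCTL_BITS.items() if status & (1 << bit)]


def check_disk(device):
    """Run `smartctl --all` on the device (needs root) and decode the result."""
    result = subprocess.run(["smartctl", "--all", device],
                            capture_output=True, text=True)
    return decode_smartctl_status(result.returncode)
```

An empty list from `check_disk` means smartctl raised no flags; anything else is worth a notification, for example from a cron job. The smartd daemon shipped with smartmontools can do this kind of monitoring continuously.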

It seems likely the kernel upgrade wasn't the cause of the problem. To use a medical metaphor, it is like going to the doctor for a scheduled physical (kernel upgrade) and finding a symptom (file system inconsistency) of an illness (block device problem).

Chan Tai Man avatar
il flag
Thanks @John_Mahowald for a clear and complete answer. I will have a look at e2scrub. A long time ago (10+ years), I tried RAID 1 mirroring on the root file system of a Debian desktop. Without reading the documentation diligently, I wasn't able to manage its upkeep. There was no LVM at that time. The server in question is not that inaccessible. It is still in the building. I shall try RAID 1 again in my next server, perhaps with better quality HDDs. Thanks again. P.S. You're absolutely right that the kernel update was unlikely to be the problem; rebooting with a corrupted root filesystem was.