I have had this issue for a very long time and have tried different solutions, but I couldn't find the root cause of the file system corruption.
I have Ubuntu 22.04 on an Advantech IoT gateway, which is currently at a remote site 300 miles away from me, so running sudo fsck /dev/mmcblk0p2 manually is not practical. I reviewed a few posts and found one stating that this file system check can be done automatically every time the device reboots. However, when it reboots it must not drop to the initramfs prompt, because then I would again need to go on-site to run fsck manually.
In short, I followed the steps below so that fsck runs automatically every time there is a power cut or reboot. FIX (I couldn't reproduce the issue to verify that it works):
In /etc/default/grub:
GRUB_CMDLINE_LINUX_DEFAULT="quiet fsck.mode=force fsck.repair=yes"
sudo update-grub
sudo tune2fs -l /dev/mmcblk0p2 | grep 'Maximum mount'
sudo tune2fs -c 1 /dev/mmcblk0p2
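To confirm after the next reboot that the kernel was actually started with these options, the running command line in /proc/cmdline can be checked. This is a minimal sketch, assuming the gateway really boots through GRUB; `has_forced_fsck` is a hypothetical helper name, not an existing tool:

```shell
#!/bin/sh
# Hypothetical helper: return 0 if the kernel command line passed as $1
# contains both fsck.mode=force and fsck.repair=yes as whole words.
has_forced_fsck() {
    cmdline=" $1 "   # pad with spaces so the first and last words match too
    case "$cmdline" in
        *" fsck.mode=force "*) ;;
        *) return 1 ;;
    esac
    case "$cmdline" in
        *" fsck.repair=yes "*) ;;
        *) return 1 ;;
    esac
    return 0
}

# Intended use on the device (not executed here):
#   has_forced_fsck "$(cat /proc/cmdline)" && echo "forced fsck is active"
```

If the options are missing from /proc/cmdline, the change in /etc/default/grub did not reach the running kernel, for example because update-grub was not run or the device uses a different boot loader.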
But now I have a scenario where the device appears to be online and I can see it in TeamViewer, but I can't control anything on it because some files are corrupted.
We only use the device to collect IoT data, which is sent to the ThingsBoard cloud.
So everything looks normal, but there are times when we need to control a battery or another device. Then we need to get into the gateway via TeamViewer or remote SSH and run some Python scripts. At that point we see that the system has gone into read-only mode, and we are unable to update anything.
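Since the read-only state only gets noticed when something tries to write, a small check could be run periodically (for example from cron) to report it earlier. This is a sketch under the assumption that the root file system is ext4 mounted with errors=remount-ro, in which case the kernel switches it to read-only after detecting an error; `mount_is_readonly` is a hypothetical name:

```shell
#!/bin/sh
# Hypothetical helper: return 0 if mount point $1 is listed as read-only in
# the mounts table $2 (defaults to /proc/mounts).
# Table fields: device, mount point, fs type, options, dump, pass.
mount_is_readonly() {
    mountpoint=$1
    table=${2:-/proc/mounts}
    awk -v mp="$mountpoint" '
        $2 == mp { opts = $4 }   # keep the last entry for this mount point
        END {
            n = split(opts, o, ",")
            for (i = 1; i <= n; i++)
                if (o[i] == "ro") exit 0   # exact match, so errors=remount-ro is ignored
            exit 1
        }
    ' "$table"
}

# Intended use on the device (not executed here):
#   mount_is_readonly / && echo "root file system is read-only"
```

The result could be sent along with the other telemetry, so the remount is visible remotely before anyone needs to log in.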
This leads us back to scenario 1, where I need to do a manual fsck or reboot. If I reboot now that the FIX is in place, I'm not confident that it will get past the initramfs screen, and if it drops to that screen again it will be a problem.
So this time I ran sudo fsck /dev/mmcblk0p2 and manually fixed all the inodes, but it is asking me to reboot the device again. How sure can we be that it won't drop to the initramfs prompt?
And how can I figure out the root cause of the disk corruption? Is there a way to detect it as soon as the file system becomes corrupted? Will the FIX work if I reboot the system, and is it the right fix?
I have collected the system and dmesg logs in case they are required.
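For narrowing down the root cause, the kernel log usually records the first ext4 or eMMC error before the remount happens. Below is a sketch of a filter for such lines. The patterns are assumed typical kernel messages and are not taken from the collected logs, and `filter_storage_errors` is a hypothetical name:

```shell
#!/bin/sh
# Hypothetical helper: read kernel log lines on stdin and print only those
# that look like ext4 or eMMC trouble. The patterns are assumptions about
# typical message wording and may need adjusting for this kernel.
filter_storage_errors() {
    # "|| true" keeps the exit status at 0 when nothing matches
    grep -E 'EXT4-fs error|EXT4-fs warning|Remounting filesystem read-only|I/O error, dev mmcblk' || true
}

# Intended use on the device (not executed here):
#   sudo dmesg | filter_storage_errors              # current boot
#   journalctl -k -b -1 | filter_storage_errors     # previous boot, if the journal is persistent
```

The timestamps of the first matching lines can then be compared with power cuts or other events on site.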