Score:2

Errors on a zpool filesystem

cn flag

I'm using ZFS on a Debian 9 machine. This machine has been working for years without any problem until today.

The ZFS pool sits on top of a hardware RAID array, so only one drive is exposed to Linux (as sda). The output of "zpool status" is shown below.

Before continuing, I should mention that I checked the consistency of the RAID array, and everything is fine.

Suddenly, any access to the filesystem makes the command hang (even a plain ls), and eventually I have to reboot the machine manually.

When running zpool status -v, the output is:

#/sbin/zpool status -v
  pool: export
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://zfsonlinux.org/msg/ZFS-8000-8A
  scan: scrub repaired 0B in 53h4m with 0 errors on Tue Mar 15 05:28:38 2022
config:

        NAME        STATE     READ WRITE CKSUM
        export      ONLINE       0     0     0
          sda       ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        export/home:<0x0>
        export/home:<0x2b2ed23>
        export/home:<0x2e1183b>
        export/home:<0x2b2e849>
        export/home:<0x1d0b5b1>

So, the main questions are: what do the entries listed under "Permanent errors" refer to, and how do I fix this problem?

Thank you in advance!

ewwhite avatar
ng flag
Did the scrubs work?
Score:1
ng flag

Run a zpool clear and two scrubs if you can, then see the result.
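
As a rough sketch of that sequence (not a tested recovery procedure): the helper below clears the pool's error counters, starts a scrub, polls "zpool status" until the "scrub in progress" line disappears, repeats once, and then prints the final status. The function name clear_and_scrub and the ZPOOL variable are made up for illustration. By default ZPOOL is "echo zpool", so the script only prints the commands it would run; set ZPOOL=/sbin/zpool to run them for real. The polling loop is used on the assumption that an older ZFS on Linux release, as shipped with Debian 9, has no built-in way to wait for a scrub.

```shell
#!/bin/sh
# Dry run by default: commands are printed, not executed.
# Set ZPOOL=/sbin/zpool to actually run them (assumption: that is where
# zpool lives, as in the question's output).
ZPOOL="${ZPOOL:-echo zpool}"

# clear_and_scrub POOL: hypothetical helper, clear errors then scrub twice.
clear_and_scrub() {
    pool=$1
    # Reset the error counters and the recorded error list.
    $ZPOOL clear "$pool" || return 1
    for pass in 1 2; do
        # Start a scrub; the command returns immediately.
        $ZPOOL scrub "$pool" || return 1
        # Poll until the status output no longer reports a running scrub.
        while $ZPOOL status "$pool" | grep -q "scrub in progress"; do
            sleep 60
        done
    done
    # Show the resulting state, including any remaining permanent errors.
    $ZPOOL status -v "$pool"
}

clear_and_scrub export
```

With the last scrub in the question taking 53 hours, two passes on this pool would likely take several days.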

Score:1
ca flag

Those were corrupted files, and now only their metadata remains:

export/home:<0x0>
export/home:<0x2b2ed23>
export/home:<0x2e1183b>
export/home:<0x2b2e849>
export/home:<0x1d0b5b1>
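
The values in angle brackets are object numbers inside the export/home dataset that ZFS could not map back to a file path. If you want to look at one of them, zdb can dump a single object by number. The sketch below only prints the zdb commands rather than running them, and it converts the hex IDs to decimal on the assumption that an older zdb may not accept hex object numbers. The helper name objid_to_dec is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: convert a hex object ID such as 0x2b2ed23 to decimal
# using POSIX shell arithmetic.
objid_to_dec() {
    echo "$(( $1 ))"
}

# Dataset name and object IDs are taken from the "zpool status -v" output.
DATASET="export/home"
for id in 0x2b2ed23 0x2e1183b 0x2b2e849 0x1d0b5b1; do
    # Print (do not run) the command that would dump this object's metadata.
    echo "zdb -dddd $DATASET $(objid_to_dec "$id")"
done
```

Running one of the printed commands as root shows the object's type and, when it can still be resolved, its path.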

The cause is probably a hardware failure, but you need more information to pinpoint the root cause, and your RAID card will probably get in the way.

Running ZFS on top of a hardware RAID device is not recommended, precisely to avoid the situation you are in now: problems that are hard to diagnose.

My two cents:

  • let ZFS manage your disks (it is made for it)
  • use the most recent ZFS version (and a suitable OS)
ewwhite avatar
ng flag
Hardware RAID controllers don't fall over at an excessive rate for other filesystems. It's misleading to state that they're unacceptable for ZFS, or that single-LUN/single-device pools backed by hardware RAID are unacceptable. Common use cases are an export from a SAN, ZFS in a VM, or any situation where one wants to leverage ZFS volume management features without using its RAID features.
freezed avatar
ca flag
OK then, so what are your suggestions to diagnose this case @ewwhite?
ewwhite avatar
ng flag
Oh, a `zpool clear` and two scrubs.