Score:0

Zpool permanent error for raidz2

Problem: Zpool shows "errors: Permanent errors have been detected in the following files: tank/vms/fileserver:<0x0>"

The server has six 12 TB drives in a raidz2. These are spinning drives. Zpool shows all drives as ONLINE, with no read, write, or cksum errors. OS: Ubuntu 20.04.3 with zfsutils-linux 0.8.3-1ubuntu12.13.

I've never seen a permanent error before and I don't understand how to go about fixing this. My understanding is that with a raidz2, the machine can have up to two drives failed without going down. If a third one goes then the zpool is gone. Is that correct? In this situation no drives show faulted and the only error showing is the permanent error. With no faulted drives, shouldn't zfs be able to recover or rebuild the file from other good copies on a raidz2? Thereby removing the permanent error. Or do we need to go to raidz3?

In this case, this is our backup server. If the fileserver VM had been running on this machine when this permanent error happened, is the VM then trashed?

From what I've been able to find, my error message points to corruption of an object's data. I only noticed this message because the zfs replication from our main server to this backup server hung while trying to syncoid the fileserver. I read that in order to fix this I need to remove the file in question. Will zfs then mark these blocks as bad and rebuild the file from a good copy in another area on the disk?

Here are a few places I've read so far:

Repairing Damaged Data

What does a permanent ZFS error indicate?

Kind regards, pender

Score:0

My understanding is that with a raidz2, the machine can have up to two drives failed without going down. If a third one goes then the zpool is gone. Is that correct?

No. The array can lose up to two drives without going down. This has nothing to do with the machine going up or down.
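To put numbers on that, here is a minimal sketch of the parity arithmetic for a single raidz vdev. The function names are made up for illustration, and the capacity figure is only a rough estimate that ignores metadata, padding and TB/TiB differences.

```python
def raidz_survives(parity: int, failed_drives: int) -> bool:
    """True if a single raidz vdev with `parity` parity drives can still
    serve data after `failed_drives` whole-drive failures."""
    return failed_drives <= parity


def raidz_data_capacity_tb(drives: int, parity: int, drive_tb: float) -> float:
    """Rough data capacity of one raidz vdev (overheads ignored)."""
    return (drives - parity) * drive_tb


# raidz2 means parity = 2; the question describes six 12 TB drives.
for failed in range(4):
    print(f"{failed} failed drive(s): array survives = {raidz_survives(2, failed)}")

print("approx. data capacity in TB:", raidz_data_capacity_tb(6, 2, 12))
```

Note that this only counts whole-drive failures. Corruption that hits the same block on more drives than there is parity can still produce a permanent error while every drive shows ONLINE.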

In this situation no drives show faulted and the only error showing is the permanent error.

It shows a permanent error in a file. In an already deleted file, as far as I understand (but the original question still lacks the zpool status output, so we only have your potentially erroneous interpretation of it). Zfs self-healing capabilities aren't magical: the fs is able to recover from errors up to a certain threshold, but beyond that bad things tend to happen, like these permanent errors of yours. The most usual case is when you have several checksum errors on multiple drives that intersect on some file. Without zpool status it's hard to guess.
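For what it's worth, the entry quoted in the question has the `dataset:<0xN>` form, which zpool prints when it reports an object number instead of a file path. Here is a small sketch that splits such an entry apart; the `ErrorEntry` type and `parse_error_entry` function are hypothetical helpers, and `/tank/some/file` is a made-up path used only to show the other branch.

```python
import re
from typing import NamedTuple, Optional


class ErrorEntry(NamedTuple):
    dataset: Optional[str]    # set for the dataset:<0x...> form
    object_id: Optional[int]  # object number, parsed from hex
    path: Optional[str]       # set when the entry is a plain path


# Matches entries such as "tank/vms/fileserver:<0x0>".
_OBJECT_FORM = re.compile(r"^(?P<dataset>[^:]+):<0x(?P<obj>[0-9a-fA-F]+)>$")


def parse_error_entry(entry: str) -> ErrorEntry:
    """Split one line from the 'Permanent errors' list of zpool status -v."""
    entry = entry.strip()
    match = _OBJECT_FORM.match(entry)
    if match:
        return ErrorEntry(match.group("dataset"), int(match.group("obj"), 16), None)
    # Anything else is treated as a file path (an assumption of this sketch).
    return ErrorEntry(None, None, entry)


print(parse_error_entry("tank/vms/fileserver:<0x0>"))
print(parse_error_entry("/tank/some/file"))  # hypothetical path
```

An object-number entry means there is no ordinary file path to delete or restore, which is one more reason the full zpool status output matters here.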

With no faulted drives, shouldn't zfs be able to recover or rebuild the file from other good copies on a raidz2?

It would, but not for a deleted file.

Thereby removing the permanent error.

So you want zfs to just silently swallow the errors. That's not how things work in IT. The kernel part of zfs is complaining about the errors happening, and that's the good part.

If the fileserver VM had been running on this machine when this permanent error happened, is the VM then trashed?

It depends. If this corrupted file is a zfs volume which the VM uses for its disk, it most certainly would be.

Will zfs then mark these blocks as bad and rebuild the file from a good copy in another area on the disk?

It probably already did. Just scrub the pool and the error will go away after some time (not immediately though).
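A minimal sketch of that scrub-and-check sequence, assuming the pool is named `tank` (taken from the dataset path in the question). The helper names are made up, and the sketch only prints the commands unless `dry_run` is turned off.

```python
import subprocess
from typing import List


def scrub_and_check_commands(pool: str) -> List[List[str]]:
    """Commands to start a scrub and then inspect the pool, in order."""
    return [
        ["zpool", "scrub", pool],         # starts the scrub and returns right away
        ["zpool", "status", "-v", pool],  # shows scrub progress and the error list
    ]


def run(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print the commands, or execute them when dry_run is False (needs root)."""
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)


run(scrub_and_check_commands("tank"))
```

Since the scrub runs in the background, the status command has to be repeated once the scrub has finished to see whether the error list changed.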

And don't use zfs on Linux in production. Yeah, that part will get dozens of downvotes, but it's the harsh truth. Nobody cares about zfs on Linux in production. The Linux fearless leader openly denies the need for it and hates its guts, because it originates from his most hated Sun Microsystems and is distributed under the CDDL license. Use FreeBSD or Solaris (yeah, Joyent SmartOS is a possible choice too, though it's a bit exotic); at least those two are way more reliable when it comes to zfs. Solaris still has the canonical implementation of zfs: neither Linux nor FreeBSD can use it as swap (at one time they stated they could, but after some struggling this turned out to be untrue), nor can they write crash dumps onto it, while on Solaris these two things are native. Yeah, Linux has a userbase so wide that FreeBSD looks like a statistical error compared to it, but the thing is, when it comes to people using zfs, the FreeBSD zfs userbase is way, way bigger than the Linux one.
