Score:7

SMART shows unreadable sectors, btrfs scrubs are clean - which is correct?


I have a pair of disks in RAID1 formatted with btrfs.

The disks go through periodic scrubbing and I get notified with the results. They've been running great for about 2-3 years with no issues.

However, I've recently added smartd to my installation, and it instantly complained about a small number of unreadable sectors in one of the drives:

Device: /dev/sdc [SAT], 4 Currently unreadable (pending) sectors

I ran a scrub on that drive, which found and corrected the same number of errors, but the SMART error message doesn't go away. Subsequent scrubs on the same disk show no errors.

I'm not sure which of these tools is the most accurate - is smartd showing a false positive, or is btrfs missing bad sectors, or perhaps I'm misunderstanding the results?

What would be the best way to verify the health of the disk?

Thanks!

Score:12

Shodanshok's answer is excellent, but to answer your literal question:

What would be the best way to verify the health of the disk?

Do a full write on it. The disk's firmware has marked those sectors as pending reallocation, and it deals with them when they are next written to: the write either 'fixes' the sector in place or remaps it to a spare, which shows up as reallocated sectors in SMART.
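To compare the SMART counters before and after such a write, something like the following could be used. It is only a sketch: it assumes smartmontools is installed, that the drive reports the common ATA attribute names `Current_Pending_Sector` and `Reallocated_Sector_Ct` (names vary by vendor), and the helper names and the sample text are made up for illustration.

```python
import subprocess

# Attribute names as commonly reported by `smartctl -A` on ATA disks.
# Assumption: your drive uses these names; some vendors differ.
WATCHED = ("Reallocated_Sector_Ct", "Current_Pending_Sector")

# Illustrative sample only, not output from a real drive.
SAMPLE_OUTPUT = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       4
"""


def parse_smart_attributes(text):
    """Return {attribute_name: raw_value} for the watched attributes."""
    found = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have the raw value in column 10.
        if len(fields) >= 10 and fields[0].isdigit() and fields[1] in WATCHED:
            found[fields[1]] = int(fields[9])
    return found


def read_smart_attributes(device):
    """Run `smartctl -A` on a device (usually needs root) and parse the result."""
    # smartctl's exit status is a bitmask, so a non-zero code is not treated as fatal.
    result = subprocess.run(
        ["smartctl", "-A", device], capture_output=True, text=True, check=False
    )
    return parse_smart_attributes(result.stdout)


def interpret(before, after):
    """Classify what happened to pending sectors between two snapshots."""
    pending_before = before.get("Current_Pending_Sector", 0)
    pending_after = after.get("Current_Pending_Sector", 0)
    realloc_delta = after.get("Reallocated_Sector_Ct", 0) - before.get(
        "Reallocated_Sector_Ct", 0
    )
    if pending_before == 0:
        return "clean"      # nothing was pending to begin with
    if realloc_delta > 0:
        return "remapped"   # firmware moved sectors to spares
    if pending_after < pending_before:
        return "rewritten"  # sectors were rewritten in place and read fine
    return "suspect"        # pending count did not drop and nothing was remapped
```

Take one snapshot with `read_smart_attributes("/dev/sdc")` before the write and one after, then pass both to `interpret`. A "suspect" result is the case where replacing the disk is worth serious consideration.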

You could then theoretically do a latency read scan on the disk, i.e. read the whole surface and time each read. This is often telling of how reliable the sectors are.
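A latency read scan can be approximated with a short script. The following is a rough sketch, not a replacement for a dedicated tool: the function names, the 1 MiB chunk size, and the "10x the median" threshold are arbitrary assumptions. Reads also go through the page cache here, so on a real device the cache would have to be dropped first for the timings to mean anything.

```python
import os
import statistics
import time


def latency_scan(path, chunk_size=1024 * 1024):
    """Read `path` sequentially and return a list of (offset, seconds) per chunk.

    Chunks that fail with an I/O error are recorded with infinite latency.
    """
    timings = []
    fd = os.open(path, os.O_RDONLY)
    try:
        offset = 0
        while True:
            start = time.perf_counter()
            try:
                data = os.read(fd, chunk_size)
            except OSError:
                # Unreadable chunk: record it and skip ahead to the next one.
                timings.append((offset, float("inf")))
                offset += chunk_size
                os.lseek(fd, offset, os.SEEK_SET)
                continue
            elapsed = time.perf_counter() - start
            if not data:
                break  # end of file or device
            timings.append((offset, elapsed))
            offset += len(data)
    finally:
        os.close(fd)
    return timings


def slow_chunks(timings, factor=10.0):
    """Return offsets whose read time exceeds `factor` times the median."""
    finite = [t for _, t in timings if t != float("inf")]
    if not finite:
        return [offset for offset, _ in timings]
    threshold = factor * statistics.median(finite)
    return [offset for offset, t in timings if t > threshold]
```

Running `slow_chunks(latency_scan("/dev/sdc"))` as root would list the byte offsets that were unusually slow or unreadable. Offsets that keep showing up across repeated runs are the ones to worry about.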

And in practical terms, it may be time to replace the disk. This is one of those preludes to failure that I watch for. Another is 'ata exception' in the syslogs. They typically happen before mdadm (or RAID controllers) kicks drives, and I suspect btrfs is similar (though I have no experience).

dkd6:
Thank you for your suggestions! SMART errors have not risen in about a month or so, but I'll attempt a full write to verify. I'll keep an eye out on the syslog as well.
AustinHemmelgarn:
Possibly of note, the Current Pending Sector (CPS) value should go _down_ if those sectors are written to. On most standard hard drives, CPS represents a count of sectors that are known bad which have not been remapped to spares yet by the firmware. If that number does not go down after a full disk write and you do not see new ‘reallocated sectors’, you should seriously consider replacing the disk, as that indicates it’s run out of spare sectors for remapping.
joshudson:
@AustinHemmelgarn: You know, there's times when I wish I could find disks that didn't remap to spares. If the sector is going bad, BTRFS can just work around it.
Halfgaar:
@joshudson such info would be lost on a reformat. Marking blocks as bad in the file system is, to me, something to happily leave in the past.
joshudson:
@Halfgaar: lagging out the RAID because one of the disks has to go read in a spare is something I'd like to leave in the past. I always do a scan for bad blocks on format anyway. Sometimes the format actually fixes a bad block. (Seen it on old disks that didn't remap.)
Score:12

Most disks implement a so-called "surface area scan", which runs automatically and periodically. This kind of scan covers the entire disk surface, including empty/free areas. A btrfs scrub, on the other hand, only checks used space, so empty disk areas are not checked.

This means SMART found some issues on unused sectors, but they are not recognized by btrfs simply because it is not using these sectors.
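The difference in coverage can be pictured with a toy model. The sector numbers and extent ranges below are invented for illustration, and the function name is hypothetical. Mapping real LBAs to btrfs extents takes filesystem-specific tooling that is not shown here.

```python
def split_by_allocation(bad_sectors, allocated_extents):
    """Split bad sectors into those inside and outside allocated space.

    `allocated_extents` is a list of half-open (start, end) sector ranges
    that the filesystem is actually using.
    """
    in_used, in_free = [], []
    for sector in bad_sectors:
        if any(start <= sector < end for start, end in allocated_extents):
            in_used.append(sector)  # a scrub would read this sector
        else:
            in_free.append(sector)  # only a whole-surface scan would see it
    return in_used, in_free


# Hypothetical layout: two allocated regions and three bad sectors.
extents = [(0, 1024), (8192, 10000)]
bad = [100, 5000, 9000]
used, free = split_by_allocation(bad, extents)
```

In this made-up layout a scrub would only ever touch sectors 100 and 9000. Sector 5000 sits in free space, so only the drive's own surface scan would report it.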

If the SMART errors climb quickly, I suggest replacing the failing drive as soon as possible.

dkd6:
I wasn't aware of the difference, and that explains the discrepancy of the results. Thank you!