Score:0

Understanding the output of zpool status and physically identifying a faulted disk

hcr

I have a ZFS pool on an Ubuntu 20.04 server that is currently running in a degraded state due to a faulted disk. The output of zpool status looks as follows:

me@server:~$ zpool status tank
 pool: tank
 state: DEGRADED
 status: One or more devices could not be used because the label is missing or
    invalid.  Sufficient replicas exist for the pool to continue
    functioning in a degraded state.
 action: Replace the device using 'zpool replace'.
   see: http://zfsonlinux.org/msg/ZFS-8000-4J
 scan: scrub in progress since Thu Nov  4 17:56:13 2021
    9.80T scanned at 3.21G/s, 1.75T issued at 588M/s, 558T total
    0B repaired, 0.31% done, 11 days 11:52:21 to go
 config:

    NAME                     STATE     READ WRITE CKSUM
    tank                     DEGRADED     0     0     0
      raidz2-0               ONLINE       0     0     0
        ...
      raidz2-1               DEGRADED     0     0     0
        sda                  ONLINE       0     0     0
        sdb                  ONLINE       0     0     0
        sdc                  ONLINE       0     0     0
        sdd                  ONLINE       0     0     0
        sde                  ONLINE       0     0     0
        sdf                  ONLINE       0     0     0
        sdg                  ONLINE       0     0     0
        sdh                  ONLINE       0     0     0
        sdi                  ONLINE       0     0     0
        sdj                  ONLINE       0     0     0
        6775479499483215485  FAULTED      0     0     0  was /dev/sdk1
      raidz2-2               ONLINE       0     0     0
        sdk                  ONLINE       0     0     0
        sdn                  ONLINE       0     0     0
        sdm                  ONLINE       0     0     0
        ...
errors: No known data errors

I would like to understand what the number in the first column for the faulted disk in raidz2-1 is telling me. My ultimate aim is to physically locate the faulted disk, so I need more information about it (e.g. its serial number). My first idea was to use smartctl for this. However, sdk is also shown as an online member of raidz2-2: would /dev/sdk1 still necessarily refer to the same (i.e. the faulted) device, even after the server has been rebooted in the meantime?

Mapping devices to serial numbers is really something you need to do before the devices fail. At this point, you might need to get the serial numbers for **all** the *working* devices and then work backwards: find and replace the drive you don't have a serial number for.
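
The "work backwards" approach can be sketched in a few lines of Python. This is only an illustration, not a definitive tool: it assumes the all-digit name in the first column is the GUID that zpool prints when it can no longer open a device under its old name, and that "was /dev/sdk1" is merely the last known path. Kernel names like sdk are not guaranteed to be stable across reboots, which would explain why sdk now appears as a healthy member of raidz2-2. The helper names `find_missing_vdevs` and `unaccounted_serials` are hypothetical, and the sample text is trimmed from the output in the question.

```python
import re

# Matches a vdev line such as:
#   6775479499483215485  FAULTED  0  0  0  was /dev/sdk1
# Assumption: an all-digit device name is the vdev GUID, shown because
# the device could not be opened under its previous name.
MISSING_VDEV = re.compile(
    r"^\s*(?P<guid>\d+)\s+(?P<state>FAULTED|UNAVAIL|REMOVED|OFFLINE)\s+"
    r"\d+\s+\d+\s+\d+(?:\s+was\s+(?P<was>\S+))?\s*$"
)


def find_missing_vdevs(status_text):
    """Return (guid, state, last_known_path) for each vdev listed by GUID."""
    found = []
    for line in status_text.splitlines():
        match = MISSING_VDEV.match(line)
        if match:
            found.append((match["guid"], match["state"], match["was"]))
    return found


def unaccounted_serials(label_serials, working_serials):
    """Serials read off the physical drive labels that no working device reports."""
    return sorted(set(label_serials) - set(working_serials))


# Trimmed excerpt of the `zpool status tank` output from the question.
sample = """\
      raidz2-1               DEGRADED     0     0     0
        sdj                  ONLINE       0     0     0
        6775479499483215485  FAULTED      0     0     0  was /dev/sdk1
      raidz2-2               ONLINE       0     0     0
        sdk                  ONLINE       0     0     0
"""

print(find_missing_vdevs(sample))
```

The serial numbers of the working devices could be collected with something like `smartctl -i /dev/sdX` or `lsblk -o NAME,SERIAL` and passed to `unaccounted_serials` together with the serials printed on the drive labels. Whatever is left over is the candidate for the faulted disk.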
mangohost
