Has the drive failed, or can it still be used?

I have the following WD drive (3 TB) that gave me a problem: I was unable to access any file, and even an ls command on it hung indefinitely.

Here are some details on the disk:

Disk /dev/sda: 2.7 TiB, 3000592982016 bytes, 5860533168 sectors
Disk model: EZRX-00D8PB0
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt

Device     Start        End    Sectors  Size Type
/dev/sda1   2048 5860532223 5860530176  2.7T Linux filesystem
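
The figures above can be cross-checked with a little arithmetic. The following is a minimal sketch in Python; the constant and function names are my own, and the numbers are copied from the fdisk output. It confirms that the sector count matches the reported size, that the drive exposes 512-byte logical sectors on 4096-byte physical sectors (so one physical sector covers 8 LBAs), and that the partition starts on a physical-sector boundary.

```python
# Values copied from the fdisk output above.
TOTAL_BYTES = 3000592982016
TOTAL_SECTORS = 5860533168
LOGICAL_SECTOR = 512      # bytes
PHYSICAL_SECTOR = 4096    # bytes
PART_START = 2048         # first sector of /dev/sda1
PART_END = 5860532223     # last sector of /dev/sda1


def disk_geometry():
    """Derive a few sanity-check figures from the reported geometry."""
    return {
        # sectors * logical sector size should equal the byte count
        "bytes_match": TOTAL_SECTORS * LOGICAL_SECTOR == TOTAL_BYTES,
        # size in TiB (2**40 bytes), which fdisk rounds to 2.7
        "size_tib": round(TOTAL_BYTES / 2**40, 2),
        # number of logical LBAs sharing one physical sector
        "logical_per_physical": PHYSICAL_SECTOR // LOGICAL_SECTOR,
        # the partition start should be a multiple of the physical sector size
        "partition_aligned": (PART_START * LOGICAL_SECTOR) % PHYSICAL_SECTOR == 0,
        # inclusive sector count of the partition
        "partition_sectors": PART_END - PART_START + 1,
    }


if __name__ == "__main__":
    for key, value in disk_geometry().items():
        print(f"{key}: {value}")
```

Because one physical sector covers 8 logical sectors, a single damaged physical sector can show up as several consecutive unreadable LBAs.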

After this problem I ran some tests to find out what is affecting the drive. As a first step I ran a short self-test with sudo smartctl -t short /dev/sda, which reported the following error:

=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed: read failure       90%     17480         8467144

Then I retrieved the attributes as described in this other post, Understanding smartctl -a output, using sudo smartctl -a /dev/sda. Here are the attribute table and the five most recent entries of the error log:

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       71
  3 Spin_Up_Time            0x0027   174   161   021    Pre-fail  Always       -       6266
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       695
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   077   077   000    Old_age   Always       -       17481
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       457
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       63
193 Load_Cycle_Count        0x0032   179   179   000    Old_age   Always       -       64193
194 Temperature_Celsius     0x0022   122   101   000    Old_age   Always       -       28
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   197   000    Old_age   Always       -       356
198 Offline_Uncorrectable   0x0030   197   197   000    Old_age   Offline      -       1691
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   196   196   000    Old_age   Offline      -       1691
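
To make the table easier to read, here is a minimal Python sketch that parses a few of the rows above and flags the sector-health attributes whose raw value is non-zero. The choice of watched IDs (5, 196, 197, 198) is my own assumption about which counters matter most, not something smartctl defines, and the function names are hypothetical.

```python
# A subset of rows copied verbatim from the smartctl -a output above.
SMART_TABLE = """\
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   197   000    Old_age   Always       -       356
198 Offline_Uncorrectable   0x0030   197   197   000    Old_age   Offline      -       1691
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
"""

# Assumption: these are the sector-health counters worth watching.
WATCHED_IDS = {5, 196, 197, 198}


def parse_attributes(table):
    """Turn attribute rows into {id: {...}}; skips lines that are not rows."""
    attrs = {}
    for line in table.splitlines():
        fields = line.split()
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        attrs[int(fields[0])] = {
            "name": fields[1],
            "value": int(fields[3]),   # normalized value
            "worst": int(fields[4]),
            "thresh": int(fields[5]),
            # Raw values are plain integers in these rows; other drives may
            # print extra text here, which this sketch does not handle.
            "raw": int(fields[9]),
        }
    return attrs


def suspicious(attrs):
    """Watched attributes with a non-zero raw count."""
    return {i: a for i, a in attrs.items() if i in WATCHED_IDS and a["raw"] > 0}


def below_threshold(attrs):
    """Attributes whose normalized value has dropped to or below the threshold."""
    return {i: a for i, a in attrs.items() if a["thresh"] > 0 and a["value"] <= a["thresh"]}


if __name__ == "__main__":
    attrs = parse_attributes(SMART_TABLE)
    for attr_id, attr in suspicious(attrs).items():
        print(attr_id, attr["name"], attr["raw"])
    print("below threshold:", sorted(below_threshold(attrs)))
```

On these rows the sketch flags Current_Pending_Sector (356) and Offline_Uncorrectable (1691), while no normalized value has reached its threshold. That would explain why WHEN_FAILED shows "-" everywhere even though the drive is reporting sectors it could not read.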

SMART Error Log Version: 1
ATA Error Count: 47 (device log contains only the most recent five errors)
        CR = Command Register [HEX]
        FR = Features Register [HEX]
        SC = Sector Count Register [HEX]
        SN = Sector Number Register [HEX]
        CL = Cylinder Low Register [HEX]
        CH = Cylinder High Register [HEX]
        DH = Device/Head Register [HEX]
        DC = Device Command Register [HEX]
        ER = Error register [HEX]
        ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 47 occurred at disk power-on lifetime: 232 hours (9 days + 16 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  04 61 0a 00 00 00 00

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  e0 00 0a 00 00 00 00 00      04:00:17.522  STANDBY IMMEDIATE
  ef 03 46 00 00 00 a0 00      04:00:16.815  SET FEATURES [Set transfer mode]
  ec 00 00 00 00 00 a0 00      04:00:16.815  IDENTIFY DEVICE

Error 46 occurred at disk power-on lifetime: 232 hours (9 days + 16 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  04 61 46 00 00 00 a0  Device Fault; Error: ABRT

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  ef 03 46 00 00 00 a0 00      04:00:16.815  SET FEATURES [Set transfer mode]
  ec 00 00 00 00 00 a0 00      04:00:16.815  IDENTIFY DEVICE
  e1 00 0f 00 00 00 00 00      04:00:15.095  IDLE IMMEDIATE
  ef 03 46 00 00 00 a0 00      04:00:14.575  SET FEATURES [Set transfer mode]
  ec 00 00 00 00 00 a0 00      04:00:14.575  IDENTIFY DEVICE

Error 45 occurred at disk power-on lifetime: 232 hours (9 days + 16 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  04 61 0f 00 00 00 00

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  e1 00 0f 00 00 00 00 00      04:00:15.095  IDLE IMMEDIATE
  ef 03 46 00 00 00 a0 00      04:00:14.575  SET FEATURES [Set transfer mode]
  ec 00 00 00 00 00 a0 00      04:00:14.575  IDENTIFY DEVICE

Error 44 occurred at disk power-on lifetime: 232 hours (9 days + 16 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  04 61 46 00 00 00 a0  Device Fault; Error: ABRT

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  ef 03 46 00 00 00 a0 00      04:00:14.575  SET FEATURES [Set transfer mode]
  ec 00 00 00 00 00 a0 00      04:00:14.575  IDENTIFY DEVICE
  ef 03 46 00 00 00 a0 00      04:00:12.170  SET FEATURES [Set transfer mode]

Error 43 occurred at disk power-on lifetime: 232 hours (9 days + 16 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  04 61 46 00 00 00 a0  Device Fault; Error: ABRT

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  ef 03 46 00 00 00 a0 00      04:00:12.170  SET FEATURES [Set transfer mode]
  ec 00 00 00 00 00 a0 00      04:00:12.170  IDENTIFY DEVICE
  e1 00 0f 00 00 00 00 00      04:00:10.445  IDLE IMMEDIATE
  ef 03 46 00 00 00 a0 00      04:00:09.925  SET FEATURES [Set transfer mode]
  ec 00 00 00 00 00 a0 00      04:00:09.925  IDENTIFY DEVICE
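
The ER and ST columns in the error log are bit fields. Below is a minimal Python sketch that decodes the two values that appear above (ER=04, ST=61). The bit tables list only the ATA status and error bits I am confident about, so they are deliberately incomplete, and the helper names are hypothetical. The sketch also compares the lifetime hour of the logged errors with the current Power_On_Hours.

```python
# Partial ATA bit definitions (bits not listed here are simply ignored).
STATUS_BITS = {
    0x80: "BSY",   # device busy
    0x40: "DRDY",  # device ready
    0x20: "DF",    # device fault
    0x08: "DRQ",   # data request
    0x01: "ERR",   # error, see the error register
}
ERROR_BITS = {
    0x80: "ICRC",  # interface CRC error
    0x40: "UNC",   # uncorrectable data
    0x10: "IDNF",  # ID not found
    0x04: "ABRT",  # command aborted
}


def decode(value, bit_names):
    """Return the names of the known bits set in value, highest bit first."""
    return [name for mask, name in sorted(bit_names.items(), reverse=True) if value & mask]


# Values copied from the log and the attribute table above.
ERROR_REGISTER = 0x04
STATUS_REGISTER = 0x61
ERROR_LIFETIME_HOURS = 232
POWER_ON_HOURS = 17481


def hours_since_logged_errors():
    """Power-on hours elapsed between the logged errors and now."""
    return POWER_ON_HOURS - ERROR_LIFETIME_HOURS


if __name__ == "__main__":
    print("ST:", decode(STATUS_REGISTER, STATUS_BITS))
    print("ER:", decode(ERROR_REGISTER, ERROR_BITS))
    print("hours since logged errors:", hours_since_logged_errors())
```

The decoded bits agree with the "Device Fault; Error: ABRT" text that smartctl prints. All five logged errors are stamped at 232 power-on hours, which is 17249 hours before the failed self-test, and they follow SET FEATURES, IDLE IMMEDIATE and STANDBY IMMEDIATE commands rather than reads. They therefore look unrelated to the current read failure, although that is an inference from the timestamps and not something the log states.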

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed: read failure       90%     17480         8467144

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

Then I inspected the LBA_of_first_error (8467144) and, following part of this guide, ran sudo sg_verify --lba=8467144 /dev/sda. The output confirms that there is a hardware failure:

verify(10):
Fixed format, current; Sense key: Medium Error
Additional sense: Id CRC or ECC error
VERIFY(10) medium or hardware error near lba=0x8132c8
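
The hexadecimal LBA in the sg_verify output is the same sector that the self-test reported. The following minimal Python sketch shows the conversion and maps the sector into the partition. The file system block size of 4096 bytes is an assumption, since the file system type is not stated here, and the function name is hypothetical.

```python
# Values copied from the smartctl and fdisk output above.
BAD_LBA = 8467144
PART_START = 2048          # first sector of /dev/sda1
LOGICAL_SECTOR = 512       # bytes
PHYSICAL_SECTOR = 4096     # bytes
FS_BLOCK_SIZE = 4096       # assumption: not confirmed for this file system


def locate(lba):
    """Describe where a logical block address sits on the disk and partition."""
    per_physical = PHYSICAL_SECTOR // LOGICAL_SECTOR
    first_in_physical = lba - lba % per_physical
    offset_in_partition = lba - PART_START
    return {
        "hex": hex(lba),
        # range of LBAs that share the same 4096-byte physical sector
        "physical_span": (first_in_physical, first_in_physical + per_physical - 1),
        # sector index counted from the start of the partition
        "partition_sector": offset_in_partition,
        # file system block holding the sector, under the assumed block size
        "fs_block": offset_in_partition * LOGICAL_SECTOR // FS_BLOCK_SIZE,
    }


if __name__ == "__main__":
    for key, value in locate(BAD_LBA).items():
        print(f"{key}: {value}")
```

Under these assumptions the bad sector is the first of the 8 LBAs in its physical sector and lies inside /dev/sda1, so it may belong to a file or to file system metadata.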

As a final step I tried to reassign the block with sudo sg_reassign --address=8467144 /dev/sda, without success:

REASSIGN BLOCKS: Illegal request, Invalid opcode
sg_reassign failed: Illegal request, Invalid opcode

So, to summarize: did I miss a step in this investigation? Is my drive dead, or can it still be used? I cannot tell whether the SMART attribute list shows any serious errors; can you help me understand whether the drive has further problems?

Brandon Xavier:
If it's under warranty get it replaced. If not, dispose of it. Once a drive starts reporting errors, it's foolish to try to keep using it.
Michael Hampton:
ONE error is sufficient to RMA the drive, even if no SMART attributes yet report failure (they will soon! and by then it is too late for your data).
Timmy:
Unfortunately this disk is several years old, so no RMA :(