How to mount/recover failed drive?

Question

Score:1

Ubuntu

Ren

6/9/23, 10:12 PM

Today my home server hit a kernel panic; something went wrong with its system drive. I swapped the drive and restored the server, and now I'm trying to figure out what happened to the old one. It is quite old, so I expect a hardware failure, but I'd still like to learn something about recovery techniques (and find out why SMART didn't warn me). The old drive now shows up as /dev/sdb and LVM is detected on it, so I renamed its volume group from ubuntu-vg to ubuntu-vg-old and activated it.

root@calcium:~# lvs
  LV        VG            Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  ubuntu-lv ubuntu-vg     -wi-ao---- <29.06g
  backups   ubuntu-vg-old -wi-a-----   1.29t
  ubuntu-lv ubuntu-vg-old -wi-a----- 200.00g

Unfortunately, mounting it doesn't work: after a long timeout the command fails and the drive becomes inaccessible:

root@calcium:~# mount /dev/ubuntu-vg-old/ubuntu-lv /mnt -o ro,user
mount: /mnt: can't read superblock on /dev/mapper/ubuntu--vg--old-ubuntu--lv.
root@calcium:~# pvscan
  Error reading device /dev/sdb at 0 length 512.
  Error reading device /dev/sdb at 0 length 4096.
  Error reading device /dev/sdb1 at 0 length 4096.
  Error reading device /dev/sdb2 at 0 length 4096.
  Error reading device /dev/sdb3 at 0 length 4096.
  PV /dev/sda3   VG ubuntu-vg       lvm2 [58.12 GiB / 29.06 GiB free]
  Total: 1 [58.12 GiB] / in use: 1 [58.12 GiB] / in no VG: 0 [0   ]
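
The mount error above reports /dev/mapper/ubuntu--vg--old-ubuntu--lv rather than the /dev/ubuntu-vg-old/ubuntu-lv path I passed in. As far as I understand, device-mapper joins the VG and LV names with a single hyphen and doubles any hyphen inside either name. A minimal sketch of that mapping (dm_name is a hypothetical helper of mine, not an LVM tool):

```shell
#!/bin/sh
# dm_name VG LV: print the /dev/mapper path that corresponds to VG/LV.
# Hypothetical helper for illustration only. Hyphens inside each name are
# doubled so the single hyphen between VG and LV stays unambiguous.
dm_name() {
    vg=$(printf '%s' "$1" | sed 's/-/--/g')
    lv=$(printf '%s' "$2" | sed 's/-/--/g')
    printf '/dev/mapper/%s-%s\n' "$vg" "$lv"
}

dm_name ubuntu-vg-old ubuntu-lv
```

So both paths should refer to the same logical volume, and the error really is about the LV I tried to mount.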

After a reboot (I couldn't find another way to make it accessible again) the drive is back. I tried to repair the filesystem:

root@calcium:~# fsck /dev/mapper/ubuntu--vg--old-ubuntu--lv
fsck from util-linux 2.36.1
e2fsck 1.46.3 (27-Jul-2021)
/dev/mapper/ubuntu--vg--old-ubuntu--lv: recovering journal
fsck.ext4: Input/output error while trying to re-open /dev/mapper/ubuntu--vg--old-ubuntu--lv

/dev/mapper/ubuntu--vg--old-ubuntu--lv: ********** WARNING: Filesystem still has errors **********

But this behaves exactly like mount: a long timeout, and then the drive is dropped from the system. I ran a SMART offline surface test overnight (smartctl -t offline /dev/sdb); it found no issues and did not change any offline SMART attribute. A badblocks read-only test also completes with no errors:

root@calcium:~# badblocks -b 4096 -c 1024 -s -o bb.out /dev/sdb
Checking for bad blocks (read-only test): done

So I tried the non-destructive read-write test with badblocks (badblocks -b 4096 -c 1024 -s -n -v /dev/sdb), and the drive dropped from the system again after about half an hour. I have already replaced the SATA cable and connected the drive to a different port. Since the read-only pass completes cleanly, the problem seems to occur only when writing to particular sector(s).

Is there anything else I could try before a full format (which will most probably fail too)?

SMART data:

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   100   100   051    Pre-fail  Always       -       414
  2 Throughput_Performance  0x0026   055   051   000    Old_age   Always       -       18840
  3 Spin_Up_Time            0x0023   077   066   025    Pre-fail  Always       -       7179
  4 Start_Stop_Count        0x0032   094   094   000    Old_age   Always       -       6274
  5 Reallocated_Sector_Ct   0x0033   252   252   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   252   252   051    Old_age   Always       -       0
  8 Seek_Time_Performance   0x0024   252   252   015    Old_age   Offline      -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       31668
 10 Spin_Retry_Count        0x0032   252   252   051    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       2
 12 Power_Cycle_Count       0x0032   098   098   000    Old_age   Always       -       2286
181 Program_Fail_Cnt_Total  0x0022   100   100   000    Old_age   Always       -       19262840
191 G-Sense_Error_Rate      0x0022   099   099   000    Old_age   Always       -       11132
192 Power-Off_Retract_Count 0x0022   252   252   000    Old_age   Always       -       0
194 Temperature_Celsius     0x0002   064   044   000    Old_age   Always       -       35 (Min/Max 14/56)
195 Hardware_ECC_Recovered  0x003a   100   100   000    Old_age   Always       -       0
196 Reallocated_Event_Count 0x0032   252   252   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   087   083   000    Old_age   Always       -       1617
198 Offline_Uncorrectable   0x0030   252   084   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0036   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x002a   100   100   000    Old_age   Always       -       235
223 Load_Retry_Count        0x0032   100   100   000    Old_age   Always       -       2
225 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       6320
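
To pick the media-related counters out of the table above, something like the following could be used. It is only a sketch: the choice of IDs 5, 197 and 198 is my assumption about which attributes matter here, flag_media_counters is a hypothetical helper name, and the column positions assume the attribute layout shown above.

```shell
#!/bin/sh
# flag_media_counters: read a smartctl attribute table on stdin and print
# ID, name and raw value for attributes 5, 197 and 198 when the raw value
# is non-zero. RAW_VALUE is assumed to be the 10th whitespace-separated field.
flag_media_counters() {
    awk '($1 == 5 || $1 == 197 || $1 == 198) && $10 + 0 > 0 { print $1, $2, $10 }'
}

# Sample input copied from the table above.
flag_media_counters <<'EOF'
  5 Reallocated_Sector_Ct   0x0033   252   252   010    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0032   087   083   000    Old_age   Always       -       1617
198 Offline_Uncorrectable   0x0030   252   084   000    Old_age   Offline      -       0
EOF
```

On the live system the same filter could be fed with smartctl -A /dev/sdb | flag_media_counters. With the values above, only Current_Pending_Sector (raw value 1617) is reported.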

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%     31656         -
# 2  Short offline       Completed without error       00%     31632         -
# 3  Short offline       Completed: read failure       10%     31608         2541336840
# 4  Extended offline    Completed without error       00%     31587         -
# 5  Short offline       Completed without error       00%     31560         -
# 6  Short offline       Completed without error       00%     31536         -
# 7  Short offline       Completed without error       00%     31512         -
# 8  Short offline       Completed without error       00%     31488         -
# 9  Short offline       Completed without error       00%     31464         -
#10  Short offline       Completed without error       00%     31440         -
#11  Extended offline    Completed without error       00%     31419         -
#12  Short offline       Completed without error       00%     31392         -
#13  Short offline       Completed without error       00%     31368         -
#14  Short offline       Completed without error       00%     31344         -
#15  Short offline       Completed without error       00%     31320         -
#16  Short offline       Completed without error       00%     31296         -
#17  Short offline       Completed without error       00%     31272         -
#18  Extended offline    Completed without error       00%     31251         -
#19  Short offline       Completed without error       00%     31224         -
#20  Short offline       Completed without error       00%     31200         -
#21  Short offline       Completed without error       00%     31176         -
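
Entry #3 of the self-test log reports a read failure at LBA 2541336840. To re-test only that area with badblocks at -b 4096, the LBA has to be converted to a 4096-byte block number. A minimal sketch, assuming 512-byte logical sectors (I have not verified this for the drive) and using a hypothetical helper name, lba_to_block:

```shell
#!/bin/sh
# lba_to_block LBA SECTOR_SIZE BLOCK_SIZE: convert a sector address into
# the index of the BLOCK_SIZE-byte block that contains it (integer division).
lba_to_block() {
    echo $(( $1 * $2 / $3 ))
}

# LBA taken from self-test log entry #3; 512 is an assumed logical sector size.
lba_to_block 2541336840 512 4096   # prints 317667105
```

badblocks accepts optional last-block and first-block arguments after the device, so a read-only pass over a window around that block could look like badblocks -b 4096 -s -v /dev/sdb 317668000 317666000 (the window bounds are arbitrary).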

lvm

disk

data-recovery
