Score:1

How to mount/recover a failed drive?

Ren

Today my home server hit a kernel panic because something went wrong with its system drive. I swapped the drive and restored the server, and now I'm trying to figure out what happened to the old one. It is quite old, so I expect a hardware failure, but I'd still like to learn something about recovery techniques (and find out why SMART didn't warn me). I can see the drive as /dev/sdb now, and LVM is detected on it, so I renamed its ubuntu-vg to ubuntu-vg-old and activated it.

root@calcium:~# lvs
  LV        VG            Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  ubuntu-lv ubuntu-vg     -wi-ao---- <29.06g
  backups   ubuntu-vg-old -wi-a-----   1.29t
  ubuntu-lv ubuntu-vg-old -wi-a----- 200.00g

Unfortunately, mounting it doesn't work: after a long timeout the command fails, and the drive becomes inaccessible:

root@calcium:~# mount /dev/ubuntu-vg-old/ubuntu-lv /mnt -o ro,user
mount: /mnt: can't read superblock on /dev/mapper/ubuntu--vg--old-ubuntu--lv.
root@calcium:~# pvscan
  Error reading device /dev/sdb at 0 length 512.
  Error reading device /dev/sdb at 0 length 4096.
  Error reading device /dev/sdb1 at 0 length 4096.
  Error reading device /dev/sdb2 at 0 length 4096.
  Error reading device /dev/sdb3 at 0 length 4096.
  PV /dev/sda3   VG ubuntu-vg       lvm2 [58.12 GiB / 29.06 GiB free]
  Total: 1 [58.12 GiB] / in use: 1 [58.12 GiB] / in no VG: 0 [0   ]

After a reboot (I didn't find another way to make it accessible again) the drive is back. I tried to fix it:

root@calcium:~# fsck /dev/mapper/ubuntu--vg--old-ubuntu--lv
fsck from util-linux 2.36.1
e2fsck 1.46.3 (27-Jul-2021)
/dev/mapper/ubuntu--vg--old-ubuntu--lv: recovering journal
fsck.ext4: Input/output error while trying to re-open /dev/mapper/ubuntu--vg--old-ubuntu--lv

/dev/mapper/ubuntu--vg--old-ubuntu--lv: ********** WARNING: Filesystem still has errors **********

But this behaves exactly the same as mount: a long timeout, and then the drive is dropped from the system. I ran a SMART offline surface test overnight (smartctl -t offline /dev/sdb); it found no issues and did not change any offline SMART attribute. A badblocks read test also runs fine, with no errors:

root@calcium:~# badblocks -b 4096 -c 1024 -s -o bb.out /dev/sdb
Checking for bad blocks (read-only test): done

So I tried a non-destructive read-write test with badblocks (badblocks -b 4096 -c 1024 -s -n -v /dev/sdb), and the drive dropped from the system again after about half an hour. I have already replaced the SATA cable and connected the drive to a different port. The problem clearly appears only when writing to particular sector(s).

Is there anything more I could try before a full format (which will most probably fail too, I guess)?

SMART data:

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   100   100   051    Pre-fail  Always       -       414
  2 Throughput_Performance  0x0026   055   051   000    Old_age   Always       -       18840
  3 Spin_Up_Time            0x0023   077   066   025    Pre-fail  Always       -       7179
  4 Start_Stop_Count        0x0032   094   094   000    Old_age   Always       -       6274
  5 Reallocated_Sector_Ct   0x0033   252   252   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   252   252   051    Old_age   Always       -       0
  8 Seek_Time_Performance   0x0024   252   252   015    Old_age   Offline      -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       31668
 10 Spin_Retry_Count        0x0032   252   252   051    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       2
 12 Power_Cycle_Count       0x0032   098   098   000    Old_age   Always       -       2286
181 Program_Fail_Cnt_Total  0x0022   100   100   000    Old_age   Always       -       19262840
191 G-Sense_Error_Rate      0x0022   099   099   000    Old_age   Always       -       11132
192 Power-Off_Retract_Count 0x0022   252   252   000    Old_age   Always       -       0
194 Temperature_Celsius     0x0002   064   044   000    Old_age   Always       -       35 (Min/Max 14/56)
195 Hardware_ECC_Recovered  0x003a   100   100   000    Old_age   Always       -       0
196 Reallocated_Event_Count 0x0032   252   252   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   087   083   000    Old_age   Always       -       1617
198 Offline_Uncorrectable   0x0030   252   084   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0036   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x002a   100   100   000    Old_age   Always       -       235
223 Load_Retry_Count        0x0032   100   100   000    Old_age   Always       -       2
225 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       6320
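
The overall health result says PASSED, but individual raw values in the attribute table above can still be worth a look. Below is a minimal sketch for pulling out selected attributes whose raw value is non-zero. The function name smart_nonzero is hypothetical, and the sketch assumes the ten-column layout shown above, with RAW_VALUE in column 10.

```shell
# Print "ID NAME RAW" for the requested attribute IDs whose raw value is non-zero.
# Reads `smartctl -A`-style output on stdin; $1 is a comma-separated list of IDs.
# Assumption: RAW_VALUE is the 10th whitespace-separated column, as in the table above.
smart_nonzero() {
    awk -v ids="$1" '
        BEGIN {
            n = split(ids, a, ",")
            for (i = 1; i <= n; i++) want[a[i]] = 1
        }
        # Attribute rows start with a numeric ID; header lines are skipped.
        $1 ~ /^[0-9]+$/ && ($1 in want) && ($10 + 0) != 0 { print $1, $2, $10 }
    '
}

# Possible usage (needs root and the real device):
#   smartctl -A /dev/sdb | smart_nonzero 5,197,198
```

Run against the table above with IDs 5, 197 and 198, this would print only the Current_Pending_Sector row (raw value 1617), since the other two raw values are 0. Pending sectors are generally sectors the drive has marked as suspect and not yet remapped, so this number may be more telling than the PASSED summary.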

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%     31656         -
# 2  Short offline       Completed without error       00%     31632         -
# 3  Short offline       Completed: read failure       10%     31608         2541336840
# 4  Extended offline    Completed without error       00%     31587         -
# 5  Short offline       Completed without error       00%     31560         -
# 6  Short offline       Completed without error       00%     31536         -
# 7  Short offline       Completed without error       00%     31512         -
# 8  Short offline       Completed without error       00%     31488         -
# 9  Short offline       Completed without error       00%     31464         -
#10  Short offline       Completed without error       00%     31440         -
#11  Extended offline    Completed without error       00%     31419         -
#12  Short offline       Completed without error       00%     31392         -
#13  Short offline       Completed without error       00%     31368         -
#14  Short offline       Completed without error       00%     31344         -
#15  Short offline       Completed without error       00%     31320         -
#16  Short offline       Completed without error       00%     31296         -
#17  Short offline       Completed without error       00%     31272         -
#18  Extended offline    Completed without error       00%     31251         -
#19  Short offline       Completed without error       00%     31224         -
#20  Short offline       Completed without error       00%     31200         -
#21  Short offline       Completed without error       00%     31176         -
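
The self-test log above reports one read failure at LBA 2541336840. To probe just that area with badblocks, the LBA has to be converted to the block size passed via -b. The following is a rough sketch of that arithmetic; the helper name lba_to_block is hypothetical, and it assumes the drive reports 512-byte logical sectors (blockdev --getss /dev/sdb should confirm this).

```shell
# Convert an LBA (counted in 512-byte logical sectors, an assumption) to a
# block index for the given block size in bytes (default 4096).
lba_to_block() {
    lba=$1
    block_size=${2:-4096}
    echo $(( lba * 512 / block_size ))
}

# Possible usage (read-only test of a small window around the reported LBA;
# badblocks takes the last block before the first block):
#   blk=$(lba_to_block 2541336840)
#   badblocks -b 4096 -s -v /dev/sdb $((blk + 10)) $((blk - 10))
```

For the LBA in the log, lba_to_block 2541336840 yields 317667105 with 4096-byte blocks.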