Problem Statement
My NVMe SSD is clearly dying, and I want to pull a few files off it before I RMA it.
Observations
When I boot, UEFI detects the drive as present. Booting then hangs for about 60 s while the attempts to mount the drive time out. From dmesg:
[ 2.845959] usb 1-4: SerialNumber: 01.00.00
[ 62.536052] nvme nvme1: I/O 25 QID 0 timeout, disable controller
[ 62.644219] nvme nvme1: Device shutdown incomplete; abort shutdown
[ 62.660279] nvme nvme1: Removing after probe failure status: -4
[ 62.677854] r8169 0000:02:00.0 enp2s0: renamed from eth0
[ 62.683678] usb-storage 2-1:1.0: USB Mass Storage device detected
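To pull the relevant lines out of a long kernel log, a small filter along these lines may help. It is only a sketch: the function name `failed_nvme_controllers` is made up for illustration, and it assumes the kernel words the message exactly as in the excerpt above ("Removing after probe failure"), which may differ between kernel versions.

```shell
# Print the names of NVMe controllers that the kernel removed after a
# failed probe, given kernel log text on stdin.
# Assumption: the message wording matches the dmesg excerpt in this post.
failed_nvme_controllers() {
    sed -nE 's/.*nvme (nvme[0-9]+): Removing after probe failure.*/\1/p' | sort -u
}
```

It could be used as `sudo dmesg | failed_nvme_controllers`, which for the log above should name the controller that was dropped.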
I can find the drive with lspci:
$ lspci | grep memory
03:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd Device a80c
05:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller SM981/PM981/PM983
as well as the working NVMe drive. However, I cannot find it via lsblk, fdisk, or similar:
$ lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS
loop0 7:0 0 4K 1 loop /snap/bare/5
loop1 7:1 0 148.4M 1 loop /snap/chromium/2295
loop2 7:2 0 62M 1 loop /snap/core20/1587
loop3 7:3 0 63.3M 1 loop /snap/core20/1778
loop4 7:4 0 55M 1 loop /snap/cups/872
loop5 7:5 0 163.3M 1 loop /snap/firefox/1635
loop6 7:6 0 400.8M 1 loop /snap/gnome-3-38-2004/112
loop7 7:7 0 346.3M 1 loop /snap/gnome-3-38-2004/119
loop8 7:8 0 91.7M 1 loop /snap/gtk-common-themes/1535
loop9 7:9 0 49.8M 1 loop /snap/snapd/17950
sda 8:0 1 0B 0 disk
nvme0n1 259:0 0 931.5G 0 disk
├─nvme0n1p1 259:1 0 512M 0 part /boot/efi
├─nvme0n1p2 259:2 0 1.7G 0 part /boot
└─nvme0n1p3 259:3 0 929.3G 0 part
└─nvme0n1p3_crypt 253:0 0 929.3G 0 crypt
├─vgkubuntu-root 253:1 0 927.4G 0 lvm /
└─vgkubuntu-swap_1 253:2 0 1.9G 0 lvm [SWAP]
This means that the dying drive has no block device, and therefore no mount points. I don't know that I have enough to grab onto to run fsck.
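The gap between lspci (controller visible) and lsblk (no block device) can be cross-checked in sysfs. The following is a rough sketch, not a definitive diagnostic: the function name `list_nvme_pci` is hypothetical, and it assumes that a controller removed after a probe failure ends up with no `driver` link under its PCI device directory. It relies on the PCI class code `0x010802` (mass storage, non-volatile memory, NVM Express).

```shell
# List PCI functions whose class code is NVMe (0x010802) and report
# whether a kernel driver is currently bound to each one.
# $1: sysfs PCI device directory (defaults to /sys/bus/pci/devices);
#     it is a parameter only so the function can be tried on a copy.
list_nvme_pci() {
    pci_root="${1:-/sys/bus/pci/devices}"
    for dev in "$pci_root"/*; do
        # Skip entries without a readable class attribute.
        [ -r "$dev/class" ] || continue
        [ "$(cat "$dev/class")" = "0x010802" ] || continue
        if [ -e "$dev/driver" ]; then
            state="driver bound"
        else
            state="no driver bound"
        fi
        printf '%s %s\n' "$(basename "$dev")" "$state"
    done
}
```

Running `list_nvme_pci` with no arguments prints one line per NVMe controller, keyed by the same bus addresses that lspci shows (with the PCI domain prefixed).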
Other failed attempts
- If I try to boot only from the dying drive, I get sent back to BIOS, since not enough of the filesystem is available to reach initramfs.
- Using a USB adapter for NVMe does not work. The dongle attempts to connect for ~60s before giving up. The device is NOT found in the lspci list.
- Connecting the USB adapter to a Windows box also fails, although Windows does detect that a 0-byte drive is connected at that point.
- I baked the drive on the off chance that the solder was loose, but nothing changed.
The Ask
What can I do to temporarily mount this drive long enough to pull a few files from it? My backup is a week out of date, and I would like to get that week of data back. Can I force a mount from something other than a /dev/ path?
EDIT - New Observations
I currently have the drive connected through the USB adapter, but it reports a size of 0 bytes. From dmesg:
[ 5748.864308] usb 3-1.3.2: new high-speed USB device number 8 using xhci_hcd
[ 5748.979686] usb 3-1.3.2: New USB device found, idVendor=0bda, idProduct=9210, bcdDevice=20.01
[ 5748.979692] usb 3-1.3.2: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[ 5748.979694] usb 3-1.3.2: Product: RTL9210
[ 5748.979695] usb 3-1.3.2: Manufacturer: Realtek
[ 5748.979696] usb 3-1.3.2: SerialNumber: 012345678904
[ 5748.982535] usb-storage 3-1.3.2:1.0: USB Mass Storage device detected
[ 5748.982663] usb-storage 3-1.3.2:1.0: Quirks match for vid 0bda pid 9210: 800000
[ 5748.982687] scsi host1: usb-storage 3-1.3.2:1.0
[ 5749.997797] scsi 1:0:0:0: Direct-Access Realtek RTL9210 NVME 1.00 PQ: 0 ANSI: 6
[ 5749.997933] sd 1:0:0:0: Attached scsi generic sg1 type 0
[ 5750.003699] sd 1:0:0:0: [sdb] Read Capacity(10) failed: Result: hostbyte=DID_OK driverbyte=DRIVER_OK
[ 5750.003705] sd 1:0:0:0: [sdb] Sense Key : Illegal Request [current]
[ 5750.003707] sd 1:0:0:0: [sdb] Add. Sense: Invalid command operation code
[ 5750.003710] sd 1:0:0:0: [sdb] 0 512-byte logical blocks: (0 B/0 B)
[ 5750.003712] sd 1:0:0:0: [sdb] 0-byte physical blocks
[ 5750.005449] sd 1:0:0:0: [sdb] Test WP failed, assume Write Enabled
[ 5750.007179] sd 1:0:0:0: [sdb] Asking for cache data failed
[ 5750.007182] sd 1:0:0:0: [sdb] Assuming drive cache: write through
[ 5750.013190] sd 1:0:0:0: [sdb] Read Capacity(10) failed: Result: hostbyte=DID_OK driverbyte=DRIVER_OK
[ 5750.013196] sd 1:0:0:0: [sdb] Sense Key : Illegal Request [current]
[ 5750.013198] sd 1:0:0:0: [sdb] Add. Sense: Invalid command operation code
[ 5750.016690] sd 1:0:0:0: [sdb] Attached SCSI disk
Attempts at reading the S.M.A.R.T. data:
$ sudo smartctl /dev/sdb -a
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-58-generic] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
Read NVMe Identify Controller failed: scsi error unsupported scsi opcode
and
$ sudo smartctl /dev/sdb -d scsi -a
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-58-generic] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Vendor: Realtek
Product: RTL9210 NVME
Revision: 1.00
Compliance: SPC-4
LU is fully provisioned
Logical Unit id: 0x3001237923792379
Serial number: 0000000000000000
Device type: disk
Local Time is: Mon Jan 30 20:41:09 2023 CST
SMART support is: Unavailable - device lacks SMART capability.
=== START OF READ SMART DATA SECTION ===
Current Drive Temperature: 0 C
Drive Trip Temperature: 0 C
Error Counter logging not supported
Device does not support Self Test logging
suggest that only the SCSI side of the RTL9210 bridge chipset attached, rather than the drive proper.
According to lsblk, the drive is attached as /dev/sdb:
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS
[...]
sdb 8:16 0 0B 0 disk
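The zero size can also be read straight from sysfs, which rules out a display quirk in lsblk. A minimal sketch, with a hypothetical function name `block_size_bytes`; it relies on `/sys/block/<name>/size` being expressed in 512-byte sectors regardless of the device's logical block size.

```shell
# Print the capacity in bytes that the kernel reports for a block device.
# $1: device name without /dev/ (for example sdb)
# $2: sysfs block directory (defaults to /sys/block)
block_size_bytes() {
    name="$1"
    sys_block="${2:-/sys/block}"
    # The size attribute counts 512-byte sectors.
    sectors=$(cat "$sys_block/$name/size") || return 1
    echo $((sectors * 512))
}
```

For example, `block_size_bytes sdb` would be expected to print 0 for the adapter in its current state.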
Filesystem checks are not useful:
$ sudo dumpe2fs /dev/sdb
dumpe2fs 1.46.5 (30-Dec-2021)
dumpe2fs: Invalid argument while trying to open /dev/sdb
Couldn't find valid filesystem superblock.
$ sudo fsck /dev/sdb
fsck from util-linux 2.37.2
e2fsck 1.46.5 (30-Dec-2021)
fsck.ext2: Invalid argument while trying to open /dev/sdb
The superblock could not be read or does not describe a valid ext2/ext3/ext4
filesystem. If the device is valid and it really contains an ext2/ext3/ext4
filesystem (and not swap or ufs or something else), then the superblock
is corrupt, and you might try running e2fsck with an alternate superblock:
e2fsck -b 8193 <device>
or
e2fsck -b 32768 <device>
I also cannot use the mkfs trick to guess where the next superblock should be:
$ sudo mkfs.ext4 -n /dev/sdb
mke2fs 1.46.5 (30-Dec-2021)
mkfs.ext4: Device size reported to be zero. Invalid partition specified, or
partition table wasn't reread after running fdisk, due to
a modified partition being busy and in use. You may need to reboot
to re-read your partition table.
Is there any way I could flash the NVMe controller?