Dying SSD found by lspci but not lsblk - How to mount?

Problem Statement

My SSD is clearly dying, and I want to pull off a few things before an RMA.

Observations

When I boot, the drive is detected as present by UEFI. Booting then hangs for about 60 s while the attempts to mount the drive time out. From dmesg:

[    2.845959] usb 1-4: SerialNumber: 01.00.00
[   62.536052] nvme nvme1: I/O 25 QID 0 timeout, disable controller
[   62.644219] nvme nvme1: Device shutdown incomplete; abort shutdown
[   62.660279] nvme nvme1: Removing after probe failure status: -4
[   62.677854] r8169 0000:02:00.0 enp2s0: renamed from eth0
[   62.683678] usb-storage 2-1:1.0: USB Mass Storage device detected
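
To pull just the controller messages out of a long boot log, a small filter along these lines may help. This is a minimal sketch: the function name nvme_kernel_lines is hypothetical, and the pattern assumes the "nvme nvmeN:" prefix seen in the excerpt above.

```shell
#!/bin/sh
# Hypothetical helper: keep only kernel log lines emitted for an NVMe
# controller instance (prefix "nvme nvmeN:", as in the excerpt above).
# Reads log text on stdin and writes the matching lines to stdout.
nvme_kernel_lines() {
    # "|| true" keeps the exit status at 0 when nothing matches.
    grep -E '\] nvme nvme[0-9]+: ' || true
}
```

It could be fed from the live kernel log, for example with sudo dmesg | nvme_kernel_lines, or from a saved copy of the log.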

I can find the drive with lspci:

$ lspci | grep memory
03:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd Device a80c
05:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller SM981/PM981/PM983

This lists the dying drive alongside the working NVMe drive. However, I cannot find it via lsblk, fdisk, or similar:

$ lsblk
NAME                   MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINTS
loop0                    7:0    0     4K  1 loop  /snap/bare/5
loop1                    7:1    0 148.4M  1 loop  /snap/chromium/2295
loop2                    7:2    0    62M  1 loop  /snap/core20/1587
loop3                    7:3    0  63.3M  1 loop  /snap/core20/1778
loop4                    7:4    0    55M  1 loop  /snap/cups/872
loop5                    7:5    0 163.3M  1 loop  /snap/firefox/1635
loop6                    7:6    0 400.8M  1 loop  /snap/gnome-3-38-2004/112
loop7                    7:7    0 346.3M  1 loop  /snap/gnome-3-38-2004/119
loop8                    7:8    0  91.7M  1 loop  /snap/gtk-common-themes/1535
loop9                    7:9    0  49.8M  1 loop  /snap/snapd/17950
sda                      8:0    1     0B  0 disk  
nvme0n1                259:0    0 931.5G  0 disk  
├─nvme0n1p1            259:1    0   512M  0 part  /boot/efi
├─nvme0n1p2            259:2    0   1.7G  0 part  /boot
└─nvme0n1p3            259:3    0 929.3G  0 part  
  └─nvme0n1p3_crypt    253:0    0 929.3G  0 crypt 
    ├─vgkubuntu-root   253:1    0 927.4G  0 lvm   /
    └─vgkubuntu-swap_1 253:2    0   1.9G  0 lvm   [SWAP]

This means there is no block device for me to mount. I am not sure I have enough to grab onto to run fsck.
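
One way to see why a controller can appear in lspci but not in lsblk is to check whether a kernel driver is still bound to it. The sketch below walks sysfs for PCI devices whose class code starts with 0x0108 (mass storage, non-volatile memory) and prints the bound driver, if any. The function name list_nvme_pci and its optional sysfs-root argument are hypothetical, added only so the logic can be tried against a test directory. After a "Removing after probe failure" message, the failed controller would likely show no driver.

```shell
#!/bin/sh
# Hypothetical helper: list PCI NVMe controllers and the driver bound to each.
# Usage: list_nvme_pci [SYSFS_ROOT]   (SYSFS_ROOT defaults to /sys)
list_nvme_pci() {
    root="${1:-/sys}"
    for dev in "$root"/bus/pci/devices/*; do
        [ -r "$dev/class" ] || continue
        class=$(cat "$dev/class")
        case "$class" in
            0x0108*) ;;        # class 01 (mass storage), subclass 08 (NVM)
            *) continue ;;
        esac
        # The "driver" symlink exists only while a driver is bound.
        if [ -e "$dev/driver" ]; then
            drv=$(basename "$(readlink "$dev/driver")")
        else
            drv="(none)"
        fi
        printf '%s %s\n' "$(basename "$dev")" "$drv"
    done
}
```

Running list_nvme_pci with no argument reads the live /sys tree. A line ending in (none) marks a controller that is on the bus but has no block device behind it.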

Other failed attempts

  • If I try to boot only from the dying drive, I get sent back to BIOS, since not enough of the filesystem is readable to reach the initramfs.
  • Using a USB adapter for NVMe does not work. The dongle attempts to connect for ~60s before giving up. The device is NOT found in the lspci list.
  • Connecting the USB adapter to a Windows box also fails, although Windows does detect that a 0-byte drive is connected at that point.
  • I baked the drive on the off chance that the solder was loose but nothing changed.

The Ask

What can I do to temporarily mount this drive long enough to pull a few files from it? My backup is a week out of date, and I would like to get that week of data back. Can I force mount from something besides a /dev/ path?

EDIT - New Observations

I currently have the drive connected through the USB adapter, but it shows up as 0 bytes. From dmesg:

[ 5748.864308] usb 3-1.3.2: new high-speed USB device number 8 using xhci_hcd
[ 5748.979686] usb 3-1.3.2: New USB device found, idVendor=0bda, idProduct=9210, bcdDevice=20.01
[ 5748.979692] usb 3-1.3.2: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[ 5748.979694] usb 3-1.3.2: Product: RTL9210
[ 5748.979695] usb 3-1.3.2: Manufacturer: Realtek
[ 5748.979696] usb 3-1.3.2: SerialNumber: 012345678904
[ 5748.982535] usb-storage 3-1.3.2:1.0: USB Mass Storage device detected
[ 5748.982663] usb-storage 3-1.3.2:1.0: Quirks match for vid 0bda pid 9210: 800000
[ 5748.982687] scsi host1: usb-storage 3-1.3.2:1.0
[ 5749.997797] scsi 1:0:0:0: Direct-Access     Realtek  RTL9210 NVME     1.00 PQ: 0 ANSI: 6
[ 5749.997933] sd 1:0:0:0: Attached scsi generic sg1 type 0
[ 5750.003699] sd 1:0:0:0: [sdb] Read Capacity(10) failed: Result: hostbyte=DID_OK driverbyte=DRIVER_OK
[ 5750.003705] sd 1:0:0:0: [sdb] Sense Key : Illegal Request [current] 
[ 5750.003707] sd 1:0:0:0: [sdb] Add. Sense: Invalid command operation code
[ 5750.003710] sd 1:0:0:0: [sdb] 0 512-byte logical blocks: (0 B/0 B)
[ 5750.003712] sd 1:0:0:0: [sdb] 0-byte physical blocks
[ 5750.005449] sd 1:0:0:0: [sdb] Test WP failed, assume Write Enabled
[ 5750.007179] sd 1:0:0:0: [sdb] Asking for cache data failed
[ 5750.007182] sd 1:0:0:0: [sdb] Assuming drive cache: write through
[ 5750.013190] sd 1:0:0:0: [sdb] Read Capacity(10) failed: Result: hostbyte=DID_OK driverbyte=DRIVER_OK
[ 5750.013196] sd 1:0:0:0: [sdb] Sense Key : Illegal Request [current] 
[ 5750.013198] sd 1:0:0:0: [sdb] Add. Sense: Invalid command operation code
[ 5750.016690] sd 1:0:0:0: [sdb] Attached SCSI disk

Attempts at reading the S.M.A.R.T. data:

$ sudo smartctl /dev/sdb -a
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-58-generic] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

Read NVMe Identify Controller failed: scsi error unsupported scsi opcode

and

$ sudo smartctl /dev/sdb -d scsi -a
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-58-generic] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Vendor:               Realtek
Product:              RTL9210 NVME
Revision:             1.00
Compliance:           SPC-4
LU is fully provisioned
Logical Unit id:      0x3001237923792379
Serial number:        0000000000000000
Device type:          disk
Local Time is:        Mon Jan 30 20:41:09 2023 CST
SMART support is:     Unavailable - device lacks SMART capability.

=== START OF READ SMART DATA SECTION ===
Current Drive Temperature:     0 C
Drive Trip Temperature:        0 C

Error Counter logging not supported

Device does not support Self Test logging

suggest that only the SCSI side of the RTL9210 bridge chipset is attached, rather than the drive proper.

According to lsblk, the drive is attached as /dev/sdb:

NAME                   MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINTS
[...]
sdb                      8:16   0     0B  0 disk
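
The zero capacity can also be read straight from sysfs, without going through lsblk. The sketch below assumes the standard /sys/block/NAME/size file, which the kernel reports in 512-byte sectors. The function name blockdev_bytes and its optional sysfs-root argument are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the capacity in bytes that the kernel recorded
# for a block device. sysfs reports "size" in 512-byte sectors regardless of
# the device's logical block size.
# Usage: blockdev_bytes NAME [SYSFS_ROOT]   (SYSFS_ROOT defaults to /sys)
blockdev_bytes() {
    name="$1"
    root="${2:-/sys}"
    size_file="$root/block/$name/size"
    if [ ! -r "$size_file" ]; then
        echo "no such block device: $name" >&2
        return 1
    fi
    sectors=$(cat "$size_file")
    echo $((sectors * 512))
}
```

If blockdev_bytes sdb prints 0, as the lsblk output above implies, tools that need to read the device have nothing to read, whatever filesystem is on it.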

Filesystem checks are not useful:

$ sudo dumpe2fs /dev/sdb
dumpe2fs 1.46.5 (30-Dec-2021)
dumpe2fs: Invalid argument while trying to open /dev/sdb
Couldn't find valid filesystem superblock.
$ sudo fsck /dev/sdb
fsck from util-linux 2.37.2
e2fsck 1.46.5 (30-Dec-2021)
fsck.ext2: Invalid argument while trying to open /dev/sdb

The superblock could not be read or does not describe a valid ext2/ext3/ext4
filesystem.  If the device is valid and it really contains an ext2/ext3/ext4
filesystem (and not swap or ufs or something else), then the superblock
is corrupt, and you might try running e2fsck with an alternate superblock:
    e2fsck -b 8193 <device>
 or
    e2fsck -b 32768 <device>

I also cannot use the mkfs trick to guess where the next superblock should be:

$ sudo mkfs.ext4 -n /dev/sdb
mke2fs 1.46.5 (30-Dec-2021)
mkfs.ext4: Device size reported to be zero.  Invalid partition specified, or
        partition table wasn't reread after running fdisk, due to
        a modified partition being busy and in use.  You may need to reboot
        to re-read your partition table.

Is there any way I could flash the NVMe controller?

WesH: The drive is UEFI configured and NVMe, both of which are incompatible with SpinRite.