Score:0

How to Identify and Repair Bad Blocks on an SSD

I have been trying to solve the issue for two days now. Let me explain:

Background

System Settings

I have an Acer Nitro-5 Laptop.
The storage settings: 1TB HDD & 256GB Intel ssdpekkw256g7.
The OS settings: Dual boot, Ubuntu 20.04 on the HDD and Windows 10 on the SSD.

Origin of the Issue

Lately, Windows had been getting slower every day. I've had it for nearly 5 years.
Two days ago, I started Windows and the desktop process kept hanging. I thought it could be a virus or something, so I ran a full system scan. While it was scanning, the power went out and the laptop shut down in the middle of the scan. The problem may have started before or after this outage; I'm not sure. When the power came back and I started Windows, I got a blue screen.

I booted into Ubuntu to fix the issue. I backed up everything I needed, so I'm safe from further data loss, although I did lose some data from the SSD.

What I Have Tried

GParted

I tried to check the disk for issues but it said the following:
Cluster accounting failed at 13300092 (0xcaf17c): extra cluster in $Bitmap
Cluster accounting failed at 13300093 (0xcaf17d): extra cluster in $Bitmap
Filesystem check failed!
Totally 999 cluster accounting mismatches. ERROR: NTFS is inconsistent. Run chkdsk /f on Windows then reboot it TWICE! The usage of the /f parameter is very IMPORTANT! No modification was and will be made to NTFS by this software until it gets repaired.

Windows Safe Mode and ChkDsk

Since I can't boot Windows directly (remember the blue screen), I used Hiren's BootCD WinPE to get a Windows command prompt. I ran chkdsk and, depending on the attempt, got one of the three results below:
Windows cannot run disk checking on this volume because it is write protected
The type of the file system is RAW. CHKDSK is not available for RAW drives.
This is the most recent one:

chkdsk g: /f
Stage 2: Examining file name linkage ... 
An unspecified error occurred (696e647863686b2e 9cd). 
An unspecified error occurred (6e74667363686b2e 1798). 
Failed to transfer logged messages to the event log with status 6.

And thus I reached a dead-end for chkdsk.

Formatting the Disk/Partition

I tried to format the partition with Disk Management, Diskpart, GParted, Disks, etc. None of them worked; each reported either an I/O error or some unexpected error.

Overwriting The Disk by Installing Ubuntu on SSD

I tried to install a second copy of Ubuntu on the SSD to overwrite the Windows files. That failed too, with a message along the lines of 'Error occurred while formatting the disk'.

Trying to Fix it Through Ubuntu

I tried to create a new file system on the partition from Ubuntu:

sudo mkfs.ext4 /dev/nvme0n1p4

This was the log:

mke2fs 1.45.5 (07-Jan-2020)
/dev/nvme0n1p4 contains a ntfs file system labelled 'OS'
Proceed anyway? (y,N) y
Creating filesystem with 62186249 4k blocks and 15548416 inodes
Filesystem UUID: 98f9d29e-c882-42ca-9b83-d15bb1f1a9cb
Superblock backups stored on blocks: 
    32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208, 
    4096000, 7962624, 11239424, 20480000, 23887872

Allocating group tables: done                            
Writing inode tables: done                            
Creating journal (262144 blocks): done
Writing superblocks and filesystem accounting information: mkfs.ext4: Input/output error while writing out and closing file system

So I tried to check the bad blocks:

sudo badblocks -v /dev/nvme0n1

Which resulted in:

Checking blocks 0 to 250059095
Checking for bad blocks (read-only test): 1404620
1404621
1404622
1404623
.
.
.

Basically, it found 1333 bad blocks. But I don't think this is a physical issue since the SSD was working fine until a few days ago.

Using NVME

I used nvme to check the logs:

Smart Log for NVME device:nvme0n1 namespace-id:ffffffff
critical_warning                    : 0x9
temperature                         : 25 C
available_spare                     : 0%
available_spare_threshold           : 10%
percentage_used                     : 4%
data_units_read                     : 23,584,781
data_units_written                  : 22,680,050
host_read_commands                  : 424,922,169
host_write_commands                 : 435,538,029
controller_busy_time                : 7,189
power_cycles                        : 3,689
power_on_hours                      : 10,192
unsafe_shutdowns                    : 298
media_errors                        : 716,805
num_err_log_entries                 : 716,805
Warning Temperature Time            : 265
Critical Composite Temperature Time : 0
Thermal Management T1 Trans Count   : 0
Thermal Management T2 Trans Count   : 0
Thermal Management T1 Total Time    : 0
Thermal Management T2 Total Time    : 0

I really don't have any idea how to fix it. Can you please help me out?
I would rather not replace the SSD.

guiverc
I'm no expert here (and technology changes all the time, so what applies to one SSD may not apply to others), but bad-block handling on SSDs is fairly device specific. Most devices handle it automatically: they ship with a reserve of spare blocks that the controller maps in to replace bad blocks, at least until the spares are used up; what happens after that is device specific. I see your question as a hardware one, since the OS has no control over this on many SSDs (though that probably doesn't hold for every generation and type). My 2c.
Kianoush Arshi
Yeah, that's right. But I think that, since there are so many bad blocks, the drive can no longer afford the remapping. On the other hand, the problem might be somewhere else!
Marco
I had an SSD that could be read without problems, but writes had no effect: the old content was still there. Booting (at least Windows) worked until it needed to read back something it had changed (registry? swap?); after about 5 minutes I got a blue screen.
Marco
I had another SSD (Intel) which suddenly reported a size of only 32k. According to my research it was a known firmware bug, with no chance of support once the warranty had expired.
Marco
Have you checked the logs of the SSD with `nvme smart-log /dev/nvme0n1` and `nvme error-log /dev/nvme0n1`? Other nvme commands might give further info.
Kianoush Arshi
I checked the logs, but I can't make sense of them. Smart log: unsafe_shutdowns : 298, media_errors : 716,805, num_err_log_entries : 716,805, Warning Temperature Time : 265
Kianoush Arshi
The error log has 64 entries. Entry[63]: error_count : 716742, sqid : 7, cmdid : 0x34, status_field : 0x280(Unknown), parm_err_loc : 0xffff, lba : 0x1a96c000, nsid : 0x1, vs : 0, cs : 0
cn flag
If you have them and they increase fairly quickly, replace the disk.
Score:1

Every SSD keeps some spare blocks for replacing bad blocks, because the flash cells degrade with use.

available_spare                     : 0%
...
media_errors                        : 716,805
num_err_log_entries                 : 716,805

According to the smart-log, all the spare space has been used up. With no spare blocks left for remapping new bad blocks, the disk can be considered "end-of-life", i.e. "broken".
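
The critical_warning value of 0x9 in the question's smart-log points the same way. As a minimal sketch, assuming the bit layout of the NVMe SMART / Health Information log (bit 0 = available spare below threshold, bit 3 = media placed in read-only mode), the bitmask can be decoded in the shell. The name decode_critical_warning is a hypothetical helper, not part of nvme-cli:

```shell
# Decode an NVMe critical_warning bitmask into readable flags.
# Bit meanings are taken from the NVMe SMART / Health Information log;
# the function name is made up for this example.
decode_critical_warning() {
    value=$(( $1 ))   # accepts decimal or 0x-prefixed hex
    bit=0
    for name in \
        "available spare below threshold" \
        "temperature outside threshold" \
        "subsystem reliability degraded" \
        "media in read-only mode" \
        "volatile memory backup failed"
    do
        # Print the flag name if the corresponding bit is set
        if [ $(( (value >> bit) & 1 )) -eq 1 ]; then
            echo "bit $bit: $name"
        fi
        bit=$(( bit + 1 ))
    done
}

decode_critical_warning 0x9
```

For 0x9 this reports bits 0 and 3. If the drive has indeed switched its media to read-only mode, that would be consistent with every format and mkfs attempt ending in an I/O error.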

If your disk has that many bad blocks, it is very likely that there is some other defect, and increasing the spare space will only help for a short time.

Is it possible to increase the spare space?

I did it once for an SSD connected via SATA, using the hdparm tool.

It depends heavily on the SSD firmware.
It might not work on your drive.

Of course this can only be done on an empty disk. If the disk is not empty, you will most probably lose data. In the worst case the disk will be completely broken. Be warned: do it at your own risk.

To read the current HPA (host protected area) setting, i.e. the maximum sector visible to the system:

hdparm -N /dev/sda

/dev/sda:
 max sectors   = 3907029168/3907029168, HPA is disabled

With the same command you can set the "max sectors" value usable by the system, which leaves the rest of the drive for the firmware to use as available_spare.

I will not give you the exact command (it is easy to find if you can read the manual).
If you cannot work it out yourself, it is best not to do it.

I don't know whether this works with NVMe drives, or whether it can be done with the nvme command. If somebody knows, please add an answer.

Score:0

Maybe try to dd all the data onto a new disk, then force-format the SSD. If it still shows bad sectors, that's bad news.
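
A minimal sketch of the imaging step, assuming the SSD is /dev/nvme0n1 as in the question and that the image is written to a healthy disk with enough free space. The helper name image_failing_disk and the target path in the usage note are made up for illustration; conv=noerror,sync are standard dd options that continue past read errors and pad unreadable blocks with zeros:

```shell
# Copy a failing device (or any file) to an image file,
# continuing past read errors instead of aborting.
# The function name is hypothetical; it only wraps dd.
image_failing_disk() {
    src="$1"   # source device, e.g. /dev/nvme0n1 (device name from the question)
    dst="$2"   # image file on a healthy disk with enough free space
    # conv=noerror: keep going after a read error
    # conv=sync:    pad short or failed reads with zeros so offsets stay aligned
    dd if="$src" of="$dst" bs=4096 conv=noerror,sync
}
```

Run it as root, for example `image_failing_disk /dev/nvme0n1 /mnt/backup/ssd.img` (the target path is only an example). GNU ddrescue is often a better fit for failing media, since it retries and keeps a map of the unreadable areas.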

Bad sectors can be deallocated, but I think that once you have one bad sector you will get more and more; the degradation is unstoppable.

The good news is that SSDs are really cheap right now.

Kianoush Arshi
How can I force-format the SSD? As I mentioned, I tried many formatting methods, but all of them failed.
HBtools
https://askubuntu.com/questions/253096/low-level-format-of-hard-drive