
Ubuntu Disks Reports DISK IS LIKELY TO FAIL SOON but Hard Disk Sentinel Health is 85% with Just 8 bad Sectors: What is going on?


I am getting the cryptic message "DISK IS LIKELY TO FAIL SOON" from the SMART Data & Self-Tests dialog in Ubuntu Disks. However, when I inspect the self-test results window, I find that only 8 sectors have been reallocated (8 × 512 bytes on a 120 GB SSD), and no value is listed in red.

[Screenshots: Ubuntu Disks SSD SMART data after an extended self-test, parts I and II]

Also, when I run HDSentinel for Ubuntu, it reports disk health of a reasonable 85%:

HDD Device  0: /dev/sda             
HDD Model ID : Hypertec SSD
HDD Serial No: HY22021100011
HDD Revision : U0202A0
HDD Size     : 114473 MB
Interface    : S-ATA Gen3, 6 Gbps
Temperature  : 40 °C
Highest Temp.: 40 °C
Health       : 85 %
Performance  : 100 %
Power on time: 95 days, 18 hours
Est. lifetime: more than 1000 days
Total written: 1.13 TB
  There are 8 bad sectors on the disk surface. The contents of these sectors were moved to the spare area.
  At this point, warranty replacement of the disk is not yet possible, only if the health drops further.
  It is recommended to examine the log of the disk regularly. All new problems found will be logged there.
    No actions needed.
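The HDSentinel summary can be cross-checked against the raw values in the smartctl output further down. A small sketch, assuming attribute 9 (Power_On_Hours) counts hours and, as an unverified guess, that this firmware counts attribute 241 (Total_LBAs_Written) in units of 32 MiB, with HDSentinel's "TB" meaning TiB. Neither unit is documented for this drive, and `ASSUMED_WRITE_UNIT` and the helper names are hypothetical:

```python
# Cross-check HDSentinel's summary against raw SMART values reported by smartctl.
# ASSUMPTION: the 32 MiB unit for Total_LBAs_Written is a guess, not from a datasheet.

POWER_ON_HOURS = 2298              # attribute 9 raw value
TOTAL_LBAS_WRITTEN = 37018         # attribute 241 raw value
ASSUMED_WRITE_UNIT = 32 * 1024**2  # hypothetical bytes per counter step


def power_on_days_hours(hours: int) -> tuple[int, int]:
    """Split a power-on hour count into (days, hours)."""
    return divmod(hours, 24)


def written_tib(count: int, unit_bytes: int) -> float:
    """Convert a raw write counter to TiB, given an assumed per-step unit."""
    return count * unit_bytes / 1024**4


days, hours = power_on_days_hours(POWER_ON_HOURS)
print(f"Power on time: {days} days, {hours} hours")
print(f"Total written: {written_tib(TOTAL_LBAS_WRITTEN, ASSUMED_WRITE_UNIT):.2f} TiB")
```

This prints `Power on time: 95 days, 18 hours` and `Total written: 1.13 TiB`, so both tools appear to read the same counters; only their interpretation of those counters differs.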

I installed smartmontools and then ran the following command (taken from a reference on checking SSDs), which produced this output:

root@stephen-All-Series:~# sudo smartctl -a -d ata /dev/sda

smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.19.0-46-generic] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:     Hypertec SSD
Serial Number:    HY22021100011
Firmware Version: U0202A0
User Capacity:    120,034,123,776 bytes [120 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
TRIM Command:     Available
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ACS-3 T13/2161-D revision 4
SATA Version is:  SATA 3.2, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Mon Jul 17 11:56:01 2023 IDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: FAILED!
Drive failure expected in less than 24 hours. SAVE ALL DATA.
No failed Attributes found.

General SMART Values:
Offline data collection status:  (0x02) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever 
                                        been run.
Total time to complete Offline 
data collection:                (  120) seconds.
Offline data collection
capabilities:                    (0x11) SMART execute Offline immediate.
                                        No Auto Offline data collection support.
                                        Suspend Offline collection upon new
                                        command.
                                        No Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        No Selective Self-test supported.
SMART capabilities:            (0x0002) Does not save SMART data before
                                        entering power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine 
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        (  10) minutes.

SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x0032   100   100   050    Old_age   Always       -       0
  5 Reallocated_Sector_Ct   0x0032   100   100   050    Old_age   Always       -       8
  9 Power_On_Hours          0x0032   100   100   050    Old_age   Always       -       2298
 12 Power_Cycle_Count       0x0032   100   100   050    Old_age   Always       -       134
160 Unknown_Attribute       0x0032   100   100   050    Old_age   Always       -       4
161 Unknown_Attribute       0x0033   100   100   050    Pre-fail  Always       -       82
163 Unknown_Attribute       0x0032   100   100   050    Old_age   Always       -       13
164 Unknown_Attribute       0x0032   100   100   050    Old_age   Always       -       12768
165 Unknown_Attribute       0x0032   100   100   050    Old_age   Always       -       70
166 Unknown_Attribute       0x0032   100   100   050    Old_age   Always       -       1
167 Unknown_Attribute       0x0032   100   100   050    Old_age   Always       -       20
168 Unknown_Attribute       0x0032   100   100   050    Old_age   Always       -       5050
169 Unknown_Attribute       0x0032   100   100   050    Old_age   Always       -       100
175 Program_Fail_Count_Chip 0x0032   100   100   050    Old_age   Always       -       0
176 Erase_Fail_Count_Chip   0x0032   100   100   050    Old_age   Always       -       0
177 Wear_Leveling_Count     0x0032   100   100   050    Old_age   Always       -       0
178 Used_Rsvd_Blk_Cnt_Chip  0x0032   100   100   050    Old_age   Always       -       8
181 Program_Fail_Cnt_Total  0x0032   100   100   050    Old_age   Always       -       0
182 Erase_Fail_Count_Total  0x0032   100   100   050    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   050    Old_age   Always       -       6
194 Temperature_Celsius     0x0022   100   100   050    Old_age   Always       -       40
195 Hardware_ECC_Recovered  0x0032   100   100   050    Old_age   Always       -       0
196 Reallocated_Event_Count 0x0032   100   100   050    Old_age   Always       -       4
197 Current_Pending_Sector  0x0032   100   100   050    Old_age   Always       -       8
198 Offline_Uncorrectable   0x0032   100   100   050    Old_age   Always       -       4
199 UDMA_CRC_Error_Count    0x0032   100   100   050    Old_age   Always       -       0
232 Available_Reservd_Space 0x0032   100   100   050    Old_age   Always       -       82
241 Total_LBAs_Written      0x0030   100   100   050    Old_age   Offline      -       37018
242 Total_LBAs_Read         0x0030   100   100   050    Old_age   Offline      -       54166
245 Unknown_Attribute       0x0032   100   100   050    Old_age   Always       -       24414

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

Selective Self-tests/Logging not supported
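smartctl's "No failed Attributes found" line follows a simple rule: an attribute fails when its threshold is nonzero and its normalized VALUE is at or below THRESH. Applying that rule to a few rows copied from the table above shows why nothing is flagged there: every VALUE is 100 against a THRESH of 050, so the FAILED verdict cannot come from the attribute table. A minimal sketch; `parse_row` and `failing_now` are hypothetical helpers that assume the smartctl 7.2 layout shown above:

```python
# Apply smartctl's attribute-failure rule to rows copied from the table above:
# an attribute is failing when THRESH is nonzero and VALUE <= THRESH.
# Assumes the smartctl 7.2 column layout
#   ID# NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
# and single-token raw values (true for this drive, not for every drive).

TABLE = """\
  1 Raw_Read_Error_Rate     0x0032   100   100   050    Old_age   Always       -       0
  5 Reallocated_Sector_Ct   0x0032   100   100   050    Old_age   Always       -       8
161 Unknown_Attribute       0x0033   100   100   050    Pre-fail  Always       -       82
197 Current_Pending_Sector  0x0032   100   100   050    Old_age   Always       -       8
232 Available_Reservd_Space 0x0032   100   100   050    Old_age   Always       -       82
"""


def parse_row(line: str) -> dict:
    """Split one attribute row into named fields (hypothetical helper)."""
    f = line.split()
    return {
        "id": int(f[0]),
        "name": f[1],
        "flag": int(f[2], 16),
        "value": int(f[3]),
        "worst": int(f[4]),
        "thresh": int(f[5]),
        "type": f[6],
        "raw": int(f[9]),
    }


def failing_now(attr: dict) -> bool:
    """True if the normalized VALUE has dropped to or below THRESH."""
    return attr["thresh"] != 0 and attr["value"] <= attr["thresh"]


attrs = [parse_row(line) for line in TABLE.splitlines() if line.strip()]
for a in attrs:
    state = "FAILING" if failing_now(a) else "ok"
    print(f'{a["id"]:>3} {a["name"]:<24} VALUE={a["value"]} THRESH={a["thresh"]} {state}')
print("Any attribute failing:", any(failing_now(a) for a in attrs))
```

Every row prints `ok` and the last line prints `Any attribute failing: False`, matching smartctl's own summary. The raw values (8 reallocated, 8 pending) do not enter this rule at all; only the normalized VALUE does.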

It seems like the following line is being flagged:

161 Unknown_Attribute       0x0033   100   100   050    Pre-fail  Always       -       82
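The flagged line differs from its neighbors only in its FLAG word: 0x0033 instead of 0x0032. smartctl derives the TYPE column from bit 0 of that word (set prints Pre-fail, clear prints Old_age), so "Pre-fail" is a fixed property the firmware assigns to the attribute, not a sign that it has tripped. A sketch decoding the flag bits into the letters smartctl prints with `-f brief` (P prefailure warning, O updated online, S speed/performance, R error rate, C event count, K auto-keep); `decode_flag` and `attribute_type` are hypothetical helpers:

```python
# Decode an ATA SMART attribute FLAG word into the letters smartctl uses
# with "-f brief": P O S R C K for bits 0..5.

FLAG_BITS = [
    (0x01, "P", "prefailure warning"),
    (0x02, "O", "updated online"),
    (0x04, "S", "speed/performance"),
    (0x08, "R", "error rate"),
    (0x10, "C", "event count"),
    (0x20, "K", "auto-keep"),
]


def decode_flag(flag: int) -> str:
    """Return the POSRCK string, with '-' for each cleared bit."""
    return "".join(letter if flag & bit else "-" for bit, letter, _ in FLAG_BITS)


def attribute_type(flag: int) -> str:
    """TYPE column as smartctl prints it: depends only on bit 0."""
    return "Pre-fail" if flag & 0x01 else "Old_age"


for flag in (0x0032, 0x0033):
    print(f"0x{flag:04x}  {decode_flag(flag)}  {attribute_type(flag)}")
```

This prints `-O--CK  Old_age` for 0x0032 and `PO--CK  Pre-fail` for 0x0033. The same bits explain the UPDATED column: attributes 241 and 242 carry flag 0x0030, with the O bit clear, which is why smartctl lists them as Offline.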

What is this attribute 161 (Unknown_Attribute) that is being flagged? I do not see attribute 161 listed on the Hypertec site, but another document describes an Attribute 161 on page 31. Is this the same attribute, or something different?

Attribute 161 - Valid Spare Block Count
Contains the remaining spare block percentage available on a solid state device. The percentage starts at 100% and will typically decrease to 0% during use. If this attribute reaches 0%, the solid state device becomes read-only. The raw value of this attribute may contain the actual number of spare blocks.

This SSD is a Hypertec SSD2S120FS-L. The drive is very new and, I am fairly certain, still within its warranty period. (Ubuntu Disks shows that this disk is just 3 months and 4 days old.)

Does the flagged attribute 161 mean that the disk can now be returned to the supplier under warranty? Or what exactly does it mean?

My question is a little different from the other questions and answers I have seen, in that I need specifics about the error report, so that the SSD can be returned (and exchanged) if the problem is serious enough to qualify for warranty support.

None of the related results I have found through web searches go into the specifics I am looking for.

Update to the Question

I checked the GNOME site for a better description of the test being run. It seemed to me that gnome-disks might be running some additional tests beyond the statistics the SSD or hard drive reports from its own firmware (what I called the disk BIOS).

But smartmontools comes to the same result. How can that be? What tests are run with the extended option? Are the tests run only by the firmware on the hard drive or SSD?

smartctl (smartmontools) reports:

smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.19.0-46-generic] (local build)

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: FAILED!

Likewise, Ubuntu's gnome-disks comes to the same answer!

[Screenshot: gnome-disks SMART overall-health self-assessment FAILED test result]
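The two tools agree because the overall verdict does not come from the attribute table. For ATA drives, smartctl's health check sends the SMART RETURN STATUS command and reports the drive firmware's own pass/fail answer; gnome-disks most likely shows the same firmware-reported status through udisks. smartctl also encodes what it found in a bitmask exit status, documented in the EXIT STATUS section of its man page, which keeps "the drive says it is failing" (bit 3) separate from "an attribute is at or below its threshold" (bit 4). A sketch that decodes it; the bit descriptions are paraphrased from the man page, `decode_exit_status` and `smart_health` are hypothetical helpers, and running `smartctl -H` needs root:

```python
# Decode smartctl's exit status bitmask (EXIT STATUS section of the
# smartctl man page) and optionally run "smartctl -H" on a device.
import subprocess

EXIT_BITS = {
    0: "command line did not parse",
    1: "device open failed or device did not identify",
    2: "some SMART or other ATA command failed, or SMART data checksum error",
    3: 'SMART status check returned "DISK FAILING"',
    4: "prefail attributes found at or below threshold",
    5: 'status "DISK OK" but some attributes were at or below threshold in the past',
    6: "device error log contains records of errors",
    7: "device self-test log contains records of errors",
}


def decode_exit_status(code: int) -> list[str]:
    """List the meaning of every bit set in a smartctl exit status."""
    return [text for bit, text in EXIT_BITS.items() if code & (1 << bit)]


def smart_health(device: str = "/dev/sda") -> list[str]:
    """Run 'smartctl -H' (needs root) and decode its exit status."""
    result = subprocess.run(["smartctl", "-H", device],
                            capture_output=True, text=True)
    return decode_exit_status(result.returncode)


if __name__ == "__main__":
    # An exit status of 8 means only bit 3 is set.
    print(decode_exit_status(8))
```

With the FAILED result quoted above, bit 3 would be expected to be set, while bit 4 should stay clear because no attribute has reached its threshold.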

I tried to format the drive with Ubuntu Disks, but the drive did not respond. After choosing Format Disk from the menu and confirming, the disk is still left with its original partitions:

[Screenshot: location of the Format menu in the Ubuntu Disks application]

As root, I issued the following command to zero the start of /dev/sda manually (100000 blocks of 64 KiB, about 6.5 GB):

root@stephen-All-Series:~# dd if=/dev/zero of=/dev/sda bs=64K count=100000 status=progress
6088359936 bytes (6.1 GB, 5.7 GiB) copied, 10 s, 609 MB/s
100000+0 records in
100000+0 records out
6553600000 bytes (6.6 GB, 6.1 GiB) copied, 13.4162 s, 488 MB/s
root@stephen-All-Series:~# 

However, this had no effect on the drive's partitions, which remain exactly as they were before the wipe attempt with the dd command above.

I am still not sure whether the disk qualifies for warranty support, because of this HDSentinel text:
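Whether writes actually reach the SSD can be checked more directly than by looking at the partition list. dd as run above writes through the kernel's page cache, so its reported speed does not by itself prove that the drive stored the data (GNU dd's `conv=fsync` option makes it flush the output before exiting). A minimal sketch, assuming Linux and Python 3.9+: it writes a known pattern at an offset, calls `fsync`, asks the kernel to drop its cached pages with `posix_fadvise`, and reads the data back. `write_and_verify` is a hypothetical helper; pointed at `/dev/sda` it overwrites data, so try it on a scratch file first.

```python
# Write a known pattern, flush it to the device, drop cached pages, and
# read it back to check that the data actually persisted.
# WARNING: pointed at a real disk, this DESTROYS data at the given offset.
import os


def write_and_verify(path: str, offset: int = 0, size: int = 4096) -> bool:
    """Return True if the pattern read back matches what was written."""
    pattern = (bytes(range(256)) * (size // 256 + 1))[:size]
    fd = os.open(path, os.O_RDWR)
    try:
        if os.pwrite(fd, pattern, offset) != size:
            return False  # short write: the device did not accept everything
        os.fsync(fd)  # push the data out of the page cache to the device
        # Advisory only: ask the kernel to drop cached pages so the read
        # below is more likely to come from the device itself.
        if hasattr(os, "posix_fadvise"):
            os.posix_fadvise(fd, offset, size, os.POSIX_FADV_DONTNEED)
        readback = os.pread(fd, size, offset)
    finally:
        os.close(fd)
    return readback == pattern
```

On an existing scratch file on a healthy filesystem this should return True. On the failing SSD, a False result, or an `OSError` raised by `pwrite` or `fsync`, would be concrete evidence to include in a warranty claim; since dropping the cache is only advisory, a True result is not absolute proof either.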

Total written: 1.14 TB
  There are 8 bad sectors on the disk surface. The contents of these sectors were moved to the spare area.
  At this point, warranty replacement of the disk is not yet possible, only if the health drops further.
  It is recommended to examine the log of the disk regularly. All new problems found will be logged there.
    No actions needed.

Even so, I am fairly convinced that this disk is useless to a regular end user.

If anyone knows what information is needed for a warranty claim (including with Hypertec support), I would appreciate comments or answers. I did not find links on the Hypertec website to check the warranty status of the drive or to file a ticket. To be fair, Hypertec does seem to have a contact form.

However, I think it would be much more user-friendly if there were a simple page with a field to fill in to check warranty status, as with Dell support.

Right now, even though the disk does not seem to work, I do not understand whether it is covered by warranty. Hard Disk Sentinel's report quoted above says that warranty replacement is not yet possible, even though the disk is broken and it is impossible to write to the device with the dd command.

As for the S.M.A.R.T. status, one reference states: "Pre-fail and Old_age are the category of error, it’s not indicating either are imminent."

It also references a [Wikipedia article "Self-Monitoring, Analysis and Reporting Technology"][14] that states:

Count of reallocated sectors. The raw value represents a count of the bad sectors that have been found and remapped.[32] Thus, the higher the attribute value, the more sectors the drive has had to reallocate. This value is primarily used as a metric of the life expectancy of the drive; a drive which has had any reallocations at all is significantly more likely to fail in the immediate months.

Thus, according to these references, the reallocated-sector count may indicate that the drive could fail in the coming months. I agree that the "Pre-fail" label on an attribute is not itself the failure. But the disk has already failed in practice, since dd is unable to write data to it! My question is about warranty coverage, because HDSentinel seems to treat the disk as still under warranty while saying that warranty replacement is not yet possible, even though I can no longer write data to it.

[14]: Wikipedia contributors. Self-Monitoring, Analysis and Reporting Technology. Wikipedia, The Free Encyclopedia. June 30, 2023, 17:24 UTC. Available at: https://en.wikipedia.org/w/index.php?title=Self-Monitoring,_Analysis_and_Reporting_Technology&oldid=1162701953. Accessed July 19, 2023.

Artur Meinild: Why do you think attribute 161 has anything to do with this? I have a similar attribute on my disks (attribute 180) called `Unused_Reserve_NAND_Blk`, and it looks like this: `180 Unused_Reserve_NAND_Blk 0x0033 100 100 000 Pre-fail Always - 13`. I think you're reading a false positive here, and that it's indeed quite normal.
Another commenter: "Ubuntu Disks Reports DISK IS LIKELY TO FAIL SOON but Hard Disk Sentinel Health is 85% with Just 8 bad Sectors: What is going on?" 85% is bad :P BUT you also need to know when it was 100%. If that was yesterday, the disk will fail soon. @ArturMeinild you are correct :)
Stephen Elliott: @ArturMeinild I suspected it might be related because 1) it is the only attribute listed as Pre-fail in the gnome-disks test, and 2) **"Attribute 161 - Valid Spare Block Count Contains the remaining spare block percentage available on a solid state device. The percentage starts at 100% and will typically decrease to 0% during use. If this attribute reaches 0%, the solid state device becomes read-only."** That describes what the SSD is doing: it behaves like a read-only device that accepts write commands but does not actually carry them out, which is exactly how it is failing.
Artur Meinild: Pre-fail is an attribute type; it doesn't indicate that anything is wrong.
Another commenter: You ask about warranty. You'll need to say where you live, where you bought the SSD, and what class of customer you are (individual or business) to get help with that. You seem reluctant to return the drive?