Score:2

Disks reports "Disk is OK, 5439488 bad sectors"


That seemed to me to be a helluva lot of bad sectors. This is a SATA M.2 SSD, but I thought those things took care of hiding bad sectors without the operating system having to bother its pretty little head about them. Ubuntu 20.04 seems to be able to count these bad sectors, yet still announce that the disk is "OK".

Is the disk "OK"? I'd been having mysterious error messages announcing that "Ubuntu 20.04 experienced an internal error" with the /var/crash report suggesting that the problem is (being detected by?) gnome-control-center. The system ran just fine following this error—until I rebooted. On two occasions, a reboot after this error failed completely, requiring a complete new install of 20.04.

Why does Disks declare a drive on which it is able to detect 5439488 bad sectors "OK"? I had assumed Disks was telling me "you've got an ageing SSD but it's all under control". But if the bad sector count is responsible for the reboot failures (my assumption, not fact), why is Disks seemingly giving the SSD a pass?

My initial working hypothesis was that the SSD was failing fast. An early reply to this post (which now seems to have disappeared) was certain that 5439488 bad sectors was a sure sign the drive needed replacing.

I now believe that to be wrong.

For one thing, the bad sector count is remaining stable at 5439488 even now, several days later. And my idea that the overprovisioning that takes care of bad sectors (which are going to be a fact of life for SSDs) is something the SSD controller keeps invisible to the operating system appears to have been a misconception. The overprovisioning must be visible, because the capacity the drive publishes to the world is 256GB. Internal overprovisioning would, I believe, only offer 240GB.

My original question boiled down to this: does overprovisioning conceal bad sectors from the operating system until the overprovisioning runs out, in which case the 5439488 bad sectors will be overflow that is eating usable capacity; or is the operating system in fact reporting every failed sector, including those taken care of by overprovisioning?

However, it's now clear to me that overprovisioning, probably handled by the SSD controller (am I right?), is being reported to SMART, and that Gnome Disks and GSmartControl must be reading this from SMART.

Two short tests and one extended test run with GSmartControl, BTW, all completed without error. Like Gnome Disks, GSmartControl reports the drive as being "OK".

By my reckoning, the current (stable) bad sector count amounts to around 2.8GB. An SSD that was secretly overprovisioning would be announcing 240GB, providing a reserve of around 16GB. We're well within that limit.
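
For anyone checking my arithmetic, this is the sum I'm doing (it assumes the conventional 512-byte logical sector size, which is an assumption on my part; the controller may count in other units):

```
bad_sectors = 5439488     # count reported by Gnome Disks / SMART
sector_size = 512         # assumed logical sector size in bytes

bad_bytes = bad_sectors * sector_size
print(f"Bad sectors: about {bad_bytes / 1e9:.2f} GB")           # ~2.79 GB

# Reserve a drive with "hidden" overprovisioning would be holding back:
reserve_bytes = 256e9 - 240e9
print(f"Typical hidden reserve: {reserve_bytes / 1e9:.0f} GB")  # 16 GB
```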

I started out with the assumption that there were connections between (1) Gnome Disks' bad sector count, (2) the "Ubuntu 20.04 experienced an internal error" message and (3) the twice-experienced failure to boot.

But I may be quite wrong about this. The last Ubuntu internal error message was not followed by failure to boot. As I say, the bad sector count remains stable and the system seems to be running well.

The first draft of this post was originally deprecated by the mod as being opinion-based. I'm not sure what that means—yes, it is my considered opinion now after much experimentation and deliberation that the SSD in question is still in decent, usable nick and doesn't need replacing (and that the non-booting problem isn't connected).

The bottom line question here would then be: is this a fair assessment? What am I missing?

Secondary questions: am I right in assuming that an SSD that announces its full capacity is still handling bad sectors internally, but reporting them to SMART? Does an SSD sold as, e.g., 240GB handle that 16GB overprovisioning internally without reporting to SMART?

The answers are apparently not easy to come by on the Web. Can anyone here help?

-- Chris

ubfan1
Install the smartmontools package, run `sudo smartctl -a /dev/sd?`, and post the output as text in your original posting. How is the SSD attached (USB, ...)? Which release of Ubuntu are you running? Have you ever run trim manually? (Over USB it will obviously fail.)
karel
Does this answer your question? ["Disk is OK, 113 bad sectors"](https://askubuntu.com/questions/550445/disk-is-ok-113-bad-sectors)
Chris Bidmead
I'm dubious about this @karel. "Bad sectors spreading like fungus" doesn't sound to me like somebody who understands SSDs (or even bad sectors). Bad sectors happen, disk controllers know they happen, and up to a point disk controllers know how to do the right thing by them.
Chris Bidmead
Thanks, @ubfan1, that sounds like a plan. I do have smartctl installed. I've never used it (thanks for the tip) and don't know how to respond when it tells me it can't detect the device type and needs more info against the -d parameter. The SSD in question (/dev/sda) is a LITEON CV3-8D256 (T881202) SATA SSD. (If Gnome Disks can detect that, shouldn't smartctl be able to?)
Organic Marble
This "Does the team agree that this laptop needs a new SSD?" is asking for an opinion. Edit the question, get rid of all the narrative, focus on one technical question, and it could be reopened.
Chris Bidmead
Thanks for the heads-up, Organic Marble. I was simply using this as a friendly form of the question "Does this SSD need replacing?" ISTM that however these questions are phrased, answers are as likely as not to be opinions. The ultimate question here is "Does Gnome Disks include overprovisioned bad sectors in its bad sector count?" I'd regard it as a service to the community to preserve the documentation of the journey towards that question if you were happy to stretch the point.
Chris Bidmead
@ubfan1: OK, so I should have checked `man smartctl`. We need the `-d ata` parameter here. Following that, `sudo smartctl /dev/sda -d ata -a` gives me pages of stuff, but I guess the bit you're interested in is: "=== START OF READ SMART DATA SECTION === SMART overall-health self-assessment test result: PASSED". This is probably the same as Gnome Disks saying the drive is OK.
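
If it helps anyone else wading through those pages of output, here's a rough sketch of pulling out just that health line (it assumes smartmontools is installed and that the drive really is /dev/sda):

```
import subprocess

# Same invocation as above, forced to the ATA device type with -d ata.
output = subprocess.run(
    ["sudo", "smartctl", "-d", "ata", "-a", "/dev/sda"],
    capture_output=True, text=True,
).stdout

# Pick out just the overall-health self-assessment line.
for line in output.splitlines():
    if "overall-health" in line:
        print(line)   # e.g. SMART overall-health self-assessment test result: PASSED
```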
Score:1

A large number of bad sectors is not necessarily a problem. But if the number of bad sectors is increasing (especially on spinning rust), or you have run out of replacement sectors (on either mechanical drives or SSDs), failure may come soon. (Wear leveling is supposed to help with this, but it may make things worse if you are frequently rewriting a majority of the disk. You should use trim before doing a full disk rewrite to mitigate this.)

Remember also, SSDs have a limited number of write cycles per block; SSDs use wear leveling to try to give every block the same number of writes to extend the life of the drive. If the SMART info lists it, this should be shown as Wear_Leveling_Count, and the number under the current value is the percentage of life left. When this reaches zero, the drive will die, probably by no longer accepting writes.
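
To make that concrete, here is a toy reading of such an attribute; the numbers are invented for illustration, not every SSD exposes a Wear_Leveling_Count attribute, and vendors differ in how they report it:

```
# One row of `smartctl -A` output, with invented numbers, might look like:
#  ID# ATTRIBUTE_NAME        FLAG   VALUE WORST THRESH TYPE     RAW_VALUE
#  177 Wear_Leveling_Count   0x0013 094   094   005    Pre-fail 123

normalized_value = 94   # VALUE column: roughly the percentage of wear life remaining
threshold = 5           # THRESH column: at or below this the attribute is considered failed

print(f"Roughly {normalized_value}% of rated write endurance left")
if normalized_value <= threshold:
    print("Wear_Leveling_Count has tripped its failure threshold")
```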

Chris Bidmead
Thanks, Jonathan. But the key question is: what exactly is Gnome Disks reporting? Bad blocks within the overprovisioning scheme, or additional bad blocks? Anyone?
Score:1

In Python, `hex(5439488)` gives `'0x530000'`.

It's more likely that this number is a bit pattern. Many of the Raw Values listed by smartctl are bit patterns. How to interpret them usually depends on the manufacturer concerned.
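
To illustrate why the raw value looks more like a packed field than a plain count, split it into bytes and all the non-zero data sits in a single byte position (how those bytes should really be read is manufacturer-specific, so treat this purely as illustration):

```
raw_value = 5439488

print(hex(raw_value))                                  # 0x530000

# Split the 48-bit SMART raw value into its six bytes, least significant first.
raw_bytes = [(raw_value >> (8 * i)) & 0xFF for i in range(6)]
print(raw_bytes)                                       # [0, 0, 83, 0, 0, 0] -- only byte 2 is set (0x53)
```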

Chris Bidmead
I'm not clear how to understand your point here, David. If SMART isn't reporting 5439488 (decimal) individual bad sectors, what is it reporting?
Score:1
Jon

If you've got 5439488 bad sectors I would replace the drive, as that's a lot of bad sectors. Back up and replace the drive; it has a big chance of failing soon.

Read up on what a bad sector is: https://www.howtogeek.com/173463/bad-sectors-explained-why-hard-drives-get-bad-sectors-and-what-you-can-do-about-it/.

Hope this helps. Jonathan Steadman.

Chris Bidmead
Thanks for that, Jonathan. I've expanded my question in an edit.
user10489
Actually, the number of bad blocks is not important. It is important if that number is increasing -- so after working with the drive for a while, check the number again. If it is increasing, your drive is failing. If it is near the number of reserved replacement blocks, it is about to fail catastrophically. If after a very long while it increases by only one or two, it might be ok.
user10489
Actually, rereading the question, for an SSD, the only thing that matters is how close the number of bad blocks is to the number of replacement blocks in the built-in overprovisioning. For some SSDs, smartctl will report this, sometimes as a percentage.
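
As a back-of-the-envelope way of putting numbers on that comparison (both the sector size and the spare-pool size below are assumptions for illustration; the real reserve is vendor-specific and often only exposed as a normalized percentage):

```
reallocated_sectors = 5439488   # raw bad/reallocated sector count reported via SMART
sector_size = 512               # assumed logical sector size in bytes
assumed_reserve = 16e9          # assumed spare pool (the 256GB vs 240GB difference) -- illustration only

consumed = reallocated_sectors * sector_size
print(f"Reserve consumed: {100 * consumed / assumed_reserve:.1f}%")   # ~17.4% on these assumptions
```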
Chris Bidmead
Yes, @user10489. I've come to the same conclusion (see self-answer below). It's understandable that a user like Jonathan would feel that 5439488 chunks of badness would be seriously bad news and my original post here shows I felt the same way. But it's pretty clear now (shoot me down in flames if anyone knows better) that an SSD declaring its full capacity of 256GB is exposing its overprovisioning to SMART, allowing all the bad sectors to be reported even though the OP mechanism is taking good care of them. What I still don't know (and would like to) is whether this applies to a 240GB SSD.
Score:0

There's been a response here that "Yes, the SSD needs to be replaced".

I'm in no position to deny this definitively. But on the evidence I have, I now believe this is not the best advice.

Since first posting this over a fortnight ago the system has been rock solid. I did have one instance of the Ubuntu internal error message I mention but the system rebooted without incident afterwards. And -- I believe, significantly -- the bad sector count has remained at 5439488 ever since this started.

So my working hypothesis from all this is: if Gnome Disks (or really SMART) says your drive is OK, it's OK. Don't be guided by an apparently high bad sector count. That's just how SSDs do things.

(I do think it would be -- er -- SMARTer if apps using SMART data were able to present the bad sector count as a wear percentage. But perhaps the over-provisioning total isn't accessible by SMART.)

--
Chris
