Score:2

Ubuntu

Disks reports "Disk is OK, 5439488 bad sectors"

Chris Bidmead

11/24/22, 12:23 PM

That seemed to me to be a helluva lot of bad sectors. This is a SATA M.2 SSD, but I thought those things took care of hiding bad sectors without the operating system having to bother its pretty little head about them. Ubuntu 20.04 seems to be able to count these bad sectors, yet still announce that the disk is "OK".

Is the disk "OK"? I'd been having mysterious error messages announcing that "Ubuntu 20.04 experienced an internal error" with the /var/crash report suggesting that the problem is (being detected by?) gnome-control-center. The system ran just fine following this error—until I rebooted. On two occasions, a reboot after this error failed completely, requiring a complete new install of 20.04.

Why does Disks declare a drive on which it is able to detect 5439488 bad sectors "OK"? I had assumed Disks was telling me "you've got an ageing SSD but it's all under control". But if the bad sector count is responsible for the reboot failures (my assumption, not fact), why is Disks seemingly giving the SSD a pass?

My initial working hypothesis was that the SSD was failing fast. An early reply to this post (which now seems to have disappeared) was certain that 5439488 bad sectors was a sure sign the drive needed replacing.

I now believe that to be wrong.

For one thing, the bad sector count has remained stable at 5439488 even now, several days later. And my idea that overprovisioning, which takes care of bad sectors (a fact of life for SSDs), is a function the SSD controller keeps invisible to the operating system appears to have been a misconception. The overprovisioning must be visible, because the capacity the drive publishes to the world is 256GB. Internal overprovisioning would, I believe, only offer 240GB.

My original question boiled down to this: does overprovisioning conceal bad sectors from the operating system until the overprovisioning runs out, in which case the 5439488 bad sectors will be overflow that is eating usable capacity; or is the operating system in fact reporting every failed sector, including those taken care of by overprovisioning?

However, it's now clear to me that overprovisioning, probably handled by the SSD controller (am I right?) is being reported to SMART, and that Gnome Disks and GSmartControl must be reading this from SMART.

Two short tests and one extended test run with GSmartControl, BTW, all completed without error. Like Gnome Disks, GSmartControl reports the drive as being "OK".

By my reckoning, the current (stable) bad sector count amounts to around 2.8GB. An SSD that was secretly overprovisioning would be announcing 240GB, providing a reserve of around 16GB. We're well within that limit.
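That reckoning can be checked with a short sketch. It assumes the count is in 512-byte logical sectors (Disks doesn't say which unit it uses) and decimal gigabytes, and it takes the 16GB reserve as the nominal gap between 256GB and 240GB rather than a figure read from the drive.

```python
# Assumption: the reported count is in 512-byte logical sectors.
SECTOR_BYTES = 512


def sectors_to_gb(sectors, sector_bytes=SECTOR_BYTES):
    """Convert a sector count to decimal gigabytes (1 GB = 10**9 bytes)."""
    return sectors * sector_bytes / 1e9


bad_gb = sectors_to_gb(5439488)
reserve_gb = 256 - 240  # nominal reserve, not a value reported by the drive

print(f"reported bad: {bad_gb:.2f} GB")
print(f"share of nominal reserve: {bad_gb / reserve_gb:.0%}")
```

If the drive actually counts in larger units (4096-byte sectors, say), the total would be eight times bigger and the comparison would come out very differently, so the unit is the assumption to pin down first.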

I started out with the assumptions that there were connections between 1. Gnome Disks' bad sector count, 2. The "Ubuntu 20.04 experienced an internal error" message and 3. The twice-experienced failure to boot.

But I may be quite wrong about this. The last Ubuntu internal error message was not followed by failure to boot. As I say, the bad sector count remains stable and the system seems to be running well.

The first draft of this post was originally deprecated by the mod as being opinion-based. I'm not sure what that means—yes, it is my considered opinion now after much experimentation and deliberation that the SSD in question is still in decent, usable nick and doesn't need replacing (and that the non-booting problem isn't connected).

The bottom line question here would then be: is this a fair assessment? What am I missing?

Secondary questions: am I right in assuming that an SSD that announces its full capacity is still handling bad sectors internally, but reporting them to SMART? Does an SSD sold as, eg, 240GB handle that 16GB overprovisioning internally without reporting to SMART?

The answers are apparently not easy to come by on the Web. Can anyone here help?

-- Chris

439

4 + 7

boot

ssd

ubfan1

11/24/22, 3:46 PM

Install the smartmontools package, run sudo smartctl -a /dev/sd? and post the output as text in your original posting. How is the SSD attached (USB, ...)? Which release of Ubuntu are you running? Have you ever run trim manually? (Over USB it will obviously fail.)

0


karel

11/26/22, 10:45 AM

Does this answer your question? ["Disk is OK, 113 bad sectors"](https://askubuntu.com/questions/550445/disk-is-ok-113-bad-sectors)

0


Chris Bidmead

11/26/22, 1:36 PM

I'm dubious about this @karel. "Bad sectors like spreading like fungus" doesn't sound to me like somebody who understands SSDs (or even bad sectors). Bad sectors happen, disk controllers know they happen, and up to a point disk controllers know how to do the right thing by them.

0


Chris Bidmead

11/26/22, 1:42 PM

Thanks, @ubfan1, that sounds like a plan. I do have smartctl installed. I've never used it (thanks for the tip) and don't know how to respond when it tells me it can't detect the device type and needs more info against the -d parameter. The SSD in question (/dev/sda) is a LITEON CV3-8D256 (T881202) SATA SSD. (If Gnome Disks can detect that, shouldn't smartctl be able to?)

0


Organic Marble

11/26/22, 2:06 PM

This "Does the team agree that this laptop needs a new SSD?" is asking for an opinion. Edit the question, get rid of all the narrative, focus on one technical question, and it could be reopened.

0


Chris Bidmead

11/26/22, 2:15 PM

Thanks for the heads-up Organic Marble. I was simply using this as a friendly form of the question "Does this SSD need replacing?" ISTM that however these questions are phrased, answers are as likely as not to be opinions. The ultimate question here is "Does Gnome Disks include overprovisioned bad sectors in its bad sector count?" I'd regard it as a service to the community to preserve the documentation of the journey towards that question if you were happy to stretch the point.

0


Chris Bidmead

11/26/22, 2:23 PM

@ubfan1: OK, so I should have checked man smartctl. We need the -d ata parameter here. Following that, sudo smartctl /dev/sda -d ata -a gives me pages of stuff, but I guess the bit you're interested in is:

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

This is probably the same as Gnome Disks saying the drive is OK.

0


Score:1

Ubuntu

user10489

11/25/22, 4:13 PM

A large number of bad sectors is not necessarily a problem. But if the number of bad sectors is increasing (especially on spinning rust), or you have run out of replacement sectors (on either mechanical drives or SSDs), failure may come soon. (Wear leveling is supposed to help with this, but it may make things worse if you frequently rewrite most of the disk. You should use trim before doing a full disk rewrite to mitigate this.)

Remember also, SSDs have a limited number of write cycles per block; SSDs use wear leveling to try to give every block the same number of writes to extend the life of the drive. If the SMART info lists it, this should be shown as Wear_Leveling_Count, and the number in the current value column is the percentage of life left. When this reaches zero, the drive will die, probably by no longer accepting writes.
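As a rough sketch of how to read that attribute out of smartctl's table, the snippet below splits one row into its columns. The row is an invented sample in the column layout smartctl -A uses for ATA drives (ID#, ATTRIBUTE_NAME, FLAG, VALUE, WORST, THRESH, TYPE, UPDATED, WHEN_FAILED, RAW_VALUE); the numbers are made up and are not from the drive in the question. Whether a given drive reports Wear_Leveling_Count at all, and whether its normalized value really means percentage remaining, depends on the vendor.

```python
# Invented sample row in the layout `smartctl -A` prints for ATA drives.
# The numbers are made up for illustration only.
SAMPLE_ROW = (
    "177 Wear_Leveling_Count     0x0013   099   099   000    "
    "Pre-fail  Always       -       12"
)


def parse_attribute(row):
    """Split one attribute row into its named columns."""
    # RAW_VALUE is the last column and may itself contain spaces,
    # so split at most nine times and keep the remainder intact.
    fields = row.split(None, 9)
    return {
        "id": int(fields[0]),
        "name": fields[1],
        "value": int(fields[3]),   # normalized current value
        "worst": int(fields[4]),
        "thresh": int(fields[5]),  # failure threshold
        "raw": fields[9],
    }


attr = parse_attribute(SAMPLE_ROW)
margin = attr["value"] - attr["thresh"]
print(f'{attr["name"]}: value {attr["value"]}, threshold {attr["thresh"]}, margin {margin}')
```

To use it on a real drive, feed it the matching line from the output of sudo smartctl -A /dev/sdX instead of the sample row.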

0 + 1

Chris Bidmead

11/25/22, 10:54 PM

Thanks, Jonathan. But the key question is: what exactly is Gnome Disks reporting? Bad blocks already handled by the overprovisioning scheme, or additional bad blocks? Anyone?

0


Score:1

Ubuntu

David Wright

12/9/22, 12:00 AM

>>> hex(5439488)
'0x530000'

It's more likely that this number is a bit pattern. Many of the Raw Values listed by smartctl are bit patterns. How to interpret them usually depends on the manufacturer concerned.
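To illustrate what reading the number as a bit pattern could look like, here is a minimal sketch that splits the raw value into fixed-width fields. SMART raw values are six bytes wide; which field layout (if any) this particular drive uses is an assumption, and only the manufacturer's documentation could confirm it.

```python
def split_raw(raw, width_bits=16, total_bits=48):
    """Split a SMART raw value into fixed-width fields, least significant first."""
    mask = (1 << width_bits) - 1
    return [(raw >> shift) & mask for shift in range(0, total_bits, width_bits)]


raw = 5439488
words = split_raw(raw)                     # three 16-bit words, low word first
byte_fields = split_raw(raw, width_bits=8)  # six bytes, low byte first

print(hex(raw))
print(words)
print(byte_fields)
```

Read this way, the only non-zero field is 0x53 (83 decimal) sitting in the third byte, with the low 16 bits all zero. That is at least consistent with a small counter packed into a higher field rather than a tally of several million individual sectors, though it proves nothing on its own.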

0 + 1

Chris Bidmead

12/9/22, 11:12 PM

I'm not clear how to understand your point here, David. If SMART isn't reporting 5439488 (decimal) individual bad sectors, what is it reporting?

0


Score:1

Ubuntu

Jon

11/24/22, 6:39 PM

If you've got 5439488 bad sectors I would replace the drive, as that's a lot of bad sectors. Back up and replace the drive; it has a big chance of failing soon.

Read what is a bad sector: https://www.howtogeek.com/173463/bad-sectors-explained-why-hard-drives-get-bad-sectors-and-what-you-can-do-about-it/.

Hope this helps. Jonathan Steadman.

0 + 4

Chris Bidmead

11/25/22, 9:31 AM

Thanks for that, Jonathan. I've expanded my question in an edit.

1


user10489

12/2/22, 11:03 AM

Actually, the number of bad blocks is not important. It is important if that number is increasing -- so after working with the drive for a while, check the number again. If it is increasing, your drive is failing. If it is near the number of reserved replacement blocks, it is about to fail catastrophically. If after a very long while it increases by only one or two, it might be ok.

0


user10489

12/2/22, 11:10 AM

Actually, rereading the question, for an SSD, the only thing that matters is how close the number of bad blocks is to the number of replacement blocks in the built-in overprovisioning. For some SSDs, smartctl will report this, sometimes as a percentage.

0


Chris Bidmead

12/9/22, 1:30 PM

Yes, @user10489. I've come to the same conclusion (see self-answer below). It's understandable that a user like Jonathan would feel that 5439488 chunks of badness would be seriously bad news and my original post here shows I felt the same way. But it's pretty clear now (shoot me down in flames if anyone knows better) that an SSD declaring its full capacity of 256GB is exposing its overprovisioning to SMART, allowing all the bad sectors to be reported even though the OP mechanism is taking good care of them. What I still don't know (and would like to) is whether this applies to a 240GB SSD.

0


Score:0

Ubuntu

Chris Bidmead

12/8/22, 9:54 PM

There's been a response here that "Yes, the SSD needs to be replaced".

I'm in no position to deny this definitively. But on the evidence I have, I now believe this is not the best advice.

Since first posting this over a fortnight ago the system has been rock solid. I did have one instance of the Ubuntu internal error message I mention but the system rebooted without incident afterwards. And -- I believe, significantly -- the bad sector count has remained at 5439488 ever since this started.

So my working hypothesis from all this is: If Gnome Disks (or really SMART) says your drive is OK, it's OK. Don't be guided by an apparently high bad sector count. That's just how SSDs do things.

(I do think it would be -- er -- SMARTer if apps using SMART data were able to present the bad sector count as a wear percentage. But perhaps the over-provisioning total isn't accessible by SMART.)

--
Chris

0 + 0
