Score:1

Health Tests on NVMe


On my servers with HDDs or SSDs, I have a cron job that periodically runs:

/usr/sbin/smartctl --test=short/long /dev/sd1

(for each disk)

While the test runs, the job polls the output of /usr/sbin/smartctl -c /dev/sd1, looping until it no longer contains:

[0-9]+% of test remaining.

It then checks whether the test completed without errors:

(   0)  The previous self-test routine completed
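
For reference, the polling described above could look roughly like this. It is only a sketch: the function names are made up for illustration, and the patterns assume the smartctl -c wording quoted in this question, which may differ between drives and smartmontools versions.

```shell
#!/bin/sh
# Sketch: wait for a SMART self-test to finish, then check the result.
# Assumes the "smartctl -c" wording quoted in the question.

# Succeeds while the capabilities text on stdin still reports a running test.
test_in_progress() {
    grep -Eq '[0-9]+% of test remaining'
}

# Succeeds if the capabilities text on stdin reports self-test status 0.
test_passed() {
    grep -Eq '\( +0\)[[:space:]]+The previous self-test routine completed'
}

# Hypothetical helper: poll one disk until its test is done.
# $1 = device, $2 = seconds between polls (default 60).
wait_for_selftest() {
    dev=$1
    interval=${2:-60}
    while /usr/sbin/smartctl -c "$dev" | test_in_progress; do
        sleep "$interval"
    done
    /usr/sbin/smartctl -c "$dev" | test_passed
}
```

A cron job would start the test first and then call wait_for_selftest with the device name, treating a non-zero return as a failed test.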

However, it appears that smartctl, as of version 7.0, doesn't yet support self-tests on NVMe, as per: https://www.smartmontools.org/wiki/NVMe_Support

It does say that

The smartd daemon tracks health (-H), error count (-l error) and temperature (-W DIFF,INFO,CRIT)

but what actually runs the tests? I'm not sure whether the output of -H and -l is updated unless we run short/long tests.

I also read about nvme-cli, but I can't find a way to run health tests on disks with it.

Any ideas?

Using CentOS 7 here.

Marcus Müller
I don't *know*, but I would be surprised if running any explicit test would have a very large knowledge advantage for SSDs – these things are in a perfect position to track their own health, since wear leveling literally knows how often each memory segment has been used, *and* due to the comprehensive error-correction code inherent to NVMe devices, you get a very good picture of device aging simply from day-to-day usage.
Score:1

SMART self-tests were conceived for mechanical disks. SATA SSDs almost completely mirror the earlier HDD interface-level behavior: they support such self-tests, but do not actually do very much when you run one. NVMe drives dropped such SMART self-test routines entirely.

For flash-based disks, one should really track cell wear, spare block count and reallocated sectors rather than relying on the old self-test routines, which are not supported on NVMe drives.
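
As a rough illustration of tracking those indicators, the sketch below reads the text printed by nvme smart-log and checks the critical warning, wear percentage and spare capacity. The function names are hypothetical, and the field names (critical_warning, percentage_used, available_spare, available_spare_threshold) and the "name : value" layout are assumptions about the nvme-cli output that may vary between versions.

```shell
#!/bin/sh
# Sketch: pull wear indicators out of "nvme smart-log" text.
# Assumes "name : value" lines such as "percentage_used : 3%".

# Print the value of field $1 (whitespace removed) from text on stdin.
smart_field() {
    awk -F: -v key="$1" '
        { name = $1; gsub(/[ \t]+/, "", name) }
        name == key { val = $2; gsub(/[ \t]+/, "", val); print val; exit }
    '
}

# Hypothetical check: succeed only if no critical warning is set,
# wear is below $1 percent, and spare capacity is above its threshold.
nvme_wear_ok() {
    max_used=$1
    log=$(cat)
    warn=$(printf '%s\n' "$log" | smart_field critical_warning)
    used=$(printf '%s\n' "$log" | smart_field percentage_used)
    spare=$(printf '%s\n' "$log" | smart_field available_spare)
    floor=$(printf '%s\n' "$log" | smart_field available_spare_threshold)
    case $warn in
        0|0x0|0x00) ;;          # no warning bits set
        *) return 1 ;;          # missing field or any warning bit
    esac
    # A missing or non-numeric value makes the comparison fail (fail-safe).
    [ "${used%\%}" -lt "$max_used" ] && [ "${spare%\%}" -gt "${floor%\%}" ]
}
```

For example, piping the output of sudo nvme smart-log /dev/nvme0n1 into nvme_wear_ok 80 would fail once the drive reports 80% or more of its rated endurance as used.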

Nuno
Thank you very much. Makes sense. Do you know whether, if I just leave `smartd` running, it will let me know of any NVMe disk problems through syslog messages? All I want is to rest assured that I'm covered, and not negligent :-)
shodanshok
As far as I know, `smartd` should be capable of monitoring NVMe SSD health as well, and of alerting when the drive itself reports a non-healthy status.
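
For a belt-and-braces check next to `smartd`, a cron job could also look at the exit status of smartctl -H. This is a sketch under assumptions: the function names are hypothetical, it relies on the smartctl man page describing exit-status bit 3 (value 8) as "SMART status check returned DISK FAILING", and whether a given NVMe drive sets that bit depends on what the drive itself reports.

```shell
#!/bin/sh
# Sketch: turn a smartctl exit status into a syslog warning.

# Succeeds if bit 3 (value 8, "DISK FAILING") is set in exit status $1.
status_reports_failing() {
    [ $(( $1 & 8 )) -ne 0 ]
}

# Hypothetical cron helper: $1 = device, for example /dev/nvme0.
check_health() {
    /usr/sbin/smartctl -H "$1" >/dev/null 2>&1
    rc=$?
    if status_reports_failing "$rc"; then
        # Log through syslog so the usual log monitoring picks it up.
        logger -p daemon.crit "smartctl reports $1 as failing (exit status $rc)"
        return 1
    fi
    return 0
}
```

Other bits in the exit status signal different conditions (for example, a device that could not be opened), so a fuller script would probably want to look at those too.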
Score:0

Install the NVMe command-line client (shown for Debian/Ubuntu; on CentOS use yum instead of apt):

sudo apt install nvme-cli

Find the drive you want to check and read its SMART log:

nvme list
sudo nvme smart-log /dev/nvme0n1

nvme-cli also has self-test commands; I believe these correspond to the old short/long tests that smartctl runs (-s 1 starts a short test, -s 2 an extended one). The second command shows the results:

nvme device-self-test /dev/nvme0 -n 1 -s 1
nvme self-test-log /dev/nvme0n1