Score:1

Error in root filesystem (abridged output)

us flag

Today when I started my machine (HP Pavilion running Kubuntu 20.04) I got an immediate hard drive error. I somehow managed to work around that to gain access to the Net, enabling me to send this message. But something is clearly very wrong, almost certainly with the hardware. What can I do?

Immediately after I boot, I get these messages:

Hard Disk Error

Please run the Hard Disk Test System Diagnostics

Hard Disk 2 (3F2)

F2 System Diagnostics

Needless to say, I don't get anything helpful from the HP diagnostics. fsck refused to run because /dev/sda5 (the root filesystem) was mounted.

Running from a memory stick, I was able to do fsck on the root filesystem. It showed no errors, so the "Hard Disk Error" message was bogus -- a misleading indication of the real problem. Something in the BIOS settings, perhaps?

Here are the SMART results:

root@HP-Pavilion-Laptop-17-ar0xx:/home/pwa/Music# smartctl /dev/sda
smartctl 7.1 2019-12-30 r5022 [x86_64-linux-5.4.0-84-generic] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org

ATA device successfully opened

Use 'smartctl -a' (or '-x') to print SMART (and more) information

root@HP-Pavilion-Laptop-17-ar0xx:/home/pwa/Music# smartctl -a /dev/sda
smartctl 7.1 2019-12-30 r5022 [x86_64-linux-5.4.0-84-generic] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Mobile HDD
Device Model:     ST1000LM035-1RK172
Serial Number:    ZDE4473L
LU WWN Device Id: 5 000c50 0a4965514
Firmware Version: RSM7
User Capacity:    1,000,204,886,016 bytes [1.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5400 rpm
Form Factor:      2.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-3 T13/2161-D revision 3b
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Mon Oct 18 14:54:09 2021 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever 
                                        been run.
Total time to complete Offline 
data collection:                (    0) seconds.
Offline data collection
capabilities:                    (0x51) SMART execute Offline immediate.
                                        No Auto Offline data collection support.
                                        Suspend Offline collection upon new
                                        command.
                                        No Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine 
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 163) minutes.
SCT capabilities:              (0x303d) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   084   064   006    Pre-fail  Always       -       239576664
  3 Spin_Up_Time            0x0023   099   099   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   099   099   000    Old_age   Always       -       1297
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002f   086   060   045    Pre-fail  Always       -       415925788
  9 Power_On_Hours          0x0032   077   077   000    Old_age   Always       -       20304 (202 78 0)
 10 Spin_Retry_Count        0x0033   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   099   099   000    Old_age   Always       -       1050
183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
184 End-to-End_Error        0x0033   100   100   097    Pre-fail  Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   099   000    Old_age   Always       -       5
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   066   053   040    Old_age   Always       -       34 (Min/Max 29/35)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       106
192 Power-Off_Retract_Count 0x0022   100   100   000    Old_age   Always       -       233
193 Load_Cycle_Count        0x0032   098   098   000    Old_age   Always       -       5179
194 Temperature_Celsius     0x0022   034   047   000    Old_age   Always       -       34 (0 16 0 0 0)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
254 Free_Fall_Sensor        0x0032   100   100   000    Old_age   Always       -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%     20264         -
# 2  Short offline       Aborted by host               90%     20263         -
# 3  Short offline       Aborted by host               50%     20263         -
# 4  Short offline       Completed without error       00%     20260         -
# 5  Short offline       Completed without error       00%     20260         -
# 6  Extended offline    Interrupted (host reset)      00%     20260         -
# 7  Short offline       Completed without error       00%     20259         -
# 8  Short offline       Completed without error       00%     20259         -
# 9  Short offline       Completed without error       00%     20259         -
#10  Short offline       Completed without error       00%     20258         -
#11  Short offline       Interrupted (host reset)      00%      2838         -
#12  Short offline       Completed without error       00%       482         -
#13  Short offline       Completed without error       00%         4         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

@heynnema - These voluminous outputs make it hard for me to find the relevant data. But I can say that **grep -i FPDMA /var/log/syslog** produces no output. Looking at the SMART output I couldn't find the read/seek errors, but I believe you when you say that they're there.
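One way to cut output like this down to the rows that usually matter is a small filter over the attribute table. The sketch below is only an illustration: `smart_summary` is a hypothetical helper name, it assumes the column layout shown in the `smartctl -a` output above, and the choice of attribute IDs (5, 187, 197, 198, 199) is a common rule of thumb rather than anything the drive vendor specifies.

```shell
# Hypothetical helper: reads `smartctl -A` style text on stdin and prints
# only the rows that are usually checked first when judging drive health.
smart_summary() {
    awk '
        # Attribute rows start with a numeric ID and have at least 10 columns.
        $1 ~ /^[0-9]+$/ && NF >= 10 {
            id = $1 + 0; name = $2
            value = $4 + 0; thresh = $6 + 0; raw = $10
            # A normalized VALUE at or below a non-zero THRESH counts as failing.
            if (thresh > 0 && value <= thresh)
                printf "FAILING %d %s value=%d thresh=%d\n", id, name, value, thresh
            # Raw counters for reallocated, uncorrectable, pending and
            # offline-uncorrectable sectors, plus interface CRC errors.
            if (id == 5 || id == 187 || id == 197 || id == 198 || id == 199)
                printf "%d %s raw=%s\n", id, name, raw
        }
    '
}
```

With the function defined in the current shell, it could be used as `sudo smartctl -A /dev/sda | smart_summary`. It only summarizes what smartctl already reports; it does not interpret vendor-specific raw values such as `Raw_Read_Error_Rate` or `Seek_Error_Rate`.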

My machine is a 2-year-old HP Pavilion laptop, but parts of it, including the BIOS, are probably much older.

@heynnema - I've been trying to use pastebin but I'm caught in validation hell on the Ubuntu One website. It's so frustrating to be trying to solve Problem A and to be dragged into Problem B.
heynnema avatar
ru flag
Can you articulate more about the error that you saw?
heynnema avatar
ru flag
Do the `fsck` in my answer. Post a screenshot of the `Disks` app **SMART Data & Tests** data window. Make the window large enough to catch all of the data.
heynnema avatar
ru flag
Edit your question and show me `grep -i FPDMA /var/log/syslog*`.
heynnema avatar
ru flag
You can't `fsck` a mounted partition. You must do it the way I instructed in my answer. If you're able to boot, get me the SMART screenshot.
heynnema avatar
ru flag
Thanks for the SMART data. Make sure to send me a comment that begins with @heynnema to let me know you have updated your question, else I miss the update altogether.
heynnema avatar
ru flag
You have READ and SEEK errors on your drive. Edit your question and show me the FPDMA command that I requested earlier, and also `grep -i "/dev/sda" /var/log/syslog*`. If both/either of those command outputs is large, copy/paste the text at https://paste.ubuntu.com and give me the URL.
Score:2
dk flag

I'm not sure this will work, but since your Linux filesystem passes the test, perhaps run a SMART test from the Disks app (on an Ubuntu live USB) against the FAT boot partition on your primary hard drive, if you have one. Maybe the issue is there; my PC has a partition like that. Also check your Linux (ext-something) partition again; the error could be GRUB telling you something is wrong with it.

Also, I noticed that the root filesystem is on /dev/sda5 (my Ubuntu computer starts at /dev/sda1, 2, etc.) and that the error is with Hard Disk 2, so maybe run a SMART check on any other connected drives and try booting with the non-essential drives disconnected.

If you have REALLY important data on the drive, it might be dying, so don't use it again and take it to a data recovery place.

I used to have a faulty HDD that sometimes came up with a BIOS error and sometimes booted normally. It was dying, but I got to use it enough to get the data off before it died. Maybe if you can quickly use it and if you want to save a bit of money just quickly get the data off.

If you do have a backup, managed to get your data off or don't care about the data on your drive, just get a new one and swap them out.

Score:0
ru flag

Let's start with checking your file system...

  • boot to an Ubuntu Live DVD/USB in “Try Ubuntu” mode
  • open a terminal window by pressing Ctrl+Alt+T
  • type sudo fdisk -l
  • identify the /dev/sdXX device name for your "Linux Filesystem"
  • type sudo fsck -f /dev/sdXX, replacing sdXX with the device name you found earlier
  • repeat the fsck command if there were errors
  • type reboot
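
The steps above can be sketched as a small shell wrapper that refuses to check a partition that is still mounted, which is the situation the asker ran into with /dev/sda5. This is a rough illustration under stated assumptions: `is_mounted` and `check_partition` are hypothetical names, and the check only matches the device path exactly as it appears in the first column of `/proc/mounts` (it does not resolve symlinks such as `/dev/disk/by-uuid/...`).

```shell
# Hypothetical helper: succeed if device $1 appears as a mount source in the
# mounts table $2 (defaults to /proc/mounts).
is_mounted() {
    awk -v dev="$1" '$1 == dev { found = 1 } END { exit !found }' "${2:-/proc/mounts}"
}

# Hypothetical wrapper: refuse to check a mounted partition, otherwise force
# a full check with fsck -f as in the list above.
check_partition() {
    if is_mounted "$1"; then
        echo "$1 is mounted; boot a live session or unmount it first" >&2
        return 1
    fi
    sudo fsck -f "$1"
}
```

From a live session this might be run as `check_partition /dev/sdXX`, with sdXX replaced by the device name reported by `sudo fdisk -l`.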
us flag
The filesystem passes the fsck check. I think the problem lies somewhere in the system settings, possibly something to do with UEFI.
ru flag
@heynnema they posted the data.
heynnema avatar
ru flag
@ThomasWard Thanks for the heads up!
us flag
The problem solved itself -- for the time being, anyway.
heynnema avatar
ru flag
@PaulA. The problem didn't solve itself. Please see my requests for further data, in comments to your question... FPDMA and syslog.
us flag
@heynnema - was the screenshot usable?
heynnema avatar
ru flag
@PaulA. The screenshot that you added to your answer (which you should probably delete) wasn't helpful, as you couldn't scroll the data window... but you added the similar data to your question, and that's how I know about the READ and SEEK errors. I'm still looking for `grep -i FPDMA /var/log/syslog*` and `grep -i "/dev/sda" /var/log/syslog*`. Edit that output into your question with copy/paste, or use https://paste.ubuntu.com and give me the URL.
Score:-1
us flag

Saints be praised!! I booted without getting the error, so at this point there's nothing to diagnose. Perhaps running smartctl helped things along.

My thanks to heynnema for astute assistance.

@heynnema - here is the screenshot: (image not preserved)

heynnema avatar
ru flag
The problem didn't solve itself. Please see my requests for further data, in comments to your question... FPDMA and syslog.
us flag
@heynnema - The syslog output you asked for is empty. SMART produces a lot of output but most of it is temperature changes. I'm inclined to think that the original error was caused by some kind of transient problem since I now can reboot without difficulty. Is this still worth pursuing, considering that I can't use pastebin because of validation problems with Launchpad and Ubuntu One?
heynnema avatar
ru flag
Don't bother with pastebin then. Start `Disks`, go to the **SMART Data & Tests** data window, enlarge the window to show all data, get a screenshot, and add it to your question.
us flag
@heynnema - This is as frustrating to me as I imagine it is to you.
heynnema avatar
ru flag
Not frustrating, but sometimes remote diagnosis can take time. Having to ask for the same info multiple times IS frustrating. This is my third request for... *"I'm still looking for `grep -i FPDMA /var/log/syslog*` and `grep -i "/dev/sda" /var/log/syslog*`"*... sorry if you supplied it, or I may have missed it.
us flag
I thought I answered this. I did the `grep -i FPDMA /var/log/syslog*` a while ago. It produced no output, and `grep -i /dev/sda /var/log/syslog*` produced only minor temperature changes, which is probably why you missed my report of the results.
heynnema avatar
ru flag
Sorry. Let's try a slight mod of the syslog command... `grep -i sda /var/log/syslog*` and see if we catch any fish.
us flag
@heynnema - Same result. Temperature changes only. I think there are no fish to catch. My guess is that the errors were transient and left no trace in syslog. If they were indeed transient, I wouldn't expect them to show up in SMART either. I believe that SMART doesn't reveal history, only the current state of the disk.
heynnema avatar
ru flag
Thanks for the update. SMART data is also historical. Let's try one last grep... `grep -i temp /var/log/syslog*`. (you're using the * at the end, yes?)
us flag
@heynnema - Since the system is now working, it would make sense that the errors were never recorded in the logfiles.
us flag
The only other thing I saw this time was two messages about cleanup of the temporary directory.
heynnema avatar
ru flag
That's why the * in my commands. It checks the syslog and syslog.1 files, where the historical data is. The fact is, your HDD is getting READ/SEEK errors... at least at some time in its history... maybe not any longer... but your recent boot issue indicates there's still something possibly wrong.
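A side note on the `syslog*` glob: on a typical Ubuntu setup, older rotations are often gzip-compressed (for example `syslog.2.gz`), and plain `grep` will not find text inside those. A minimal sketch using `zgrep`, which reads both plain and gzip-compressed files, is shown here; `search_syslogs` is a hypothetical helper name and the default directory is an assumption.

```shell
# Hypothetical helper: case-insensitive search for pattern $1 across the
# current and rotated syslog files in directory $2 (defaults to /var/log).
# zgrep handles plain files and .gz rotations alike.
search_syslogs() {
    zgrep -i "$1" "${2:-/var/log}"/syslog*
}
```

For example, `search_syslogs FPDMA` or `search_syslogs sda` would cover the same searches requested in this thread, including any compressed rotations. Reading `/var/log/syslog*` may require `sudo` or membership in the `adm` group.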
heynnema avatar
ru flag
What were the temps recorded in syslog*?
us flag
The temperature ranged from 33C to 41C. Seems innocent enough.
heynnema avatar
ru flag
Those temps are ok. Your computer isn't giving us much to go on. With READ/SEEK errors, I'd say "re-seat the SATA and power cables that go from the motherboard to the drive"... and yes... it's more difficult on a laptop. Otherwise your drive may just be going bad... keep good backups.