How to determine HDD health with a reasonable degree of certainty?

aseegy · January 1, 2020, 5:58pm

Dear all,

I have received a few used HDDs for free, which I am hoping to use for Storj. Since the drives are used, I would like to evaluate their health. From what I remember, SMART values are too superficial and do not tell the whole story about a drive.

I am wondering if there is a “standard procedure” we could agree on to decide whether a used HDD is suitable to be dedicated to Storj.

Thanks in advance and best wishes for the new year.

Vadim · January 1, 2020, 6:01pm

You can use the HDD vendor's software and run a full surface test.
There is not much more you can do.

aseegy · January 1, 2020, 6:16pm

Hello @Vadim and thanks a lot for the prompt answer.

I will try Western Digital Data Lifeguard Tools on the drives I received, since they are all the same make.

Do you know of a Linux distribution that could be booted from a live USB key to do the same? I believe it is recommended that no other software use the HDD while the scan runs, to avoid data loss.

Vadim · January 1, 2020, 6:26pm

I found only a bootable USB version, nothing for Linux itself.

aseegy · January 1, 2020, 6:29pm

I did some quick research too, and from what I understand an easy way to do it would be to boot any common live distribution and run the tests with “smartmontools”. I was reading about the commands to use on the following site: link.
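For reference, a minimal Python sketch of that smartmontools workflow, assuming smartmontools is installed, the script runs as root, and `/dev/sdX` stands in for the real device (it is a placeholder, not a default). `smartctl -H` prints the drive's overall health verdict and `smartctl -t long` starts the extended self-test. The parser only looks for the ATA-style verdict line, so SAS drives, which report "SMART Health Status: OK" instead, would need an extra case.

```python
import subprocess

# Sketch only: assumes smartmontools is installed and the caller has root rights.
# Device paths such as "/dev/sdX" are placeholders supplied by the caller.


def smart_health_passed(smartctl_output: str) -> bool:
    """Return True if `smartctl -H` output reports an overall PASSED verdict."""
    for line in smartctl_output.splitlines():
        # ATA drives print e.g.
        # "SMART overall-health self-assessment test result: PASSED"
        if "overall-health self-assessment test result" in line:
            return line.rstrip().endswith("PASSED")
    return False  # no verdict found: do not treat the drive as known good


def check_device(device: str) -> bool:
    """Ask the drive for its overall SMART health verdict (smartctl -H)."""
    result = subprocess.run(["smartctl", "-H", device],
                            capture_output=True, text=True)
    return smart_health_passed(result.stdout)


def start_long_test(device: str) -> None:
    """Start the extended self-test; the drive runs it in the background."""
    subprocess.run(["smartctl", "-t", "long", device], check=False)
```

Typical use: call `start_long_test("/dev/sdX")`, wait for the duration smartctl estimates, read the result with `smartctl -l selftest /dev/sdX`, then call `check_device("/dev/sdX")`.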

In any case, it could be useful to have a second computer to run all these tests on. If you have only one computer, you will have to turn off the node, which is suboptimal.

stuu · January 1, 2020, 6:56pm

For maximum safety you can use RAID 1 or RAID 5, depending on how many disks you have.
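As a rough guide to what each level costs in capacity, here is the standard arithmetic for n equal disks as a small Python sketch (hot spares, filesystem overhead and unequal disk sizes are ignored; the function names are illustrative). RAID 6, mentioned later in the thread, is included for comparison, as is the one-node-per-disk layout, which uses all raw capacity but has no redundancy.

```python
def raid_usable_tb(disks: int, disk_tb: float, level: int) -> tuple[float, int]:
    """Return (usable capacity in TB, disks that may fail) for equal disks.

    Standard layouts only: RAID 1 mirrors every disk, RAID 5 spends one
    disk's worth of space on parity, RAID 6 spends two.
    """
    minimum = {1: 2, 5: 3, 6: 4}
    if level not in minimum:
        raise ValueError(f"unsupported RAID level {level}")
    if disks < minimum[level]:
        raise ValueError(f"RAID {level} needs at least {minimum[level]} disks")
    if level == 1:
        # Every disk holds a full copy; all but one may fail.
        return disk_tb, disks - 1
    parity = 1 if level == 5 else 2
    return (disks - parity) * disk_tb, parity


def single_nodes_tb(disks: int, disk_tb: float) -> float:
    """One storage node per disk: all raw capacity is used, no redundancy."""
    return disks * disk_tb
```

For example, four 2 TB disks give 6 TB in RAID 5, 4 TB in RAID 6, and 8 TB as four separate nodes.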

S0litiare · January 1, 2020, 8:24pm

Just get any software that can monitor the “S.M.A.R.T.” interface.

In Linux, "smartmontools" is used to check the S.M.A.R.T. readout and, if it finds an issue, email the local root or user account.

You can also use it to run the checks manually or in a scheduled cron job and get a status report emailed or saved in a text file (whichever is easier).
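For the scheduled-report idea, one useful detail is that smartctl's exit status is a bit mask (documented under RETURN VALUES in the smartctl man page), so a cron job can tell what went wrong without parsing the full output. The sketch below decodes those bits and appends a one-line summary to a text file; the function names and the report path you pass in are illustrative, and emailing is left to cron or smartd.

```python
import subprocess
from datetime import datetime

# Meaning of each smartctl exit-status bit, paraphrased from the
# RETURN VALUES section of the smartctl man page.
SMARTCTL_BITS = {
    0: "command line did not parse",
    1: "device open failed",
    2: "a SMART/ATA command failed",
    3: "SMART status: DISK FAILING",
    4: "prefail attribute at or below threshold",
    5: "some attribute was at or below threshold in the past",
    6: "device error log contains errors",
    7: "self-test log contains errors",
}


def decode_exit_status(status: int) -> list[str]:
    """List every problem flagged in a smartctl exit status (0 means none)."""
    return [msg for bit, msg in SMARTCTL_BITS.items() if status & (1 << bit)]


def append_report(device: str, report_path: str) -> int:
    """Run `smartctl -a` and append a timestamped summary line to a file."""
    status = subprocess.run(["smartctl", "-a", device],
                            capture_output=True, text=True).returncode
    problems = decode_exit_status(status) or ["OK"]
    with open(report_path, "a", encoding="utf-8") as f:
        f.write(f"{datetime.now().isoformat()} {device}: {'; '.join(problems)}\n")
    return status
```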

cdhowie · January 1, 2020, 8:26pm

It is generally better to run a storage node per disk instead of using RAID.

stuu · January 1, 2020, 9:06pm

I can’t agree with you. In my opinion it’s better to have RAID, because if you lose your node you have to wait 15 months until Storj stops holding some of your money. Correct me if I’m wrong.

donald.m.motsinger · January 1, 2020, 10:41pm

If you lose your node, you lose all the money held in escrow. Yet statistically it is still more profitable to run one node per HDD.

cdhowie · January 2, 2020, 12:37am

In general, yes, you are wrong. I linked the wrong post in that thread in my reply. Please read this post, starting with the line “these problems are really only problems if you expect your disks to fail frequently” after the third quote.

Basically, you need to balance the lost escrow with the additional revenue potential of doubling your capacity. RAID1 gives you a revenue stream that is smaller but fluctuates less. Running a node per HDD will give you more revenue in the long run but you may see dips when a node fails.

If you expect your drives to last at least a year, RAID will cost you revenue.
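That trade-off can be made concrete with a toy expected-value model. Everything in it is an assumption for illustration: two equal disks, revenue proportional to stored data, a fixed annual failure rate per disk, a fixed amount of held escrow forfeited per failure, a RAID 1 mirror that never loses its node, and no allowance for the months a replacement node needs to ramp up again. It does not model Storj's actual payout or escrow rules.

```python
# Toy model: every parameter is hypothetical and chosen by the caller.


def raid1_revenue(months: int, revenue_per_disk_month: float) -> float:
    """Two mirrored disks store one disk's worth of data; assume no node loss."""
    return months * revenue_per_disk_month


def node_per_disk_revenue(months: int, revenue_per_disk_month: float,
                          annual_failure_rate: float,
                          escrow_lost_per_failure: float) -> float:
    """Two nodes store twice the data, but each expected failure forfeits escrow."""
    gross = 2 * months * revenue_per_disk_month
    expected_failures = 2 * annual_failure_rate * months / 12
    return gross - expected_failures * escrow_lost_per_failure
```

With invented figures of 10 units per disk per month, a 5% annual failure rate and 30 units lost per failure, `node_per_disk_revenue(12, 10.0, 0.05, 30.0)` comes to about 237 versus `raid1_revenue(12, 10.0)` = 120. The per-disk layout only falls behind once expected escrow losses exceed a whole disk's worth of revenue over the period.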

aseegy · January 2, 2020, 6:35am

Dear all,

if possible I would ask you to stay on topic.

Thanks.

node1 · January 2, 2020, 7:11am

Hi,

Check this tool: https://hddscan.com/

It’s a good surface-test utility. Besides good and bad sectors, it shows a graph of the read speed across the whole surface.
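The read-speed graph such tools draw comes from a simple idea: read the whole surface sequentially in fixed-size chunks and time each read, so that slow or failing regions show up as dips or errors. Below is a rough Python sketch of that idea, not of HDDScan itself. It uses `os.pread` (Unix only), the 64 MiB chunk size is an arbitrary choice, and because reads go through the OS page cache, speeds on recently read data will be inflated. It would be run as root against an unmounted raw device such as `/dev/sdX` (placeholder).

```python
import os
import time


def scan_read_speed(path: str, chunk_size: int = 64 * 1024 * 1024):
    """Read `path` sequentially, yielding (offset, MB/s) for each chunk.

    The speed is None for a chunk whose read raised OSError (for example an
    unreadable sector); the scan then skips ahead to the next chunk.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        offset = 0
        while True:
            start = time.perf_counter()
            try:
                data = os.pread(fd, chunk_size, offset)
            except OSError:
                yield offset, None  # record the bad region and move on
                offset += chunk_size
                continue
            if not data:  # end of file or device
                break
            elapsed = max(time.perf_counter() - start, 1e-9)
            yield offset, len(data) / elapsed / 1e6
            offset += len(data)
    finally:
        os.close(fd)
```

Plotting the yielded speeds against the offsets gives the kind of surface graph described above.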

Afterwards, use smartmontools as described earlier in this topic.
That said, I would suggest RAID 6 as well.

BrightSilence · January 2, 2020, 8:17am

Spinrite is pretty good. It monitors SMART stats while stressing the drive and can even in some cases recover data from unreadable sectors.

It’s not exactly user-friendly and requires a Windows PC to create the bootable medium.

aseegy · January 4, 2020, 9:29pm

Dear all,

thanks a lot for your feedback. Over the past two days I tested two disks with the Western Digital utility “Data Lifeguard Diagnostic for Windows”. One of the two disks reports an error:

[Screenshot: Disk 3, quick test failed]

After the quick test I ran the extended one:

[Screenshot: Disk 3, extended test found bad sectors]

Of course, the bad sectors cannot be repaired:

[Screenshot: Disk 3, not possible to repair]

The simplest way to deal with the issue seemed to me to use chkdsk:

Microsoft Windows [Version 10.0.17763.914]
(c) 2018 Microsoft Corporation. All rights reserved.

C:\Windows\system32>chkdsk E: /f /r
The type of the file system is NTFS.
Volume label is Data 2.

Stage 1: Examining basic file system structure ...
  256 file records processed.
File verification completed.
  0 large file records processed.
  0 bad file records processed.

Stage 2: Examining file name linkage ...
  1 reparse records processed.
  278 index entries processed.
Index verification completed.
  0 unindexed files scanned.
  0 unindexed files recovered to lost and found.
  1 reparse records processed.

Stage 3: Examining security descriptors ...
Security descriptor verification completed.
  11 data files processed.

Stage 4: Looking for bad clusters in user file data ...
  240 files processed.
File data verification completed.

Stage 5: Looking for bad, free clusters ...
  488328439 free clusters processed.
Free space verification is complete.
Adding 112 bad clusters to the Bad Clusters File.

Windows has made corrections to the file system.
No further action is required.

1953512447 KB total disk space.
     72788 KB in 8 files.
        76 KB in 13 indexes.
       448 KB in bad sectors.
    125823 KB in use by the system.
     65536 KB occupied by the log file.
1953313312 KB available on disk.

      4096 bytes in each allocation unit.
 488378111 total allocation units on disk.
 488328328 allocation units available on disk.

C:\Windows\system32>

For reference, each test took about 4.5 hours on a 2 TB drive.
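As a sanity check on that figure, assuming the extended test reads every sector once: 4.5 hours for a 2 TB (2×10^12 bytes) drive implies an average of roughly 123 MB/s, a plausible sustained rate for a 2 TB desktop HDD. The helpers below (names are illustrative) do that arithmetic and can estimate how long a larger drive would take at a similar rate.

```python
def full_pass_hours(capacity_bytes: float, throughput_bytes_per_s: float) -> float:
    """Hours needed to read every byte once at a constant average throughput."""
    return capacity_bytes / throughput_bytes_per_s / 3600


def implied_throughput_mb_s(capacity_bytes: float, hours: float) -> float:
    """Average MB/s implied by a full pass that took `hours`."""
    return capacity_bytes / (hours * 3600) / 1e6
```

For example, `implied_throughput_mb_s(2e12, 4.5)` is about 123.5, and `full_pass_hours(4e12, 123e6)` is about 9 hours, so a 4 TB drive at the same average speed would need roughly twice as long.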

What I learned is that chkdsk does not repair bad sectors; it flags them so that nothing is written to them:

Adding 112 bad clusters to the Bad Clusters File.
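The chkdsk figures are consistent with each other: with 4096-byte allocation units, 112 bad clusters come to exactly the 448 KB reported "in bad sectors". A short Python check of that arithmetic follows; the sector count assumes 512-byte logical sectors, which the output does not state.

```python
# Figures copied from the chkdsk output above.
CLUSTER_BYTES = 4096  # "4096 bytes in each allocation unit."
BAD_CLUSTERS = 112    # "Adding 112 bad clusters to the Bad Clusters File."

bad_kb = BAD_CLUSTERS * CLUSTER_BYTES // 1024
print(bad_kb)  # 448, the "448 KB in bad sectors" line

# Assumption: 512-byte logical sectors. chkdsk marks whole clusters, so this
# is an upper bound on how many sectors are actually unreadable.
max_bad_sectors = BAD_CLUSTERS * CLUSTER_BYTES // 512
print(max_bad_sectors)  # 896
```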

Since there was no data on the disk, flagging the sectors as “bad” is enough for me. Where important data sits on a bad sector, chkdsk is not enough, because it cannot recover data. Software that may be able to do so includes the tool mentioned by @node1 (thanks!) and Victoria. I did not use either of them.

Now I am wondering whether it makes sense to repeat such tests every now and then, and if so, how often. Since a check is quite expensive in terms of time, node reputation and hardware, I would limit it to twice per year. Does that reasoning make sense?

BrightSilence · January 4, 2020, 9:51pm

I have scheduled quick SMART tests every month as well as extended SMART tests every 3 months. Synology lets you easily schedule those and they can run while the array is online, so no impact on down time.

Modern disks tend to handle bad sectors internally and remap them. If bad sectors actually show up in surface scans, your HDD is not having a good time, and you may be looking at a dying drive. Simply blocking out bad sectors does nothing to prevent further degradation, which is almost certainly going to happen. Take the appropriate steps to back up the data on it, and check whether the drive is still under warranty so you can RMA it.
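One way to watch for that kind of degradation is to track the raw counts of the attributes that record remapped and unstable sectors. The sketch below parses the attribute table printed by `smartctl -A` and flags IDs 5 (Reallocated_Sector_Ct), 197 (Current_Pending_Sector) and 198 (Offline_Uncorrectable) when their raw value is non-zero. Treating any non-zero count as a warning is a judgement call rather than a vendor threshold, and some vendors encode raw values idiosyncratically, so a count that keeps growing between checks is the more telling signal. The function names are illustrative.

```python
# Attribute IDs commonly watched for sector trouble, named as smartctl prints them.
WATCHED = {
    5: "Reallocated_Sector_Ct",
    197: "Current_Pending_Sector",
    198: "Offline_Uncorrectable",
}


def parse_raw_values(attribute_table: str) -> dict[int, int]:
    """Map attribute ID -> integer RAW_VALUE from `smartctl -A` output."""
    raw = {}
    for line in attribute_table.splitlines():
        # Ten columns; maxsplit keeps a multi-word RAW_VALUE in the last field.
        fields = line.split(None, 9)
        if len(fields) == 10 and fields[0].isdigit():
            first = fields[9].split()[0]  # e.g. "33" from "33 (Min/Max 20/45)"
            if first.isdigit():
                raw[int(fields[0])] = int(first)
    return raw


def sector_warnings(attribute_table: str) -> list[str]:
    """Describe each watched attribute whose raw count is above zero."""
    raw = parse_raw_values(attribute_table)
    return [f"{name} = {raw[i]}" for i, name in WATCHED.items() if raw.get(i, 0) > 0]
```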

aseegy · January 4, 2020, 10:00pm

OK. It is an old drive, which is probably one of the reasons I received it for free.

I will remove it and throw it into the bin.

No data on it so no biggie.

leescotti52 · January 6, 2020, 11:11am

Hello Aseegy, there are multiple ways to check drive health. Some of them are listed below.
1 Check with the HDD manufacturer's tools: most major HDD manufacturers provide free tools for checking hard drive health. To find the right one, open Device Manager > expand “Disk drives”, where you will find your hard drive's model number > go to the manufacturer's support page > find a utility that supports your HDD.

2 Windows CHKDSK tool: a built-in Windows utility that scans your disk for file system errors and bad sectors. To run it, right-click the drive > Properties > Tools tab > Check Now > select the check boxes > Start. It is a basic tool for checking hard drive health.

3 Use WMIC: a command-line interface for many admin tasks, including checking hard drive health. It reads the drive's S.M.A.R.T. status and reports a conclusion.
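As an illustration, the WMIC check usually means `wmic diskdrive get model,status`, where Status reads "OK" for a healthy drive and "Pred Fail" when a failure is predicted. The Python sketch below parses that fixed-width table by the header positions instead of assuming a column order. It is Windows-only, drives with identical model strings would overwrite each other in the result, and WMIC is deprecated on recent Windows releases. The function names are illustrative.

```python
import subprocess


def parse_wmic_status(output: str) -> dict[str, str]:
    """Map drive model -> Status from `wmic diskdrive get model,status` output."""
    lines = [line.rstrip() for line in output.splitlines() if line.strip()]
    header = lines[0]
    status_col = header.index("Status")
    model_col = header.index("Model")
    result = {}
    for line in lines[1:]:
        # Fixed-width table: slice each column between header start positions.
        if model_col < status_col:
            model, status = line[model_col:status_col], line[status_col:]
        else:
            status, model = line[status_col:model_col], line[model_col:]
        result[model.strip()] = status.strip()
    return result


def wmic_drive_status() -> dict[str, str]:
    """Run WMIC (Windows only) and parse its drive status table."""
    out = subprocess.run(["wmic", "diskdrive", "get", "model,status"],
                         capture_output=True, text=True).stdout
    return parse_wmic_status(out)
```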

Alternative solution: use a third-party utility such as Stellar Data Recovery Professional. Its built-in “Drive monitor” feature checks hard drive health and finds bad sectors. It can also recover lost data from formatted drives, deleted files, and so on.

Hakanx · September 5, 2022, 5:14am

Hello. How can I check hard drive health on the Internet? I found your forum while researching. I’m having some problems with my hard drive. Can you help me?

Alexey · September 5, 2022, 7:07am

Hello @Hakanx ,
Welcome to the forum!

You need to check your hard disk locally with the tools for your OS. On Linux that is fsck, run in a terminal with root rights; on Windows it is chkdsk, run from an elevated PowerShell or Command Prompt (cmd.exe).
Your storagenode container (or service) should be stopped before the check. On Linux you also need to unmount the disk first.
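On Linux, a small guard can enforce the "unmount first" rule before calling fsck. A minimal Python sketch, assuming root rights, a stopped storagenode container or service, and a device path such as `/dev/sdX1` (placeholder) that appears verbatim in /proc/mounts; devices mounted via /dev/mapper or symlinked paths would need to be resolved first. The `-f` flag is not an fsck option itself, so fsck passes it through to e2fsck, where it forces a full check on ext2/3/4. The function names are illustrative.

```python
import subprocess


def mounted_devices(proc_mounts_text: str) -> set[str]:
    """Return the device (first) field of each /proc/mounts entry."""
    return {line.split()[0] for line in proc_mounts_text.splitlines() if line.strip()}


def safe_fsck(device: str) -> int:
    """Run fsck on `device` only if it is not currently mounted."""
    with open("/proc/mounts", encoding="utf-8") as f:
        if device in mounted_devices(f.read()):
            raise RuntimeError(f"{device} is mounted; unmount it first")
    # -f is handed to the filesystem-specific checker (e2fsck for ext4).
    return subprocess.run(["fsck", "-f", device]).returncode
```

`safe_fsck("/dev/sdX1")` raises an error while the partition is still mounted and otherwise returns fsck's exit code.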