Using an HDD with bad sectors

In my case I have combined it with an SSD as the boot device that also appears to be flaky so it will be double the fun. :slight_smile: lol.

I’m running the fsck -vcck /dev/sdb1 at the moment as suggested by @Doom4535 above and so far it is about 20% of the way through with 0 errors. Smartmontools did report problems with the drive so I’m not expecting that state of affairs to last. lol.
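For anyone curious, my understanding is that on an ext filesystem fsck -cc hands the work off to badblocks in non-destructive read-write mode and records any hits in the filesystem’s bad-block inode. A rough manual equivalent (just a sketch, using my /dev/sdb1 as the example device, with the partition unmounted):

    badblocks -nsv -o /tmp/sdb1-bad.txt /dev/sdb1   # non-destructive read-write scan
    e2fsck -l /tmp/sdb1-bad.txt /dev/sdb1           # record the hits in the filesystem's bad-block list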

1 Like

The strangest drive issue I had was an ancient 20MB (Yes, I am that old!) Seagate MFM drive where the write head was busted but the read head still worked. You could delete everything, even run a format or fdisk, reboot the machine and it would all be back.

I don’t think you should use a disk that you know to be bad. But if you do use it anyway, writing zeros to the whole disk will get the bad blocks re-mapped into the spare area.
You could also create a partition that avoids the bad areas (they tend to cluster together) and then “volume” the pieces together with your favorite OS/tool.
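If someone wants to try the zero-fill route on Linux, a minimal sketch would be something like this (it destroys everything on the disk; /dev/sdX is a placeholder for the disk in question):

    dd if=/dev/zero of=/dev/sdX bs=1M status=progress conv=fsync   # full write pass, lets pending sectors get remapped
    smartctl -A /dev/sdX   # afterwards: pending count should drop to 0, reallocated count shows what was remapped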

Bad sectors aren’t that bad in many cases. I have an HDD that was rejected by my Drobo; it has been running a node in an external USB case for 2.5 years now. It had a single bad sector once. I would have kept using it in the Drobo if I could, but that device just registers the serial as failed and won’t ever accept it again. The HDD is perfectly fine and hasn’t missed a beat since. I think running a storage node is one of the better things to do with such an HDD. The customer data is protected anyway and you might as well put it to some good use, even if you wouldn’t entrust your own data to it anymore.

Of course it could also be the first sign of a dying drive, but if that’s the case you’ll find out soon enough.

3 Likes

Considering the latest payouts … why would anyone start with Storj at all? No matter whether with a new HDD or a used one with bad sectors?

We are all whales, so we get paid pretty well.

1 Like

Because the other alternative would be for me to just toss it in the trash.

2 Likes

The trash disk node came up today so now we get to see if it will survive vetting. I have a total of 5TB available across two internet links so I doubt I will have any need to add further storage this year. But, I’m already considering what will become node 4 in the future. :slight_smile:

I would not use a disk with bad sectors for Storj, unless it was part of a mirror with a good disk (or, if I only had disks with bad sectors, a three-disk mirror). The node needs to survive for a few years before it becomes “worth it” and I doubt the drive would survive that long.

In my case doing this has been very worthwhile. The drive is still going and hasn’t added any additional bad blocks. Indeed, the host the node runs on has been upgraded to something a bit faster with more I/O and still no problems have arisen so far. This node has now generated $148 in income. The last error was at 10,500 power on hours and the drive is about to hit 30,000. All from a drive I would otherwise retire from use.

3 Likes

Ahhh the magic of STORJ! Who else can do such things, without risk of losing any data?!

Edit: This is the way!
Using a disk that otherwise wouldn’t even be safe to use!
Saving the environment!

2 Likes

Hello everyone!

I have just finished scanning a hard drive with Victoria SSD/HDD and the results are the following:

12:40:12 : Warning! Block start at 2841147392 (1,5 TB)  = 1407 ms
12:40:14 : Warning! Block start at 2841149440 (1,5 TB)  = 2797 ms
12:40:16 : Warning! Block start at 2841151488 (1,5 TB)  = 1563 ms
12:40:18 : Warning! Block start at 2841153536 (1,5 TB)  = 1781 ms
12:40:27 : Warning! Block start at 2841204736 (1,5 TB)  = 1109 ms
12:40:29 : Warning! Block start at 2841206784 (1,5 TB)  = 1719 ms
12:40:32 : Warning! Block start at 2841212928 (1,5 TB)  = 2141 ms
12:40:33 : Warning! Block start at 2841214976 (1,5 TB)  = 1125 ms
12:40:35 : Warning! Block start at 2841219072 (1,5 TB)  = 1172 ms
12:40:36 : Warning! Block start at 2841221120 (1,5 TB)  = 1219 ms
12:40:39 : Warning! Block start at 2841229312 (1,5 TB)  = 1407 ms
12:40:58 : Warning! Block start at 2843064320 (1,5 TB)  = 1234 ms
12:41:05 : Warning! Block start at 2843162624 (1,5 TB)  = 1203 ms
14:53:04 : *** Scan results: Warnings - 60, errors - 0. Last block at 3907024887 (2,0 TB), time 6 hours 29 minutes 30 seconds.


Do you suggest I try the remap function or should I just leave it as it is? By “remap”, Victoria means the following:

This is the mode for replacing damaged or faulty sectors with working sectors from the backup area of the hard disk.

Thanks for any suggestions.

We first need to define “bad sectors”.

The following is the process that the disk actually uses (all sector numbers are examples for this scenario):

  1. It tries to read sectors 1 - 100
  2. It gets a read error on sectors 23, 47 and 59.
  3. It marks sectors 23, 47 and 59 as current pending sectors. This is S.M.A.R.T. attribute 197 Current_Pending_Sector increasing by 3.
  4. Depending on the drive’s firmware settings: a “normal” desktop drive will likely not have any error recovery control (i.e. TLER / SCT ERC), the setting that says “if I encounter a read or write error, I have to report back an unrecoverable sector within 7 seconds” (roughly 240 read retries, i.e. how many times the sector on a spinning disk passes under the head in 7 seconds). A “NAS” or “server” disk will have those timeouts, even if they are disabled by default and have to be enabled (see the smartctl sketch after this list). A normal drive will hang retrying the same bad sector forever (where forever = enough time, usually 28 seconds on Linux, for the disk subsystem to notice that something is wrong with the drive and send a reset command). NAS or server drives MUST report the error within their error recovery timeout: if you run those drives in an array, the corresponding sectors on the other array members are re-read to return the correct data that the bad sector can’t, and you can’t have the whole array hang while this happens, hence the timeout. If a sector is successfully read during all of this, it is removed from the count, so in this example SMART attribute 197 would drop from 3 to 2.
  5. Where was I… ah yes: if a write request comes in for sector 23, 47 or 59, the drive knows those sectors are faulty, so it re-maps the sector being written using its spare allocation. This increases SMART attribute 5 Reallocated_Sector_Ct (if all three get re-mapped, this will show 3) as well as SMART attribute 196 Reallocated_Event_Count (if all three are re-mapped at once, this will show 1; if the drive remaps one today it will show 1, and if it remaps another tomorrow it will increase to 2, and so on). A remapped sector is essentially a “new” sector.
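To check or change the error recovery timeouts mentioned in step 4, smartctl can talk to drives that support SCT ERC. A sketch (/dev/sdX is a placeholder, values are in tenths of a second, and on most drives the setting does not survive a power cycle):

    smartctl -l scterc /dev/sdX        # show current read/write recovery timeouts
    smartctl -l scterc,70,70 /dev/sdX  # set both to 7 seconds (NAS/RAID-style behaviour)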

As you can see, this is how the drive itself actually deals with bad sectors. To help further, I need to know what those bad sectors are identified as: are they currently pending or reallocated (remapped)? FWIW, I have drives with 53,000 power-on hours that have only just started showing pending sectors, and I also have brand-new drives that picked up 891 reallocated sectors during the first write pass. Both kinds are currently in operation.

If a drive starts showing pending sectors after a looong time, this is usually down to the “pressure equalization valve” (a piece of cotton behind one of those “do not cover this hole” openings). Those filters disintegrate, which lets dust get into the drive, and that results in the bad sectors. There is also the case of bad media of course, but that usually shows up as immediate bad sectors (i.e. the drive with 891 reallocations above).

Wait, what? Dust can get into the drive? There have been helium-sealed disks on the market for a decade now, without a hole. omg, the hole, I almost forgot it existed.
I have 2 such disks (4TB), and omg yes, they go bad faaast…

Not all drives have been converted to He. He is notoriously difficult to keep contained (the atom is so small it can leak through almost anything). The first thing they had to get right was laser-welding the disk enclosure airtight to actually keep the He inside. The He level is shown by SMART attribute 22 Helium_Level.
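On drives that expose it, you can read it directly, something like this (a sketch; attribute naming varies by vendor and /dev/sdX is a placeholder):

    smartctl -A /dev/sdX | grep -i helium   # attribute 22, Helium_Level, where supported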

I don’t think most SNOs are actually running He drives, which is why I gave the (main) reason of dust getting into the drive. It could also be bad media, a bad head servo (or servos, in the case of a triple-actuator drive, for example), or just plain firmware corruption (the drive thinks the sector is at x,y coordinates and lines up the head accordingly, but the sector is actually at a,y coordinates; there are disk shift tables in the drive to keep things running properly). Media shifting (i.e. worn-out bearings causing a wobble effect) does count when you are talking about the microscopic sectors on today’s drives.

We need the bad sector identification. Without knowing what the drive thinks about them, we can’t suggest if it is likely to blow up tomorrow or 5 years down the line.
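For the record, on Linux that identification is just the raw values of a few SMART attributes; a sketch (/dev/sdX is a placeholder):

    smartctl -A /dev/sdX | grep -E 'Reallocated_Sector_Ct|Reallocated_Event_Count|Current_Pending_Sector|Offline_Uncorrectable'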

Hello @Mitsos,

I am not able to answer your question, I am sorry. How can I tell what the drive thinks about the aforementioned bad sectors?

Thanks!

Maybe this helps :wink:

If I understand correctly, C5 Current Pending Sector Count = 7

Power on hours = 97’992

Still this is just by reading SMART figures. Do you need more detailed and trustworthy information?

That’s what I was looking for. Drive looks good, a write pass on it (if it doesn’t have any data you want to keep) should fix it.

If you want to be sure: 1 read pass (ie a full smart scan), 1 write pass, then a full scan again.
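On Linux the whole sequence would look roughly like this (a sketch; /dev/sdX is a placeholder and the middle step destroys all data on the disk):

    smartctl -t long /dev/sdX                                      # read pass: extended self-test
    smartctl -l selftest /dev/sdX                                  # check the result once it finishes
    dd if=/dev/zero of=/dev/sdX bs=1M status=progress conv=fsync   # write pass
    smartctl -t long /dev/sdX                                      # second read pass, then compare attributes 5/196/197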

If you only see a few sectors being reallocated (I’m guessing it will find a few more on the first pass, then reallocate them on the write pass), then the drive is good. NOTE TO READERS: THE FOLLOWING APPLIES ONLY TO THIS PARTICULAR DRIVE: Use a piece of kapton tape to tape over any visible drive holes on the drive top (the side with the sticker). You might be able to save it for a few more years.

If the pending sectors start rapidly increasing (rapidly increasing= one day they are 7, next they are 40, next they are 200), the drive is done for.

All in all I don’t see anything badly wrong with the drive (yes, the motor is getting a bit weak, but as long as the drive keeps spinning it shouldn’t matter). There is a slight read and write error rate (the value isn’t at the max; it dropped from 255), but it’s nowhere near the threshold (= the minimum allowed value before the last column shows FAILED), which may mean a weak head, but I wouldn’t pay any attention to it for now. Also, your cable looks good (no CRC errors).

I would give it another 5 years, assuming that it passes the aforementioned read/write/read tests with reasonable sectors.

Well, it has had a node on it since the beginning of V3, or almost. I don’t think I can delete anything.

Thanks for your help, greatly appreciated.

Warning: The following will cause file corruption!
In that case you could use a program that re-writes only the damaged sectors, which will re-allocate them if they are indeed damaged. I’m on Linux and I do it manually, so I can’t recommend any program for Windows.
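One manual way to do it on Linux (a sketch, not necessarily the exact steps I use; the LBA 123456789 is purely an example, take the real one from the SMART self-test log or dmesg, and overwriting it destroys whatever file sits there):

    smartctl -a /dev/sdX                                  # look for LBA_of_first_error in the self-test log
    hdparm --read-sector 123456789 /dev/sdX               # confirm the sector really is unreadable
    hdparm --yes-i-know-what-i-am-doing --write-sector 123456789 /dev/sdX   # zero-fill just that sector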

What I would personally do: let the drive sort itself out after running a couple of full SMART scans on it (it’s a small drive, and you could stop the node just to be completely safe). 7 sectors would at most mean 7 damaged segments. I doubt the node would be disqualified for that (IF you get audited for them at all before those pieces are trashed and deleted). Again, I’m assuming that the drive will not show 2000 pending sectors on the first SMART scan.