Hello, I have a node with a disk that is clearly dying. It has about 2100 reallocated sectors, and I think that count is still going up.
I recently restarted it after making some changes to the config file, and now it won’t start up. I get this error:
I don’t know how to “fix” this and get it back up and running.
I am considering just starting over completely; it’s a relatively new node. But if any of you know whether this error is fixable, saving the node would be nice.
Search the forum for “write-hashtbl” threads. It’s a Storj repair utility that can work backwards from all the hashstore data files and recreate the contents of that s0\meta directory. It will probably take hours to run, as nodes often have a lot of hashstore data now.
(Edit: I just saw your latest 10x18TB build video: are you expanding for Storj?)
Note that if the disk is dying, trying to repair its content in place may make things even worse. The usual procedure is to migrate first (using a tool specialized for data recovery, like ddrescue), and only then repair.
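A minimal ddrescue sketch of that migrate-first procedure. The device names and mapfile path are placeholders, not anything from this thread; adapt them to your setup:

```shell
# Placeholders: replace with your actual failing and replacement disks.
SRC=/dev/sdX           # dying disk
DST=/dev/sdY           # healthy replacement, equal size or larger
MAP=/root/rescue.map   # mapfile lets you stop and resume safely

# Pass 1: copy everything that reads easily, skipping slow/bad areas (-n).
ddrescue -f -n "$SRC" "$DST" "$MAP"

# Pass 2: go back for the bad areas, retrying each up to 3 times (-r3),
# with direct I/O on the input (-d) to bypass the kernel cache.
ddrescue -f -d -r3 "$SRC" "$DST" "$MAP"
```

Then run fsck/chkdsk on the clone only, never on the dying disk; the mapfile records which regions could not be recovered.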
And if the node is young, it’s a perfect opportunity to treat it as a lab case for learning, so that if it happens to more important data, you’ll be better prepared.
I don’t have logs/details to back this up: so treat this as a story…
Piecestore: when I had disk-health issues, most of the time you can image/ddrescue the disk to recover all that you can, then fsck/scrub the fresh filesystem. You lose a few .sj1 files, but your audit score barely changes and you go on with your life. Easy.
Hashstore: twice now, disk-health issues came with meta/hashtbl read errors, so on the recovered filesystem I still had to use write-hashtbl. And although that utility ran successfully, in both cases the node was disqualified within 24h of restarting (even though both started with 100% audit scores and over 90% online scores). I don’t know what the satellite is asking for, so often and so quickly, that it can kill a node in hours.
I just had a hashstore-recovered node get DQ’d ten minutes ago, and I only started it up this morning.
What does this mean? I don’t know. Maybe in the hashstore cases the HDD damage really was so severe that large swaths of data were lost, but recovery only complained about perhaps a dozen files. It feels like small disk errors can blow larger holes in hashstore data, or the audit process is somehow more sensitive in detecting them.
Healthy hashstore packs metadata closer together, so a single faulty sector impacts more pieces. Unhealthy hashstore needs to recover from log files, which are inherently sequential: it’s difficult to recover everything that lies past a single error. And write-hashtbl is not magic: it cannot report on pieces it is unable to detect past that kind of problem. These two design decisions make recovery harder in the presence of bad hardware, so yes, your intuition is correct: a small disk error can impact many more pieces.
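To make that blast radius concrete, here is a toy back-of-the-envelope calculation. All numbers are assumptions for illustration, not Storj’s actual on-disk format:

```go
package main

import "fmt"

// Toy blast-radius estimate. Assumed sizes, not the real hashstore layout:
// a 4 KiB bad sector and 64-byte packed hashtbl records, one per piece.
func piecesLostPerBadSector(sectorBytes, recordBytes int) int {
	return sectorBytes / recordBytes
}

func main() {
	// Piecestore: one .sj1 file per piece, so a bad sector in file data
	// typically costs at most one piece.
	fmt.Println("piecestore pieces at risk per bad sector:", 1)
	// Hashstore: metadata records are packed together, so a single bad
	// sector in the hash table can make dozens of pieces unreachable.
	fmt.Println("hashstore pieces at risk per bad sector:",
		piecesLostPerBadSector(4096, 64))
}
```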
That makes me fear the day Storj eventually forces hashstore migration of old data. Today I can recover from 95% of errors… hashstore may reduce that to 50%.
I’d rather lose some capacity to parity than risk losing 1-3 years of used-space.
As far as I understand the code, it scans logfiles from the end and checks every byte boundary for a valid header, so it can handle logfiles with holes. I have used it successfully several times with such files.
Again, this assumes the log file itself is intact, which requires the file metadata to be correct. A single bad write to file metadata means losing a single piece in piecestore, but potentially many of them in hashstore.
I am using ZFS with no parity, so any bad sector results in the loss of one ZFS block. All pieces covered by that block are lost, no matter whether you use piecestore or hashstore. Everything else can be restored in both cases; no difference, in my opinion.
I can only guess, but from the Windows paths in the original post I assume @HGPlays uses NTFS. I found this cute case of NTFS not recovering correctly after a crash, overwriting some random files.
Yeah, but those are easier to recover in NTFS from the MFT.
It all boils down to which characteristics you trust your file system to maintain reliably in the presence of failures. Obviously this differs from implementation to implementation, and by design hashstore has different reliability needs than piecestore. Maybe, just maybe, ZFS matches the expected reliability profile better than NTFS…
ZFS keeps redundant copies of metadata, so a single lost sector may lose nothing, or just one piece. If you lose a whole file, that may be hundreds of pieces with hashstore but only one piece with piecestore. Losing a directory makes my head hurt to think about, so let’s just say it’s bad with either.
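For ZFS specifically, you can trade a little capacity for extra protection even on a single-disk pool. These are standard ZFS properties; the pool/dataset names are placeholders:

```shell
# Dataset name is a placeholder. Store two copies of every data block,
# so one bad sector no longer guarantees data loss. Applies only to data
# written after the change, and roughly doubles space used by new writes:
zfs set copies=2 tank/storj

# Metadata redundancy is controlled separately; the default "all"
# duplicates all metadata, which is what helps avoid losing whole
# directories to a single bad sector:
zfs get redundant_metadata tank/storj

# Scrub periodically so bad sectors are found while they are still few;
# "status -v" lists any files with permanent (unrecoverable) errors:
zpool scrub tank
zpool status -v tank
```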