Node fails to start. Unrecoverable error

gronis93 · April 16, 2026, 10:27am

My node recently had some I/O trouble with a disk. I resolved it, but now storj won’t start. I ran fsck on the file system and it seems fine. However, storj runs into an unrecoverable error on startup and just attempts to restart over and over again:

2026-04-16T10:15:08Z    ERROR   failure during run      {"process": "storagenode", "error": "Failed to create storage node peer: hashstore: put:{key:f862ffd61708ee74f142842853c750daab5d03bcaa6b88478df3c787304bc8d6 offset:306557056 log:4322 length:2174976 created:20485 (2026-02-01) expires:0 (1970-01-01) trash:false} != exist:{key:f862ffd61708ee74f142842853c750daab5d03bcaa6b88478df3c787304bc8d6 offset:349734272 log:2371 length:2174976 created:20485 (2026-02-01) expires:0 (1970-01-01) trash:false}: hashstore: collision detected\n\tstorj.io/storj/storagenode/hashstore.(*HashTbl).insertLocked:411\n\tstorj.io/storj/storagenode/hashstore.(*HashTblConstructor).Append:552\n\tstorj.io/storj/storagenode/hashstore.(*Store).reconcileLog.func3:1603\n\tstorj.io/storj/storagenode/hashstore.(*HashTbl).Range:327\n\tstorj.io/storj/storagenode/hashstore.(*Store).reconcileLog:1599\n\tstorj.io/storj/storagenode/hashstore.NewStore:374\n\tstorj.io/storj/storagenode/hashstore.New:110\n\tstorj.io/storj/storagenode/piecestore.(*HashStoreBackend).getDB:272\n\tstorj.io/storj/storagenode/piecestore.NewHashStoreBackend:122\n\tstorj.io/storj/storagenode.New:607\n\tmain.cmdRun:84\n\tmain.newRunCmd.func1:33\n\tstorj.io/common/process.InitBeforeExecute.func1.2:389\n\tstorj.io/common/process.InitBeforeExecute.func1:407\n\tgithub.com/spf13/cobra.(*Command).execute:985\n\tgithub.com/spf13/cobra.(*Command).ExecuteC:1117\n\tgithub.com/spf13/cobra.(*Command).Execute:1041\n\tstorj.io/common/process.ExecWithCustomOptions:115\n\tmain.main:34\n\truntime.main:285", "errorVerbose": "Failed to create storage node peer: hashstore: put:{key:f862ffd61708ee74f142842853c750daab5d03bcaa6b88478df3c787304bc8d6 offset:306557056 log:4322 length:2174976 created:20485 (2026-02-01) expires:0 (1970-01-01) trash:false} != exist:{key:f862ffd61708ee74f142842853c750daab5d03bcaa6b88478df3c787304bc8d6 offset:349734272 log:2371 length:2174976 created:20485 (2026-02-01) expires:0 (1970-01-01) trash:false}: hashstore: collision 
detected\n\tstorj.io/storj/storagenode/hashstore.(*HashTbl).insertLocked:411\n\tstorj.io/storj/storagenode/hashstore.(*HashTblConstructor).Append:552\n\tstorj.io/storj/storagenode/hashstore.(*Store).reconcileLog.func3:1603\n\tstorj.io/storj/storagenode/hashstore.(*HashTbl).Range:327\n\tstorj.io/storj/storagenode/hashstore.(*Store).reconcileLog:1599\n\tstorj.io/storj/storagenode/hashstore.NewStore:374\n\tstorj.io/storj/storagenode/hashstore.New:110\n\tstorj.io/storj/storagenode/piecestore.(*HashStoreBackend).getDB:272\n\tstorj.io/storj/storagenode/piecestore.NewHashStoreBackend:122\n\tstorj.io/storj/storagenode.New:607\n\tmain.cmdRun:84\n\tmain.newRunCmd.func1:33\n\tstorj.io/common/process.InitBeforeExecute.func1.2:389\n\tstorj.io/common/process.InitBeforeExecute.func1:407\n\tgithub.com/spf13/cobra.(*Command).execute:985\n\tgithub.com/spf13/cobra.(*Command).ExecuteC:1117\n\tgithub.com/spf13/cobra.(*Command).Execute:1041\n\tstorj.io/common/process.ExecWithCustomOptions:115\n\tmain.main:34\n\truntime.main:285\n\tmain.cmdRun:86\n\tmain.newRunCmd.func1:33\n\tstorj.io/common/process.InitBeforeExecute.func1.2:389\n\tstorj.io/common/process.InitBeforeExecute.func1:407\n\tgithub.com/spf13/cobra.(*Command).execute:985\n\tgithub.com/spf13/cobra.(*Command).ExecuteC:1117\n\tgithub.com/spf13/cobra.(*Command).Execute:1041\n\tstorj.io/common/process.ExecWithCustomOptions:115\n\tmain.main:34\n\truntime.main:285"}
2026-04-16T10:15:08Z    FATAL   Unrecoverable error     {"process": "storagenode", "error": "Failed to create storage node peer: hashstore: put:{key:f862ffd61708ee74f142842853c750daab5d03bcaa6b88478df3c787304bc8d6 offset:306557056 log:4322 length:2174976 created:20485 (2026-02-01) expires:0 (1970-01-01) trash:false} != exist:{key:f862ffd61708ee74f142842853c750daab5d03bcaa6b88478df3c787304bc8d6 offset:349734272 log:2371 length:2174976 created:20485 (2026-02-01) expires:0 (1970-01-01) trash:false}: hashstore: collision detected\n\tstorj.io/storj/storagenode/hashstore.(*HashTbl).insertLocked:411\n\tstorj.io/storj/storagenode/hashstore.(*HashTblConstructor).Append:552\n\tstorj.io/storj/storagenode/hashstore.(*Store).reconcileLog.func3:1603\n\tstorj.io/storj/storagenode/hashstore.(*HashTbl).Range:327\n\tstorj.io/storj/storagenode/hashstore.(*Store).reconcileLog:1599\n\tstorj.io/storj/storagenode/hashstore.NewStore:374\n\tstorj.io/storj/storagenode/hashstore.New:110\n\tstorj.io/storj/storagenode/piecestore.(*HashStoreBackend).getDB:272\n\tstorj.io/storj/storagenode/piecestore.NewHashStoreBackend:122\n\tstorj.io/storj/storagenode.New:607\n\tmain.cmdRun:84\n\tmain.newRunCmd.func1:33\n\tstorj.io/common/process.InitBeforeExecute.func1.2:389\n\tstorj.io/common/process.InitBeforeExecute.func1:407\n\tgithub.com/spf13/cobra.(*Command).execute:985\n\tgithub.com/spf13/cobra.(*Command).ExecuteC:1117\n\tgithub.com/spf13/cobra.(*Command).Execute:1041\n\tstorj.io/common/process.ExecWithCustomOptions:115\n\tmain.main:34\n\truntime.main:285", "errorVerbose": "Failed to create storage node peer: hashstore: put:{key:f862ffd61708ee74f142842853c750daab5d03bcaa6b88478df3c787304bc8d6 offset:306557056 log:4322 length:2174976 created:20485 (2026-02-01) expires:0 (1970-01-01) trash:false} != exist:{key:f862ffd61708ee74f142842853c750daab5d03bcaa6b88478df3c787304bc8d6 offset:349734272 log:2371 length:2174976 created:20485 (2026-02-01) expires:0 (1970-01-01) trash:false}: hashstore: collision 
detected\n\tstorj.io/storj/storagenode/hashstore.(*HashTbl).insertLocked:411\n\tstorj.io/storj/storagenode/hashstore.(*HashTblConstructor).Append:552\n\tstorj.io/storj/storagenode/hashstore.(*Store).reconcileLog.func3:1603\n\tstorj.io/storj/storagenode/hashstore.(*HashTbl).Range:327\n\tstorj.io/storj/storagenode/hashstore.(*Store).reconcileLog:1599\n\tstorj.io/storj/storagenode/hashstore.NewStore:374\n\tstorj.io/storj/storagenode/hashstore.New:110\n\tstorj.io/storj/storagenode/piecestore.(*HashStoreBackend).getDB:272\n\tstorj.io/storj/storagenode/piecestore.NewHashStoreBackend:122\n\tstorj.io/storj/storagenode.New:607\n\tmain.cmdRun:84\n\tmain.newRunCmd.func1:33\n\tstorj.io/common/process.InitBeforeExecute.func1.2:389\n\tstorj.io/common/process.InitBeforeExecute.func1:407\n\tgithub.com/spf13/cobra.(*Command).execute:985\n\tgithub.com/spf13/cobra.(*Command).ExecuteC:1117\n\tgithub.com/spf13/cobra.(*Command).Execute:1041\n\tstorj.io/common/process.ExecWithCustomOptions:115\n\tmain.main:34\n\truntime.main:285\n\tmain.cmdRun:86\n\tmain.newRunCmd.func1:33\n\tstorj.io/common/process.InitBeforeExecute.func1.2:389\n\tstorj.io/common/process.InitBeforeExecute.func1:407\n\tgithub.com/spf13/cobra.(*Command).execute:985\n\tgithub.com/spf13/cobra.(*Command).ExecuteC:1117\n\tgithub.com/spf13/cobra.(*Command).Execute:1041\n\tstorj.io/common/process.ExecWithCustomOptions:115\n\tmain.main:34\n\truntime.main:285"}

How can I resolve this? I’m trying to find out how to troubleshoot this but I seem to have trouble finding a good resource.

alpharabbit · April 16, 2026, 2:20pm

Maybe a damaged hashtable. I would try renewing it using the write-hashtbl tool.

gronis93 · April 16, 2026, 2:34pm

How do I do that? I’m running storj inside the official Docker container. I’m not sure how to execute tools from inside it.

gronis93 · April 16, 2026, 2:42pm

Do I have to build it myself? Does storj seriously not ship with tooling for repairing data? :s

alpharabbit · April 16, 2026, 4:35pm

AFAIK yes. Install golang and run:

go install storj.io/storj/cmd/write-hashtbl@latest

gronis93 · April 16, 2026, 7:40pm

Thanks! I managed to get the tool running. I’ll see if it fixes the problems.

However, I find the lack of documentation and support for stuff like this really terrible. It should be much easier for a project like storj to provide troubleshooting steps and potential fixes. Instead you have to jump between different forum threads, build the software yourself, and figure out which folder you need to point the tool to. It’s not rocket science to provide some basic documentation for this. SNOs will leave the project with this much friction.

At a minimum, they should have a document with things to try, a prebuilt docker image with the different tools, and instructions on how to use them.

It actually made me rethink storj as a project altogether. With so little attention paid to basic documentation to help out SNOs, are they really trying anymore? Is the distributed storage dream dead?

arrogantrabbit · April 16, 2026, 9:10pm

This is what agentic ai tools are for.

Lol. SNOs will leave :). Have you noticed an obnoxious surplus of node operators? A little friction is a good thing. 3-5% annual payout cuts (inflation) also seem to have no effect. I’m sure SNOs will stay even if they have to pay to participate. There are a few justifiable reasons to do so.

Also, again, see above. Why are you wasting time DIYing your way around not-your-bugs when it’s 2026 outside?

And the real issue here is not this. It’s the fact that you allowed data to get corrupted in the first place. Focus on fixing that. You are effectively complaining about a bumpy ambulance ride after driving into a wall.

Alexey · April 17, 2026, 3:11am

Hello @gronis93,
Welcome back!

Did you figure out the reason for the data corruption? The node itself can check and fix some errors on startup, but not data corruption. For that there is the write-hashtbl tool.

gronis93 · April 17, 2026, 9:08pm

Sure, I could use AI tools, but I don’t trust them enough to let them do stuff on my server.

I still think the documentation should be improved. It’s not that I had trouble getting it working, it’s more about what it signals to the SNOs: how much Storj cares about the project and its SNOs.

I’m NOT complaining about my own experience. I knew the way I have things set up can result in data corruption if there are hardware problems. I might be able to save it, or I might have to start this node over. I’m trying to be helpful to the community and address something I find lacking in the project, to improve the SNO experience. That is all.

It’s a good thing, though, that we have this forum. People provide good help here, like Alexey.

The disk did face issues again after I attempted to run write-hashtbl, so I probably have a dying disk.

Alexey · April 18, 2026, 3:38am

Please suggest your changes here: Pull requests · storj/docs · GitHub, or here: Node Operators > FAQ with the tags hashstore and write-hashtbl, if the search results for 'hashstore: collision detected tags:hashstore,write-hashtbl' on the Storj Community Forum (official) didn’t help, or didn’t help enough.

We add some FAQ articles to the documentation after they have been proven on the forum: Frequently Asked Questions - Storj Docs, but I believe that the forum could be a better source.

gronis93 · April 20, 2026, 4:42pm

Ok. Update.

I finally found the true reason for my disk drive errors: a bad power cable connection. I noticed this because more disks suddenly started to drop out after I started investigating. The main 24-pin power cable from my PSU had come slightly loose, causing under-voltage from insufficient power transmission. That has now been resolved. However, there are probably some unrecoverable data errors on this storj node.

write-hashtbl exits with a hash collision. This is the stack-trace:

hashstore: put:{key:f93ce19995b506a0dedb1b032ced9d35efd49a10c96cccbd1329a9b97067e19b offset:1031037504 log:4615 length:8192 created:20475 (2026-01-22) expires:0 (1970-01-01) trash:false} != exist:{key:f93ce19995b506a0dedb1b032ced9d35efd49a10c96cccbd1329a9b97067e19b offset:1031045760 log:4615 length:8192 created:20475 (2026-01-22) expires:0 (1970-01-01) trash:false}: hashstore: collision detected
        storj.io/storj/storagenode/hashstore.(*HashTbl).insertLocked:411
        storj.io/storj/storagenode/hashstore.(*HashTblConstructor).Append:552
        main.(*cmdRoot).Execute.func2:111
        main.(*cmdRoot).iterateRecords:211
        main.(*cmdRoot).Execute:110
        github.com/zeebo/clingy.(*Environment).dispatchDesc:129
        github.com/zeebo/clingy.Environment.Run:41
        main.main:30
        runtime.main:285

Is there any way to recover from this, or is the data corruption too severe, and do I have to start this node over?
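To see at a glance what differs between the two records in a collision message like the one above, the text can be parsed and compared field by field. This is only a minimal sketch: it works on the printed message, not on the hashstore files, and the helper names (`parse_records`, `differing_fields`) are made up for illustration and are not part of any Storj tool.

```python
import re

# Matches "put:{...}" and "exist:{...}" as they appear in the error text.
RECORD_RE = re.compile(r"\b(put|exist):\{([^}]*)\}")

# The write-hashtbl collision message quoted in this thread.
SAMPLE = (
    "hashstore: put:{key:f93ce19995b506a0dedb1b032ced9d35efd49a10c96cccbd1329a9b97067e19b "
    "offset:1031037504 log:4615 length:8192 created:20475 (2026-01-22) "
    "expires:0 (1970-01-01) trash:false} != "
    "exist:{key:f93ce19995b506a0dedb1b032ced9d35efd49a10c96cccbd1329a9b97067e19b "
    "offset:1031045760 log:4615 length:8192 created:20475 (2026-01-22) "
    "expires:0 (1970-01-01) trash:false}: hashstore: collision detected"
)


def parse_records(message):
    """Return {"put": {...}, "exist": {...}} parsed from a collision message."""
    records = {}
    for name, body in RECORD_RE.findall(message):
        fields = {}
        for token in body.split():
            # Tokens such as "(2026-01-22)" are human-readable dates, not fields.
            if ":" in token and not token.startswith("("):
                key, value = token.split(":", 1)
                fields[key] = value
        records[name] = fields
    return records


def differing_fields(message):
    """Map each field that differs to its (put value, exist value) pair."""
    records = parse_records(message)
    put, exist = records["put"], records["exist"]
    return {k: (put[k], exist.get(k)) for k in put if put[k] != exist.get(k)}


if __name__ == "__main__":
    for field, (new, old) in differing_fields(SAMPLE).items():
        print(f"{field}: put={new} exist={old}")
```

For the message above, only `offset` differs, so the same key is recorded twice in log 4615. In the startup error from the first post, both `offset` and `log` differ.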

arrogantrabbit · April 20, 2026, 6:43pm

The 24-pin cable goes to the MLB, not the drives. How did you confirm that brownouts were the cause of your issue, and not controller resets or whatnot? If there were brownouts on the ATX supply you would get system resets, not file corruption or drive disconnects. Or are you referring to those garbage “modular” power supplies with another set of gimmicky connectors on themselves instead of soldered cables coming out from a large opening or a PCB tooth?

You are making a lot of claims without evidence. For all you know, the issue may still be there. I would replace the power supply at this point, because you don’t actually know what the issue is, only that it may be related to power delivery. Especially since your supply allows the cables to “come loose”. This is not normal and should not be ignored.

On your questions: leave it be. What’s corrupted — corrupted. The rest works.

If you suspect the hashtable itself is corrupted as opposed to the logs, then yes, you can rebuild it. See: Script to make write-hashtbl easier to use

gronis93 · April 20, 2026, 7:46pm

Why are you so angry?

I thought this information was unnecessary, and the hardware is very non-standard, but I guess here we go:

The power supply is a small modular 250W power supply with DIY power cables. The 12V and 5V lines of the 24-pin connector go to the drives. The whole system is very small and custom, since I didn’t have the space for a full-size ATX power supply, motherboard, etc. when it was built. It’s a small AIO PCB computer. The computer uses a single 12V line from one of the 12V modular connectors for power, not the 24-pin connector.

The cables were a tight fit, and after I moved, I guess one of the 12V modular connectors pushed the main cable loose, because the lock clip was pushed open and the cable was pushed out 1mm on one side. I trimmed 1mm of plastic from the 12V connector that was interfering with the main ATX 24-pin cable, and now they fit next to each other much better than before without pressing against each other, and the clip of the 24-pin connector latches properly. So I guess it was an oversight when I built the system about 4 years ago.

Regarding the evidence, I started to notice the problems after hearing the disk click from time to time (a hard reset of the read/write arm), which happens with power struggles (amongst other things). Things worked better at first after I moved the disk to a different slot, but only for a day or so, then the disk started clicking again. I did a few tests where I removed the disk completely, but then my other disks wouldn’t connect to the HBA. The more I fiddled, the more things started to break. So I suspected there might be a poor cable that became more and more problematic as I nudged and fiddled with the hardware.

When I ejected all disks to take a proper look at the system, I saw the 24-pin cable and suspected it immediately. On the workbench, it fell out of place by itself when I turned the system over a few times while working on it.

After connecting it properly I haven’t seen any errors in dmesg for 2 days, and the other disks that started having trouble after I started troubleshooting work now as well. My zfs pool (not part of storj), which had a disk that failed to start during the last test before the complete disassembly, resilvered automatically after the power cable was fixed and hasn’t had any trouble since. I’m continuing to monitor the system log for link and ATA related errors.

So, that is the evidence I guess.

I’ll take a look at the script. Thanks
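Watching the kernel log for link and ATA errors, as described above, can be scripted. The sketch below filters a saved copy of the `dmesg` output for a handful of message fragments. The patterns are assumptions based on common Linux SATA and block-layer messages, not a complete list, and the function and file names are hypothetical.

```python
import re
import sys

# Assumed patterns: fragments of common Linux kernel messages for SATA link
# and I/O trouble. Adjust them to what your own kernel log actually prints.
ERROR_RE = re.compile(
    r"hard resetting link|SATA link down|link is slow|failed command|I/O error",
    re.IGNORECASE,
)


def find_disk_errors(lines):
    """Return the log lines that match one of the assumed error patterns."""
    return [line.rstrip("\n") for line in lines if ERROR_RE.search(line)]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Expects the path of a file holding saved dmesg output as first argument.
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log_file:
        for hit in find_disk_errors(log_file):
            print(hit)
```

One way to use it is to save the log first (`dmesg > dmesg.txt`) and then run the script with that file as its only argument. An empty result only means none of the assumed patterns matched, not that the hardware is healthy.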

arrogantrabbit · April 20, 2026, 7:57pm

I’m not angry, I’m asking for details because I don’t like lingering mysteries. Call it professional deformation if you will: I can’t leave any stone unturned. That’s literally my day job, I’m being paid to be that way at work, so there is definitely no anger or frustration.

This is a very satisfying explanation, that not only explains the failure, but also provides some faith in the fix: the detent is present and will prevent this from happening again.

Thank you for providing details. Those are not just helpful for me, because I’m an annoying asshole, but for anyone else who might google “disk clicking lost data corrupted files nothing in system logs” and find “unbuckled power cable” and have a “doh!” moment and go recheck the connections.

gronis93 · April 20, 2026, 8:15pm

Ok, I see. I guess I misread the emotion in your writing. Sorry for that!

I guess your response caused me to write out the whole story so it was useful in the end. I didn’t consider how useful the context would be for others. Now let’s see if I can figure out write-hashtbl.

snorkel · April 20, 2026, 9:05pm

He is always angry because he is a cat who believes himself/herself to be a rabbit, so the confusion drives him/her to the edge.

Alexey · April 21, 2026, 3:56am

You can use the -f (--fast) option, which will skip these fragments and leave them as is without registering them in the hashtbl. They should ultimately be removed during compaction. Of course, these fragments will be considered lost if they are requested for an audit, which will directly affect the audit score. If you lose more than 4%, this satellite will disqualify your node.
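The 4% figure above can be turned into a quick back-of-the-envelope check. This is a rough sketch only: the byte counts are hypothetical, and the satellite derives the audit score from the audits it actually performs, not from this ratio, so treat the result as an indicator at best.

```python
# The 4% loss figure quoted above, as a fraction.
DQ_LOSS_THRESHOLD = 0.04


def lost_fraction(lost_bytes, total_bytes):
    """Fraction of stored data that is lost (0.0 to 1.0)."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return lost_bytes / total_bytes


def exceeds_threshold(lost_bytes, total_bytes, threshold=DQ_LOSS_THRESHOLD):
    """True if the lost share of the data is above the given threshold."""
    return lost_fraction(lost_bytes, total_bytes) > threshold


if __name__ == "__main__":
    # Hypothetical numbers: 50 GB of skipped fragments on a node storing 2 TB.
    lost, total = 50 * 10**9, 2 * 10**12
    print(f"lost share: {lost_fraction(lost, total):.2%}")
    print("above 4% threshold:", exceeds_threshold(lost, total))
```

The lost byte count has to come from your own estimate, for example the total size of the fragments that the rebuild skipped.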

gronis93 · April 22, 2026, 11:25am

Thanks for the help everyone. I finally managed to get the node to start again after rebuilding the hash table. Hopefully I haven’t been offline for long enough or lost too much data to get disqualified. Time will tell.

Awful_Charlie · April 23, 2026, 6:33am

<deleted, and padded to more than 20 characters>

Alexey · April 23, 2026, 7:02am

Please stop the off-topic posts, thanks. According to our rules, I should remove these unrelated comments.