Hashstore error preventing node restart

Looks like the machine behind my nodes had a power failure recently. 4/5 nodes came back up fine. The last one is failing to start: it logs successfully starting two satellites for hashstore, then logs this and aborts:

failure during run      {"Process": "storagenode", "error": "Failed to create storage node peer: hashstore: logSlots calculation mismatch: size=34603008 logSlots=19\n\tstorj.io/storj/storagenode/hashstore.OpenHashtbl:116\n\tstorj.io/storj/storagenode/hashstore.OpenTable:121\n\tstorj.io/storj/storagenode/hashstore.NewStore:258\n\tstorj.io/storj/storagenode/hashstore.New:93\n\tstorj.io/storj/storagenode/piecestore.(*HashStoreBackend).getDB:248\n\tstorj.io/storj/storagenode/piecestore.NewHashStoreBackend:114\n\tstorj.io/storj/storagenode.New:598\n\tmain.cmdRun:84\n\tmain.newRunCmd.func1:33\n\tstorj.io/common/process.cleanup.func1.4:392\n\tstorj.io/common/process.cleanup.func1:410\n\tgithub.com/spf13/cobra.(*Command).execute:985\n\tgithub.com/spf13/cobra.(*Command).ExecuteC:1117\n\tgithub.com/spf13/cobra.(*Command).Execute:1041\n\tstorj.io/common/process.ExecWithCustomOptions:112\n\tmain.main:34\n\truntime.main:272", "errorVerbose": "Failed to create storage node peer: hashstore: logSlots calculation mismatch: size=34603008 
logSlots=19\n\tstorj.io/storj/storagenode/hashstore.OpenHashtbl:116\n\tstorj.io/storj/storagenode/hashstore.OpenTable:121\n\tstorj.io/storj/storagenode/hashstore.NewStore:258\n\tstorj.io/storj/storagenode/hashstore.New:93\n\tstorj.io/storj/storagenode/piecestore.(*HashStoreBackend).getDB:248\n\tstorj.io/storj/storagenode/piecestore.NewHashStoreBackend:114\n\tstorj.io/storj/storagenode.New:598\n\tmain.cmdRun:84\n\tmain.newRunCmd.func1:33\n\tstorj.io/common/process.cleanup.func1.4:392\n\tstorj.io/common/process.cleanup.func1:410\n\tgithub.com/spf13/cobra.(*Command).execute:985\n\tgithub.com/spf13/cobra.(*Command).ExecuteC:1117\n\tgithub.com/spf13/cobra.(*Command).Execute:1041\n\tstorj.io/common/process.ExecWithCustomOptions:112\n\tmain.main:34\n\truntime.main:272\n\tmain.cmdRun:86\n\tmain.newRunCmd.func1:33\n\tstorj.io/common/process.cleanup.func1.4:392\n\tstorj.io/common/process.cleanup.func1:410\n\tgithub.com/spf13/cobra.(*Command).execute:985\n\tgithub.com/spf13/cobra.(*Command).ExecuteC:1117\n\tgithub.com/spf13/cobra.(*Command).Execute:1041\n\tstorj.io/common/process.ExecWithCustomOptions:112\n\tmain.main:34\n\truntime.main:272"}

I use ZFS for the filesystem under all of my nodes, but ironically this is the only one that's redundant - on the others I just use it for consistency in system config, one drive at a time. From ZFS's perspective, there are no data errors.

This node appears to be running v1.126.2.

Hello @bryanpendleton,
Welcome to the forum!

It seems your hashstore has been corrupted.
I would share this with the team

Since this doesn’t seem to be a problem with all of the node’s data but just the one satellite, yet the error is causing the node to refuse to start up at all, is there any suggestion for how to bring the node up to serve the satellites it doesn’t have corrupted data for?

Put the one satellite on the exclusion list in the config and restart. Wait for an official response though; I don't know whether it will disable that satellite permanently or temporarily.

For example, to disable the Saltlake satellite, put this in the config, or edit the existing line:

# list of trust exclusions
storage2.trust.exclusions: "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE@saltlake.tardigrade.io:7777"

Any updates here? I’m not even sure which satellite is the one that’s failing.

Unfortunately not. I also do not know whether there is a workaround.

Does this mean that a node using hashstore can die because of an error in a single file???

That’s what tech preview means.


Unfortunately yes.

However, a ZFS system with CoW (copy-on-write) should lower the risk quite significantly - and perhaps that is why only one node with one satellite is experiencing this issue. Imagine what the situation for @bryanpendleton would be if this were EXT4 partitions with hashstore :scream:

It has been raised as a concern many times and discussed extensively in the hashstore main thread here: [Tech Preview] Hashstore backend for storage nodes

Most recommendations so far go in the direction that a UPS is a very smart addition to your setup if you opt in to hashstore.

It has also been discussed that a repair/rebuild tool might see the light of day in the future (this has not yet been announced by Storj).

It has been merged. You can build it with the latest Go:

git clone git@github.com:storj/storj.git && cd storj
go install ./cmd/write-hashtbl

Then there should be a write-hashtbl binary in the ~/bin subfolder.
You can view the help:

~/bin/write-hashtbl --help
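A side note on the path: `go install` does not always put the binary in ~/bin. It uses $GOBIN when that is set, otherwise $GOPATH/bin, and GOPATH itself defaults to $HOME/go. A small sketch of that lookup (the variable names mirror the Go toolchain's environment; nothing here is Storj-specific):

```shell
# Where `go install` puts binaries: $GOBIN if set, otherwise $GOPATH/bin,
# and GOPATH itself defaults to $HOME/go.
BIN_DIR=${GOBIN:-${GOPATH:-$HOME/go}/bin}
echo "$BIN_DIR/write-hashtbl"
```

If the binary is not in ~/bin, this prints where to look instead (typically ~/go/bin).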

The easiest way is to just use the default flags and pass it one of the store directories, like:

~/bin/write-hashtbl /mnt/storj/storagenode/storage/hashstore/12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs/s0

Then do the same with /s1.

You can also use Docker to build a local image with the binary, so you do not have to install all the developer tools, as described there (you need to replace the command, of course, to build this tool rather than the benchmarks):

Please note that the tool places the generated hashtables in the current directory, so you would need to move them to the proper folder afterwards - or you can start the command from the proper directory in the first place, i.e.:

cd /mnt/storj/storagenode/storage/hashstore/12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs/s0
mv meta meta.bak
mkdir meta
cd meta
~/bin/write-hashtbl /mnt/storj/storagenode/storage/hashstore/12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs/s0
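The backup-then-rebuild sequence above can be rehearsed on a throwaway directory first, so the moves are familiar before touching real node data. A sketch (the actual write-hashtbl call is left as a comment, since it needs real log files):

```shell
# Rehearse the meta backup/rebuild steps in a temp dir. Mirrors the
# sequence above for one store (s0), with a stand-in hashtable file.
WORK=$(mktemp -d)
mkdir -p "$WORK/s0/meta"
touch "$WORK/s0/meta/hashtbl-0000000000000001"   # stand-in for a real table

cd "$WORK/s0"
mv meta meta.bak   # keep the old (possibly corrupt) tables as a backup
mkdir meta
cd meta
# On a real node you would now run:
#   ~/bin/write-hashtbl /mnt/storj/storagenode/storage/hashstore/<satellite-id>/s0
ls ../meta.bak     # the originals are still there if you need to roll back
```

If the rebuild goes wrong, removing the new meta and renaming meta.bak back restores the original state.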

For anyone following along, I did the rebuild steps a month ago and my node was able to come back online and has been working since.


How much data was in it and how long did it take to build?

I would like to know more about the procedure. Are there instructions and a ready-made solution for Windows?

Nobody has submitted such a PR so far; I would like to invite you to be the first one :slight_smile:

Right now you can build this tool by installing Go for Windows, then use

then run it as described in the same post:

cd X:/storagenode/storage/hashstore/12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs/s0
mv meta meta.bak
mkdir meta
cd meta
~/bin/write-hashtbl X:/storagenode/storage/hashstore/12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs/s0

But it may be easier to use the Docker approach mentioned earlier. It doesn't require installing all the developer tools to build this utility:

1. Create a Dockerfile:

FROM golang as build
RUN git clone https://github.com/storj/storj.git && \
    cd storj && \
    go install ./cmd/write-hashtbl

FROM ubuntu
WORKDIR /meta
COPY --from=build go/bin/write-hashtbl /usr/bin/

2. Build the image:

docker build . -t storj-write-hashtbl

3. Now restore s0 (PowerShell):

cd X:/storagenode/storage/hashstore/12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs/s0
mv meta meta.bak
mkdir meta

docker run -it --rm -v ${PWD}:/hashstore -v ${PWD}/meta:/meta storj-write-hashtbl write-hashtbl /hashstore

4. Do the same for s1 (PowerShell):

cd X:/storagenode/storage/hashstore/12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs/s1
mv meta meta.bak
mkdir meta

docker run -it --rm -v ${PWD}:/hashstore -v ${PWD}/meta:/meta storj-write-hashtbl write-hashtbl /hashstore

I think the satellite that was broken had a couple of hundred GiB, but I’m not sure. The rebuild took quite a few hours, but less than a day IIRC.


Hello. I need help.
I had a ZFS crash and the hashstore file became corrupted on three nodes.
I followed your instructions and restored the hashstore on two nodes, and they’re working.
But there’s a problem on the third node.

PS /storj/spoolD/storj5/storage/hashstore/1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE/s0> docker run -it --rm -v ${PWD}:/hashstore -v ${PWD}/meta:/meta storj-write-hashtbl write-hashtbl /hashstore
Counting /hashstore/0a/log-000000000000000a-00004f9a...
Counting /hashstore/0b/log-000000000000000b-00000000...
Counting /hashstore/0c/log-000000000000000c-00000000...
Counting /hashstore/0d/log-000000000000000d-00000000...
Counting /hashstore/0e/log-000000000000000e-00000000...
Counting /hashstore/0f/log-000000000000000f-00000000...
Counting /hashstore/10/log-0000000000000010-00000000...
Counting /hashstore/11/log-0000000000000011-00000000...
Counting /hashstore/12/log-0000000000000012-00000000...
Counting /hashstore/13/log-0000000000000013-00000000...
Counting /hashstore/14/log-0000000000000014-00000000...
Counting /hashstore/15/log-0000000000000015-00004fa0...
Counting /hashstore/16/log-0000000000000016-00000000...
Counting /hashstore/17/log-0000000000000017-00000000...
Counting /hashstore/18/log-0000000000000018-00000000...
Counting /hashstore/19/log-0000000000000019-00000000...
Counting /hashstore/1a/log-000000000000001a-00000000...
Counting /hashstore/1b/log-000000000000001b-00000000...
Counting /hashstore/1c/log-000000000000001c-00000000...
Counting /hashstore/1d/log-000000000000001d-00000000...
Counting /hashstore/1e/log-000000000000001e-00004fa4...
Counting /hashstore/1f/log-000000000000001f-00000000...
Counting /hashstore/20/log-0000000000000020-00004fa5...
Counting /hashstore/21/log-0000000000000021-00004fa6...
Counting /hashstore/22/log-0000000000000022-00000000...
Counting /hashstore/23/log-0000000000000023-00000000...
Counting /hashstore/24/log-0000000000000024-00000000...
Counting /hashstore/25/log-0000000000000025-00004fa8...
Counting /hashstore/26/log-0000000000000026-00000000...
Counting /hashstore/27/log-0000000000000027-00000000...
Counting /hashstore/28/log-0000000000000028-00004faa...
Counting /hashstore/29/log-0000000000000029-00000000...
Counting /hashstore/2b/log-000000000000002b-00000000...
platform: invalid argument
        storj.io/storj/storagenode/hashstore/platform.mmap:30
        storj.io/storj/storagenode/hashstore/platform.Mmap:16
        main.openFile:37
        main.(*cmdRoot).iterateRecords:148
        main.(*cmdRoot).countRecords:212
        main.(*cmdRoot).Execute:79
        github.com/zeebo/clingy.(*Environment).dispatchDesc:129
        github.com/zeebo/clingy.Environment.Run:41
        main.main:29
        runtime.main:285

It stops and does not proceed to the processing phase.

This file turned out to be zero length. I deleted it.
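Zero-length log files like that can be found up front with `find -size 0`; a throwaway sketch (the directory and file names here are fabricated for illustration):

```shell
# Spot empty log files before running write-hashtbl. A temp dir with one
# empty and one non-empty fake log file stands in for a real hashstore.
WORK=$(mktemp -d)
mkdir -p "$WORK/2b"
touch "$WORK/2b/log-000000000000002b-00000000"            # zero-length
printf 'data' > "$WORK/2b/log-000000000000002c-00000000"  # has content
find "$WORK" -name 'log-*' -size 0                        # lists only the empty file
```

On a real node you would point find at the satellite's s0 or s1 directory instead of the temp dir.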

Did it help you get further?

I'm having an issue too. My SATA controller got fried, so I replaced it, but now, due to inconsistency in my filesystem, some nodes cannot start because of hashstore errors. ZFS is already resilvering.

-21T17:01:31Z	ERROR	failure during run	{"Process": "storagenode", "error": "Failed to create storage node peer: hashstore: read config/storage/hashstore/121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6/s1/meta/hashtbl-000000000000001c: input/output error\n\tstorj.io/storj/storagenode/hashstore.(*roPageCache).ReadRecord:660\n\tstorj.io/storj/storagenode/hashstore.(*HashTbl).ComputeEstimates:292\n\tstorj.io/storj/storagenode/hashstore.OpenHashTbl:169\n\tstorj.io/storj/storagenode/hashstore.OpenTable:117\n\tstorj.io/storj/storagenode/hashstore.NewStore:258\n\tstorj.io/storj/storagenode/hashstore.New:98\n\tstorj.io/storj/storagenode/piecestore.(*HashStoreBackend).getDB:251\n\tstorj.io/storj/storagenode/piecestore.NewHashStoreBackend:117\n\tstorj.io/storj/storagenode.New:604\n\tmain.cmdRun:84\n\tmain.newRunCmd.func1:33\n\tstorj.io/common/process.cleanup.func1.2:388\n\tstorj.io/common/process.cleanup.func1:406\n\tgithub.com/spf13/cobra.(*Command).execute:985\n\tgithub.com/spf13/cobra.(*Command).ExecuteC:1117\n\tgithub.com/spf13/cobra.(*Command).Execute:1041\n\tstorj.io/common/process.ExecWithCustomOptions:115\n\tmain.main:34\n\truntime.main:283", "errorVerbose": "Failed to create storage node peer: hashstore: read config/storage/hashstore/121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6/s1/meta/hashtbl-000000000000001c: input/output 
error\n\tstorj.io/storj/storagenode/hashstore.(*roPageCache).ReadRecord:660\n\tstorj.io/storj/storagenode/hashstore.(*HashTbl).ComputeEstimates:292\n\tstorj.io/storj/storagenode/hashstore.OpenHashTbl:169\n\tstorj.io/storj/storagenode/hashstore.OpenTable:117\n\tstorj.io/storj/storagenode/hashstore.NewStore:258\n\tstorj.io/storj/storagenode/hashstore.New:98\n\tstorj.io/storj/storagenode/piecestore.(*HashStoreBackend).getDB:251\n\tstorj.io/storj/storagenode/piecestore.NewHashStoreBackend:117\n\tstorj.io/storj/storagenode.New:604\n\tmain.cmdRun:84\n\tmain.newRunCmd.func1:33\n\tstorj.io/common/process.cleanup.func1.2:388\n\tstorj.io/common/process.cleanup.func1:406\n\tgithub.com/spf13/cobra.(*Command).execute:985\n\tgithub.com/spf13/cobra.(*Command).ExecuteC:1117\n\tgithub.com/spf13/cobra.(*Command).Execute:1041\n\tstorj.io/common/process.ExecWithCustomOptions:115\n\tmain.main:34\n\truntime.main:283\n\tmain.cmdRun:86\n\tmain.newRunCmd.func1:33\n\tstorj.io/common/process.cleanup.func1.2:388\n\tstorj.io/common/process.cleanup.func1:406\n\tgithub.com/spf13/cobra.(*Command).execute:985\n\tgithub.com/spf13/cobra.(*Command).ExecuteC:1117\n\tgithub.com/spf13/cobra.(*Command).Execute:1041\n\tstorj.io/common/process.ExecWithCustomOptions:115\n\tmain.main:34\n\truntime.main:283"}

Error: Failed to create storage node peer: hashstore: read config/storage/hashstore/121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6/s1/meta/hashtbl-000000000000001c: input/output error

        storj.io/storj/storagenode/hashstore.(*roPageCache).ReadRecord:660
        storj.io/storj/storagenode/hashstore.(*HashTbl).ComputeEstimates:292
        storj.io/storj/storagenode/hashstore.OpenHashTbl:169
        storj.io/storj/storagenode/hashstore.OpenTable:117
        storj.io/storj/storagenode/hashstore.NewStore:258
        storj.io/storj/storagenode/hashstore.New:98
        storj.io/storj/storagenode/piecestore.(*HashStoreBackend).getDB:251
        storj.io/storj/storagenode/piecestore.NewHashStoreBackend:117
        storj.io/storj/storagenode.New:604
        main.cmdRun:84
        main.newRunCmd.func1:33
        storj.io/common/process.cleanup.func1.2:388
        storj.io/common/process.cleanup.func1:406
        github.com/spf13/cobra.(*Command).execute:985
        github.com/spf13/cobra.(*Command).ExecuteC:1117
        github.com/spf13/cobra.(*Command).Execute:1041
        storj.io/common/process.ExecWithCustomOptions:115
        main.main:34
        runtime.main:283

2025-10-21 17:01:31,839 WARN exited: storagenode (exit status 1; not expected)


Some nodes still work, but the others won't start. What can I do?
All the files are there; I checked with FileZilla.

I hope I won't lose my nodes because of this :frowning:

I guess you need to rebuild that corrupted hashtable.
