Hashstore error preventing node restart

bryanpendleton · May 14, 2025, 6:19pm

Looks like the machine behind my nodes had a power failure recently. 4/5 nodes came up fine. The last one is failing to start: it logs that hashstore started successfully for two satellites, then logs this and aborts:

failure during run      {"Process": "storagenode", "error": "Failed to create storage node peer: hashstore: logSlots calculation mismatch: size=34603008 logSlots=19\n\tstorj.io/storj/storagenode/hashstore.OpenHashtbl:116\n\tstorj.io/storj/storagenode/hashstore.OpenTable:121\n\tstorj.io/storj/storagenode/hashstore.NewStore:258\n\tstorj.io/storj/storagenode/hashstore.New:93\n\tstorj.io/storj/storagenode/piecestore.(*HashStoreBackend).getDB:248\n\tstorj.io/storj/storagenode/piecestore.NewHashStoreBackend:114\n\tstorj.io/storj/storagenode.New:598\n\tmain.cmdRun:84\n\tmain.newRunCmd.func1:33\n\tstorj.io/common/process.cleanup.func1.4:392\n\tstorj.io/common/process.cleanup.func1:410\n\tgithub.com/spf13/cobra.(*Command).execute:985\n\tgithub.com/spf13/cobra.(*Command).ExecuteC:1117\n\tgithub.com/spf13/cobra.(*Command).Execute:1041\n\tstorj.io/common/process.ExecWithCustomOptions:112\n\tmain.main:34\n\truntime.main:272", "errorVerbose": "Failed to create storage node peer: hashstore: logSlots calculation mismatch: size=34603008 
logSlots=19\n\tstorj.io/storj/storagenode/hashstore.OpenHashtbl:116\n\tstorj.io/storj/storagenode/hashstore.OpenTable:121\n\tstorj.io/storj/storagenode/hashstore.NewStore:258\n\tstorj.io/storj/storagenode/hashstore.New:93\n\tstorj.io/storj/storagenode/piecestore.(*HashStoreBackend).getDB:248\n\tstorj.io/storj/storagenode/piecestore.NewHashStoreBackend:114\n\tstorj.io/storj/storagenode.New:598\n\tmain.cmdRun:84\n\tmain.newRunCmd.func1:33\n\tstorj.io/common/process.cleanup.func1.4:392\n\tstorj.io/common/process.cleanup.func1:410\n\tgithub.com/spf13/cobra.(*Command).execute:985\n\tgithub.com/spf13/cobra.(*Command).ExecuteC:1117\n\tgithub.com/spf13/cobra.(*Command).Execute:1041\n\tstorj.io/common/process.ExecWithCustomOptions:112\n\tmain.main:34\n\truntime.main:272\n\tmain.cmdRun:86\n\tmain.newRunCmd.func1:33\n\tstorj.io/common/process.cleanup.func1.4:392\n\tstorj.io/common/process.cleanup.func1:410\n\tgithub.com/spf13/cobra.(*Command).execute:985\n\tgithub.com/spf13/cobra.(*Command).ExecuteC:1117\n\tgithub.com/spf13/cobra.(*Command).Execute:1041\n\tstorj.io/common/process.ExecWithCustomOptions:112\n\tmain.main:34\n\truntime.main:272"}

I use ZFS for the filesystem under all of my nodes, but ironically this is the only one that’s redundant. On the others I use ZFS just for consistency in system config, with one drive per node. From ZFS’ perspective, there are no data errors.

This node appears to be running v1.126.2.

Alexey · May 15, 2025, 7:51am

Hello @bryanpendleton,
Welcome to the forum!

It seems your hashstore has been corrupted.
I will share this with the team.

bryanpendleton · May 15, 2025, 12:28pm

Since this doesn’t seem to be a problem with all of the node’s data but just the one satellite, yet the error is causing the node to refuse to start up at all, is there any suggestion for how to bring the node up to serve the satellites it doesn’t have corrupted data for?

snorkel · May 15, 2025, 1:46pm

Put the one satellite on the exclusion list in config and restart. Wait for an official response though. I don’t know if it will disable that sat permanently or temporarily.

For example, to disable the Saltlake sat, put this in config, or edit the existing line:

# list of trust exclusions
storage2.trust.exclusions: "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE@saltlake.tardigrade.io:7777"

bryanpendleton · May 19, 2025, 3:36pm

Any updates here? I’m not even sure which satellite is the one that’s failing.
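
One possible way to narrow down the failing satellite is sketched below. It assumes the hashstore layout matches the paths quoted elsewhere in this thread (hashstore/<satellite ID>/s0 or s1/meta) and that the size= value in the error is the byte size of a file inside a meta directory. Neither assumption is confirmed here, and the function name and example path are illustrative only.

```shell
#!/usr/bin/env bash
# find_tables_by_size ROOT SIZE
# Prints every regular file under a */meta/ directory below ROOT whose size
# is exactly SIZE bytes.
# Assumption: the "size=" value in the error message is the on-disk size of
# a file in one of the meta directories.
find_tables_by_size() {
    local root="$1" size="$2"
    # "-size Nc" matches files that are exactly N bytes long.
    find "$root" -type f -path '*/meta/*' -size "${size}c" -print
}

# Example (hypothetical storage location):
# find_tables_by_size /mnt/storj/storagenode/storage/hashstore 34603008
```

If a match is printed, the satellite ID is the directory name directly under hashstore in that path.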

Alexey · May 20, 2025, 4:14am

Unfortunately not. I also do not know whether there is a workaround.

Kopcap · June 2, 2025, 9:39pm

Does this mean that a node using hashstore can die because of an error in a single file???

Toyoo · June 2, 2025, 9:58pm

That’s what tech preview means.

mike · June 3, 2025, 6:07am

Unfortunately yes.

However, a ZFS system with CoW (copy-on-write) should lower the risk quite significantly, and perhaps that is why only one node, with one sat, is experiencing this issue. Imagine what the situation would be for @bryanpendleton if these were EXT4 partitions with hashstore.

It has been raised as a concern many times, and extensively discussed in the main hashstore thread, "[Tech Preview] Hashstore backend for storage nodes".

Most recommendations so far point in the direction that a UPS is a very smart addition to your setup if you opt in to hashstore.

It has also been discussed that a repair / rebuild tool might see the light of day in the future (this is not yet announced by Storj).

Alexey · June 10, 2025, 6:52am

The rebuild tool has been merged. You can build it with the latest Go:

git clone git@github.com:storj/storj.git && cd storj
go install ./cmd/write-hashtbl

Then there should be a write-hashtbl binary in your Go bin directory (by default ~/go/bin, or $GOBIN if it is set; the commands below assume ~/bin).
You can view the help:

~/bin/write-hashtbl --help

The easiest way is to just use the default flags and pass it one of the store directories, like this:

~/bin/write-hashtbl /mnt/storj/storagenode/storage/hashstore/12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs/s0

Then do the same with /s1.

You can also use Docker to build a local image containing the binary, so that you don’t have to install all the developer tools, as described there (you need to replace the command, of course, so that it builds this tool rather than the benchmarks):

Please note that the tool places the generated hashtables in the current directory, so you need to either move them to the proper folder afterwards or run the command from the proper directory, i.e.

cd /mnt/storj/storagenode/storage/hashstore/12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs/s0
mv meta meta.bak
mkdir meta
cd meta
~/bin/write-hashtbl /mnt/storj/storagenode/storage/hashstore/12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs/s0
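
The steps above can be wrapped in a small helper so that they run the same way for s0 and s1. This is only a sketch: the function name rebuild_store is made up for illustration, it assumes the node is stopped while it runs, and it relies only on the behaviour described above (the tool takes a store directory and writes its output into the current directory).

```shell
#!/usr/bin/env bash
# rebuild_store TOOL STORE_DIR
# Backs up STORE_DIR/meta to meta.bak, creates a fresh meta directory, and
# runs TOOL from inside it so the generated files land there.
# TOOL and STORE_DIR should be absolute paths, because the tool is started
# from a different working directory.
# Run this only while the storage node is stopped.
rebuild_store() {
    local tool="$1" store="$2"
    [ -d "$store/meta" ] || { echo "no meta directory in $store" >&2; return 1; }
    # Refuse to clobber a backup left by an earlier attempt.
    [ ! -e "$store/meta.bak" ] || { echo "$store/meta.bak already exists" >&2; return 1; }
    mv "$store/meta" "$store/meta.bak" || return 1
    mkdir "$store/meta" || return 1
    # Subshell, so the caller's working directory is left unchanged.
    ( cd "$store/meta" && "$tool" "$store" )
}

# Example for both stores of one satellite (paths as in the post above):
# sat=/mnt/storj/storagenode/storage/hashstore/12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs
# for s in s0 s1; do rebuild_store "$HOME/bin/write-hashtbl" "$sat/$s" || break; done
```

Keeping meta.bak until the node has started and run cleanly leaves a way back to the original files.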

bryanpendleton · July 11, 2025, 3:47pm

For anyone following along, I did the rebuild steps a month ago and my node was able to come back online and has been working since.

mike · July 18, 2025, 12:10am

How much data was in it and how long did it take to build?

mvs2025 · July 18, 2025, 8:31am

I would like to know more about the procedure, have instructions and a ready-made solution for Windows - is there one?

Alexey · July 18, 2025, 8:50am

Nobody has submitted such a PR so far. I would like to invite you to be the first one.

Right now you can build this tool by installing Go for Windows and then using the build commands from my earlier post (git clone, then go install ./cmd/write-hashtbl).

Then run it as described in the same post:

cd X:/storagenode/storage/hashstore/12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs/s0
mv meta meta.bak
mkdir meta
cd meta
~/bin/write-hashtbl X:/storagenode/storage/hashstore/12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs/s0

But it may be easier to use the Docker approach mentioned earlier. It doesn’t require installing all the developer tools to build this utility:

Create a Dockerfile:

FROM golang as build
RUN git clone https://github.com/storj/storj.git && \
    cd storj && \
    go install ./cmd/write-hashtbl

FROM ubuntu
WORKDIR /meta
COPY --from=build go/bin/write-hashtbl /usr/bin/

Build the image:

docker build . -t storj-write-hashtbl

Now restore s0 (PowerShell):

cd X:/storagenode/storage/hashstore/12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs/s0
mv meta meta.bak
mkdir meta

docker run -it --rm -v ${PWD}:/hashstore -v ${PWD}/meta:/meta storj-write-hashtbl write-hashtbl /hashstore

Do the same for s1 (PowerShell):

cd X:/storagenode/storage/hashstore/12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs/s1
mv meta meta.bak
mkdir meta

docker run -it --rm -v ${PWD}:/hashstore -v ${PWD}/meta:/meta storj-write-hashtbl write-hashtbl /hashstore
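
For Linux hosts, the same Docker-based restore could be looped over s0 and s1 from bash, roughly as follows. This is a sketch, not a tested procedure: rebuild_with_docker is a hypothetical name, the DOCKER override exists only for convenience, the image is assumed to have been built locally as storj-write-hashtbl, and -it is omitted on the assumption that the tool needs no interactive terminal.

```shell
#!/usr/bin/env bash
# rebuild_with_docker SATELLITE_DIR
# For each of s0 and s1 under SATELLITE_DIR: back up meta to meta.bak,
# create an empty meta, and run write-hashtbl from the locally built
# storj-write-hashtbl image, with the store mounted at /hashstore and the
# new meta directory mounted at /meta (the image's working directory).
# SATELLITE_DIR must be an absolute path (docker -v requires it).
# DOCKER may be set to use a different binary than "docker".
rebuild_with_docker() {
    local sat="$1" s store
    for s in s0 s1; do
        store="$sat/$s"
        [ -d "$store/meta" ] || { echo "skipping $store: no meta directory" >&2; continue; }
        # Refuse to clobber a backup left by an earlier attempt.
        [ ! -e "$store/meta.bak" ] || { echo "$store/meta.bak already exists" >&2; return 1; }
        mv "$store/meta" "$store/meta.bak" || return 1
        mkdir "$store/meta" || return 1
        "${DOCKER:-docker}" run --rm \
            -v "$store":/hashstore \
            -v "$store/meta":/meta \
            storj-write-hashtbl write-hashtbl /hashstore || return 1
    done
}

# Example (hypothetical mount point):
# rebuild_with_docker /mnt/storj/storagenode/storage/hashstore/12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs
```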

bryanpendleton · July 18, 2025, 1:59pm

I think the satellite that was broken had a couple of hundred GiB, but I’m not sure. The rebuild took quite a few hours, but less than a day IIRC.