If file seeks are a problem with ZFS, then it’s ZFS’s fault. On ext4, seeks are essentially free: you just look up the right position through the extent tree embedded in the file’s inode/metadata and start reading from there. That said, I recall ZFS has a similar mechanism, so I have a feeling seeks are likely cheap there as well.
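Just to illustrate (the path below is only a placeholder), you can dump the extent map ext4 keeps for a file and see which physical blocks each logical offset maps to:
# print the logical-to-physical extent mapping stored for the file;
# a seek only has to walk this map to find the right on-disk block
filefrag -v /path/to/some/file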
Brilliant ideas. I was thinking about how to limit the I/O for these operations some time back; this would certainly meet my requirements.
Background info from my setup:
I run some large, heavy rsync housekeeping jobs for mirrors on a schedule, and I’ve implemented a check for housekeeping on the Storj nodes to make it all co-exist on the same multi-vdev ZFS pool. But it’s not very “nice” or clean.
Sorry, didn’t notice earlier.
Filewalker is never a problem, nor are multiple concurrent filewalkers. They don’t generate I/O to the disks and are effectively rate-limited by CPU utilization.
The concern is that single-threaded compaction can produce enough undesirable disk I/O to be an annoyance.
I think the discussion about compaction is overblown. I’m running on 10-year-old hardware with ZFS and have no problems with compaction. It works great with default settings.
I’ve been using hashstore since it was released, I think in April/May of last year. There are 24 nodes with about 130 TB of data. When compaction runs on a node, the system load goes up by about 4-5 and I/O delay goes up by about 3%. It’s nothing. Even the really small newer computers are faster than my steam machine, so I can’t see hashstore being a problem.
I only run Linux so I can’t say anything about Windows.
Only Storj can say for sure, but I very much suspect it’ll be a very significant proportion.
It would be very interesting if Storj ever published a breakdown of OS, CPU/RAM, and filesystem across network nodes, but it would probably scare prospective customers away.
We do not have this stat as far as I know, but you may use our GitHub to check the number of downloads:
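For example, a rough sketch against the GitHub releases API (the jq filter and pagination are kept simple on purpose, and asset names vary between releases):
# list per-asset download counts for the most recent storj/storj releases, highest first
curl -s "https://api.github.com/repos/storj/storj/releases?per_page=100" \
  | jq -r '.[].assets[] | "\(.download_count)\t\(.name)"' \
  | sort -rn | head -n 20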
We need to optimize for all file systems which are widely used by people for everyday tasks. One goal is supporting storage nodes that run alongside existing workloads. And many (most?) servers are using ext4 today.
I am glad that you are happy with the current version, and I hope that you will be happy with improvements…
We were not happy when we tested the current implementation under heavy load while storing a high number of pieces (10-16 TB disks).
Most of the problems are solved with hashstore; now it’s time to make sure that the public network is also ready for higher throughput, lower latency, and more data.
If you want to upgrade us to hashstore… then you know… maybe push out a deluge of paid test data like last summer… I’d be OK with that
Lol good point, if there was a flood of paid test data that only worked on hashstore then you would definitely get a lot of petabytes migrated quickly!
How will the rollout for hashstore be handled? In waves (i.e. update-cursor style), or do we all go to sleep one night and wake up with the entire network running on hashstore all at once?
It will come with a chosen update: when the node is updated, it starts with migration enabled, and it will take something like a month to migrate all the data on a node.
Please give us the option to opt out, put that option in a version predating the migration version, and inform us when we can enable it. I want to stay on piecestore as long as possible.
I believe the big problem I have with hashstore is fragmentation. With small files, ext4 does a pretty good job avoiding it, but with 1 GB files I believe fragmentation will happen even on ext4. Please correct me if I am wrong.
ext4 tries to allocate the file contiguously, i.e. not split it into chunks. On top of that, ext4 tries to reserve some extra space at the end of the file, just in case the file is edited and grows a bit. As long as the disk has enough free space, the file will not be badly fragmented.
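If you want to sanity-check that on your own disk, a quick experiment along these lines (path and size are just examples) should show only a handful of extents:
# preallocate a 1 GiB file in one shot, roughly the size of a large hashstore log
fallocate -l 1G /mnt/ext4-test/bigfile
# count its extents; one or two extents means effectively no fragmentation
filefrag /mnt/ext4-test/bigfile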
These are stats on storagenode binary:
Linux (amd64) | ██████████████████████████████████████████████████ 609,835
Windows (amd64) | █████████████████████████ 299,042
Linux (arm64) | █████████ 102,395
Linux (arm) | ██ 28,259
FreeBSD (amd64) | 230
A few notes:
- I’d expect some of those Linux thingies are actually Windows users running Docker. This is horrendous; why do so many people still keep putting up with Microsoft’s abomination of an OS?! That’s beyond my comprehension, so I won’t even try.
- Raspberry Pis and other archaic ARM-based Synologies are less than 3%. They can be disregarded. This I expected.
- Where are the FreeBSD fellas? What happened?!
Another interesting tidbit – stats on the storagenode-updater binary:
Windows (amd64) | ██████████████████████████████████████████████████ 309,692
Linux (amd64) | ███████████████████████████ 162,176
Linux (arm64) | ██████████ 63,441
Linux (arm) | █ 7,163
FreeBSD (amd64) | 28
I’m not sure why there is a discrepancy in the distribution here, but it’s noteworthy. Maybe some issue: why are Windows machines pulling the updater so frequently?
Yeah, I’m fuzzy on compaction myself; I would have to look into the actual implementation again. Give me FALLOC_FL_COLLAPSE_RANGE and I’m happy. Give me FALLOC_FL_PUNCH_HOLE, with a full rewrite only when we run out of reasonable address space, and I can understand that. In the other cases I’m sort of on the fence, probably siding with Storj, but barely.
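For reference, both of those operations are also exposed through the util-linux fallocate tool (file name, offsets and lengths below are made up; collapse-range additionally needs block-aligned offsets/lengths on ext4/XFS):
# punch a hole: deallocate 4 MiB in the middle of the log without changing the file size
fallocate --punch-hole --offset $((16*1024*1024)) --length $((4*1024*1024)) some-hashstore.log
# collapse the range instead: drop those bytes entirely and shrink the file
fallocate --collapse-range --offset $((16*1024*1024)) --length $((4*1024*1024)) some-hashstore.log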
A year ago I was somewhat close to dedicating some time to writing my own piece store, and did some small experiments, but the revenue is not good enough yet, and I saw that external patches aren’t really being integrated. Besides, a Linux-specific piece store wouldn’t fly in the general repository anyway. Now I could vibe-code at least half of the damn thing, but I just don’t have time.
If the Select nodes have already been the guinea pigs, and hashstore looks to be better, and the plan is to remove piecestore… it doesn’t seem like making opt-out easy is worth it. They control the speed of the rollout: they can make it so slow a full migration takes three months or something. Plenty of time to deal with any issues.
Now that I know they have a utility that can rebuild the hashtables from the logs, I’m much more comfortable with the migration.
It has even been used at least once:
And you can use this method if you do not want to install Go:
Should be higher. Raspberry Pi OS supports 64-bit and can run even on the old 3B series. Also, plenty of Synology NASes run on x86.
I also wonder how many of those linux-amd64 nodes are from Select, and if anyone is running a node off of an Android phone lol.
Just a note about a new forum thread - Hashstore is coming out of tech preview: Hashstore rollout commencing!
So basically all that is required is:
echo -n 'true' > 121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6.migrate_chore
echo -n 'true' > 12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S.migrate_chore
echo -n 'true' > 12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs.migrate_chore
echo -n 'true' > 1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE.migrate_chore
Then restart, and the node will run on hashstore? No changes in the docker command or anything? Or should I remove the lines with the badger cache or filewalker options, or whatever?
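For reference, the same four flags written as a loop; the meta directory below is my assumption about where these files live, so adjust it to wherever your node keeps its hashstore metadata:
cd /path/to/storagenode/storage/hashstore/meta   # assumed location, adjust to your setup
for sat in 121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6 \
           12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S \
           12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs \
           1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE; do
  # write 'true' into <satelliteID>.migrate_chore to enable the migration chore for that satellite
  echo -n 'true' > "${sat}.migrate_chore"
done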