[Tech Preview] Hashstore backend for storage nodes

Hi,

After migrating my node to hashstore, it still has several blobs left. Shouldn’t they all have been migrated? Or are they remnants of old, removed satellites?

find blobs/* -type f  | wc -l
90322

ls blobs
6r2fgwqz3manwt4aogq343bfkh2n5vvg4ohqqgggrrunaaaaaaaa  arej6usf33ki2kukzd5v6xgry2tdr56g45pp3aao6llsaaaaaaaa

This is also after running the cleanup:

find blobs/* -type f -empty -delete
find blobs/* -type d -empty -delete

This happens on 4 nodes, and all of them have these 2 folders. Is it safe to delete these folders?

thanks

I am running just a single node on a ZFS pool where a bunch of other things are happening, such as torrents, backups, etc.

But it looks like the compaction finished this morning, as I see the last compaction message at 9:43 AM. Looking at the times, they were really diverse, from a few ms to 40 hours (with a bunch of others taking 26h, 18h, etc.), and the whole thing took about 8 days to finish.

And I am still getting these migration messages all the time in the logs. Are these normal?

2025-03-22T21:04:32+01:00       INFO    piecemigrate:chore      enqueued for migration  {"Process": "storagenode", "sat": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs"}
2025-03-22T21:04:32+01:00       INFO    piecemigrate:chore      enqueued for migration  {"Process": "storagenode", "sat": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE"}
2025-03-22T21:04:32+01:00       INFO    piecemigrate:chore      all enqueued for migration; will sleep before next pooling      {"Process": "storagenode", "active": {"12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S": true, "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs": true, "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE": true, "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6": true}, "interval": "10m0s"}
2025-03-22T21:14:30+01:00       INFO    piecemigrate:chore      couldn't migrate        {"Process": "storagenode", "error": "opening the old reader: pieces error: invalid piece file for storage format version 1: too small for header (0 < 512)", "errorVerbose": "opening the old reader: pieces error: invalid piece file for storage format version 1: too small for header (0 < 512)\n\tstorj.io/storj/storagenode/piecemigrate.(*Chore).migrateOne:318\n\tstorj.io/storj/storagenode/piecemigrate.(*Chore).processQueue:260\n\tstorj.io/storj/storagenode/piecemigrate.(*Chore).Run.func2:167\n\tstorj.io/common/errs2.(*Group).Go.func1:23", "sat": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "id": "WEOQ6T5BYKKB4AZDK23NQNTUURDOP7GHY64NLDVO37C2A4PL7RMQ"}
2025-03-22T21:14:31+01:00       INFO    piecemigrate:chore      couldn't migrate        {"Process": "storagenode", "error": "opening the old reader: pieces error: invalid piece file for storage format version 1: too small for header (0 < 512)", "errorVerbose": "opening the old reader: pieces error: invalid piece file for storage format version 1: too small for header (0 < 512)\n\tstorj.io/storj/storagenode/piecemigrate.(*Chore).migrateOne:318\n\tstorj.io/storj/storagenode/piecemigrate.(*Chore).processQueue:260\n\tstorj.io/storj/storagenode/piecemigrate.(*Chore).Run.func2:167\n\tstorj.io/common/errs2.(*Group).Go.func1:23", "sat": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "id": "7FY52IOAKIO4WGT56OPC33TQ5JFLKJSU3IGZYUQPUM2EOONJKPVQ"}
2025-03-22T21:14:31+01:00       INFO    piecemigrate:chore      enqueued for migration  {"Process": "storagenode", "sat": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S"}
2025-03-22T21:14:32+01:00       INFO    piecemigrate:chore      enqueued for migration  {"Process": "storagenode", "sat": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs"}
2025-03-22T21:14:32+01:00       INFO    piecemigrate:chore      enqueued for migration  {"Process": "storagenode", "sat": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE"}
2025-03-22T21:14:32+01:00       INFO    piecemigrate:chore      enqueued for migration  {"Process": "storagenode", "sat": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6"}
2025-03-22T21:14:32+01:00       INFO    piecemigrate:chore      all enqueued for migration; will sleep before next pooling      {"Process": "storagenode", "active": {"12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S": true, "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs": true, "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE": true, "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6": true}, "interval": "10m0s"}

I believe so.
Check those blobs folders in this post

1 Like

@Mark Thanks for the help, I didn’t know about that post on the satellites.

I just looked at them, and if they are old satellites, I guess there will be no problem deleting them.

TTL data is cheap to compact. The storage node puts all pieces with the same TTL into the same LOG file. Once the time is reached it deletes that LOG file without having to rewrite anything.

The node can’t predict which pieces might get deleted by garbage collection. That will be a bit more expensive for the compact job.
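To illustrate the TTL case with a made-up example (the file names below are invented for illustration only, not the real on-disk format): if every piece sharing the same expiry day lands in the same LOG file, expiring that data is a single file deletion rather than a rewrite.

ls hashstore/s0
# log-ttl-2025-03-25.dat   <- pieces that all expire on 2025-03-25
# log-ttl-2025-03-26.dat   <- pieces that all expire on 2025-03-26
# log-0000000000000001.dat <- pieces without a TTL ("sticky" data)
rm hashstore/s0/log-ttl-2025-03-25.dat   # once the day has passed: one delete, no rewrite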

4 Likes

Then it will not be append-only anymore, so the main advantage will be thrown away.

1 Like

I have another idea.
Put all the short-TTL pieces in the same logs and don’t do any compaction; just wait for all the pieces to expire and delete the entire log file.
The sticky data pieces will have their own log files, which will require less frequent compaction.
Another thing would be to make smaller log files, which can be compacted quickly.

Excellent idea. Let’s take it to the next level and make log files the size of the pieces themselves. One piece per log file. It’s called “files on disks”

6 Likes

You are very late with that “idea”, as you can read here: storagenode/hashstore: clump pieces by ttl · storj/storj@1617c0a · GitHub

5 Likes

I had really high hopes for the hashstore, and the initial experience was amazing, with massively decreased disk access after the migration (for comparison, a ZFS scrub on piecestore was taking over 10 days, but dropped to ~25 hours after the migration).
However, since 1.124, hashstore has been hogging my drives almost the same way piecestore was.

I am constantly getting high I/O from the storagenode process, resulting in my disk pool being slow for other processes.

I don’t get why this was not happening on pre-1.124 versions, or how I can disable this behaviour. I am starting to count down the months until I get back my held-back amount and can finally exit this.

1 Like

Previous versions were too lazy with compacting; the new version is playing catch-up, apparently compacting pretty much everything. On my meager 2TB node it took maybe a day or so, then it calmed down once again.

Just today, about 100GB of trash got deleted, and that triggered about a 1.5h compaction burst. No small time, that, but nothing like the “catch-up” disk thrashing.

FWIW, I moved the hashtables to an SSD yesterday (using symlinks); that may have sped up the most recent compaction somewhat. There seems to be quite a lot of hashtable updating going on during compaction. Anyway, judging by my own experience, trash deletion isn’t a daily thing - more like a weekly one.
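Roughly what the symlink trick looks like - a sketch only; the paths, the meta directory location, and the per-satellite layout are assumptions about my setup, so check your own node’s hashstore directory first and stop the node before touching anything:

mkdir -p /mnt/ssd/hashstore/<satellite>
mv storage/hashstore/<satellite>/s0/meta /mnt/ssd/hashstore/<satellite>/s0-meta
ln -s /mnt/ssd/hashstore/<satellite>/s0-meta storage/hashstore/<satellite>/s0/meta
# repeat for s1 and for each satellite directory, then start the node again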

Be warned: my node is experimental, I’ve migrated it several times, messed around with it quite a bit, yet it has miraculously survived for almost 2 years. Probably not wise to repeat my moves for “production” nodes… if mine dies, it dies, zero tears shed.

One more year, give or take a few months, and I will have to exit (gracefully, I hope) anyway. Then I’ll start fresh after major (about three months) renovations to my habitat. Presumably with a much speedier internet connection, upload in particular.

Having said that, I am willing to try memtables once they are released and someone more knowledgeable instructs me how to do that.

UPDATE: While waiting for the official release with memtbl, I scraped together my own “solution”: My "memtables" kludge

3 Likes

Can we calm the HDD I/O if we move the hashtable DB to an SSD?

Unlikely. The compaction operates on the logs, which are the analogue of blobs in the piecestore backend. The hash tables are relatively small and are usually cached in memory by the file system.
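As a rough back-of-the-envelope check (assuming a hashtable record on the order of 64 bytes per piece, which is my assumption, not a confirmed number): a node holding ~8 million pieces (roughly 2 TB at an average piece size of ~256 KB) would need about 8,000,000 × 64 B ≈ 0.5 GB of hashtable, which the OS can keep cached, while the log files hold the full multi-terabyte payload that compaction has to read and rewrite.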

So, is it safe to use this feature now? Is it tuned better than it was last year?

It’s not in production yet.
It seems “safe” insofar as nobody is reporting nodes dying after deployment, but I, for one, am waiting for it to mature a lot more before taking the plunge.

2 Likes

Yeah, once new nodes install with hashstore by default… then I’ll look into migration. But nodes have been mostly idle for six months now, so piecestore is working fine and there’s no rush.

6 Likes

It’s still in the beta phase, I suppose, but I don’t see a future for hashstore if it keeps behaving the way it does now. Those compactions hammer the drive non-stop. With piecestore, without the Badger cache, there was only intense reading, at startup and when moving trash, but with hashstore there is also intense writing. The performance benefits are going out the window.
They should implement an option to go back to piecestore. It was a bad move to switch to hashstore.
I’ll keep it 3 more months maybe, because it’s a new node with 1.5TB, and if nothing improves I will switch all the flags to false and wait for it to move slowly back to piecestore and Badger.

2 Likes

I will keep my 2 test nodes with hashstore (one Windows, one Linux) running, but I am not converting my other nodes until I can really see a benefit. Both hashstore and filestore are working without problems on my hardware atm.

I guess Storj is developing hashstore primarily for Select and the kind of storage they have to deal with there, which is probably not our good old one-node-per-HDD thing.

You may disable the migration to hashstore, and the node will serve both backends.

1 Like

It’s still the “one node per HDD” way. The only difference is that there is much more intensive traffic (hundreds of Gbps), so the hashstore is a must-have.