[Tech Preview] Hashstore backend for storage nodes

A file walker puts items into a queue (just the satellite and piece ID), and the processQueue coroutine reads them from the queue to actually process them. This parameter just controls how many pieces—how many satellite/piece ID pairs—can sit in the queue at once.
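In Go terms it’s the classic bounded producer/consumer pattern. A minimal sketch (all names here are illustrative, not the actual code from storj.io/storj/storagenode/piecemigrate):

    // Minimal sketch of the buffered-queue pattern described above.
    package main

    import "fmt"

    type item struct {
        satellite string // satellite ID
        pieceID   string // piece ID
    }

    func main() {
        const bufferSize = 100 // what --storage2migration.buffer-size controls
        queue := make(chan item, bufferSize)

        // The file walker fills the queue; it blocks once the buffer is full.
        go func() {
            defer close(queue)
            for i := 0; i < 1000; i++ {
                queue <- item{satellite: "us1", pieceID: fmt.Sprintf("piece-%d", i)}
            }
        }()

        // processQueue drains it one item at a time.
        for it := range queue {
            _ = it // migrate the piece here
        }
    }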

Maybe it will make the file walker a bit faster, I guess?

Eh, theory is nice, yes, but the devil is in the details. Like how golang doesn’t actually implement the nice theoretical model of CSP it claimed to. If you want to reason about a piece of code, you need to know all the details :confused:

I’m not sure if there are any implicit assumptions elsewhere about the migration happening sequentially. It would be fairly easy to spawn a bunch of workers reading from the queue and migrating pieces, but I don’t think a slight speed-up is worth the risk of mucking up the migration…
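To be concrete, the parallel version I have in mind would only be a few lines—a hypothetical sketch, not what the node actually does:

    // Hypothetical: several workers draining the same migration queue.
    // The real chore processes the queue sequentially.
    package main

    import (
        "fmt"
        "sync"
    )

    func main() {
        queue := make(chan string, 100)
        go func() {
            defer close(queue)
            for i := 0; i < 10; i++ {
                queue <- fmt.Sprintf("piece-%d", i)
            }
        }()

        const workers = 4
        var wg sync.WaitGroup
        for w := 0; w < workers; w++ {
            wg.Add(1)
            go func() {
                defer wg.Done()
                for pieceID := range queue {
                    fmt.Println("migrating", pieceID) // migration of one piece would go here
                }
            }()
        }
        wg.Wait()
    }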

They had to balance it with the real world. The coroutines, queues, and select absolve you from drowning in locks and condition variables, and that accomplishment alone makes it worth it. I don’t think a pure implementation of anything can exist in the real world and be practical.

:+1:

Oh, sure, not arguing that. What I’m saying is that you need to know in what ways golang deviates from these ideals. The closer it is to the theoretical model, the more I could just leverage my education to understand it.

I’ve toyed with these in my compose:

      --storage2migration.buffer-size=100
      --storage2migration.delay=0ms

But I agree, it seems something outside of these parameters is holding migration speed back; I was up to 5-6 nodes migrating simultaneously before getting close to the performance ceiling.

Great insights into your system. I agree, migration from Piecestore to Hashstore seems to be completely limited by random IO seeks. At least for me, I migrate at around the same speed it took me to simply move a Piecestore from disk to disk in the olden days, with the caveat that since I’m writing to the same disk that I’m reading from, performance tanks (even further lol) once the write caches are filled.

This could be cause for uprising and rebellion, but the migration is a one-time thing, so I don’t really care.


On a side note, I’ve built a new, bigger rack to house my equipment, but I’d like for the migration process to be finished before I power down anything. The rack migration will take place after the Hashstore migration :slight_smile:

The new rack is 28U, following a modified version of Tom’s plans that I’ve had bookmarked for what feels like an eternity.

I’ve now finished moving my first hashstore node off a damaged HDD. For one log file, only 90% of the content was copied without error. So I guess I lost around 100 MB of piece data, which should be small enough for the node to survive.

BTW:
What does this mean for compaction? Will compaction work on a partially damaged log file?

I guess we will find out soon.
I think it should update the index and move on.

Where is the trash located? It used to be something like /data/storage/trash/pmw6tvzmf2jv6giyybmmvl4o2ahqlaldsaeha4yx74n5aaaaaaaa/2025-09-18, so I could tell immediately from the folder name when there were old leftovers.

There is no dedicated trash folder for hashstore. Garbage collection just marks parts of the log files as trash, and when the trashed fraction of a log file reaches some threshold, that log file is rewritten during compaction.
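In other words, the per-log-file decision is roughly this—a hedged sketch of the idea with made-up names and threshold; the real bookkeeping lives in storj.io/storj/storagenode/hashstore:

    // Illustrative sketch only; field names and the threshold are not the real ones.
    package main

    import "fmt"

    type logFile struct {
        totalBytes   int64
        trashedBytes int64 // bytes belonging to pieces flagged as trash
    }

    // needsCompaction reports whether the trashed fraction of the log file
    // has crossed the threshold, at which point compaction rewrites the
    // surviving pieces into a new log file and drops the old one.
    func needsCompaction(lf logFile, threshold float64) bool {
        if lf.totalBytes == 0 {
            return false
        }
        return float64(lf.trashedBytes)/float64(lf.totalBytes) >= threshold
    }

    func main() {
        lf := logFile{totalBytes: 1 << 30, trashedBytes: 300 << 20}
        fmt.Println(needsCompaction(lf, 0.25)) // true: ~29% of the file is trash
    }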

The hashtables now contain just a flag for each piece indicating whether it was marked as trash. You can look up trash statistics in the node’s prometheus output: grep /mon/stats for NumTrash, LenTrash, AvgTrash, TrashPercent.
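If you’d rather script it than eyeball the output, something like this against the node’s debug endpoint would do (the address here is an assumption; use whatever you configured for the debug listener):

    // Hedged sketch: fetch /mon/stats and print only the trash metrics.
    package main

    import (
        "bufio"
        "fmt"
        "net/http"
        "strings"
    )

    func main() {
        // Hypothetical address; substitute your node's actual debug address.
        resp, err := http.Get("http://127.0.0.1:11111/mon/stats")
        if err != nil {
            panic(err)
        }
        defer resp.Body.Close()

        sc := bufio.NewScanner(resp.Body)
        for sc.Scan() {
            line := sc.Text()
            // NumTrash, LenTrash, AvgTrash, TrashPercent all match "Trash".
            if strings.Contains(line, "Trash") {
                fmt.Println(line)
            }
        }
    }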

I got an answer:

Can’t a mechanism be implemented to compact the log even if there are damaged portions?
I imagine each piece in the log starts with some header and ends with some stop marker, or is just a new line in the log. You (the compaction agent) can easily see where a damaged piece starts and ends, because it’s situated between 2 good pieces (or 2 good lines), and cut that part off.
Of course, the node will report that part as a missing piece and be penalised, or, if it is lucky, the piece is deleted by TTL or by the client and never gets requested; but the node will continue working without errors or crashes.
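Something along these lines is what I imagine—purely illustrative, since I don’t know the real record layout; the magic marker and length header here are made up:

    package main

    import (
        "bytes"
        "encoding/binary"
        "fmt"
    )

    // Purely illustrative record format: 4-byte magic, 4-byte big-endian
    // payload length, then the payload. The real hashstore layout differs.
    var magic = []byte{0xde, 0xad, 0xbe, 0xef}

    // salvage walks a damaged log buffer and keeps every record that still
    // parses, resynchronizing on the next magic marker after corruption.
    func salvage(log []byte) (good [][]byte) {
        for off := 0; off < len(log); {
            i := bytes.Index(log[off:], magic)
            if i < 0 {
                break
            }
            off += i
            if off+8 > len(log) {
                break
            }
            n := int(binary.BigEndian.Uint32(log[off+4 : off+8]))
            if n < 0 || off+8+n > len(log) {
                off++ // implausible length: skip this marker and resync
                continue
            }
            good = append(good, log[off+8:off+8+n])
            off += 8 + n
        }
        return good
    }

    func main() {
        rec := func(payload string) []byte {
            b := append([]byte{}, magic...)
            b = binary.BigEndian.AppendUint32(b, uint32(len(payload)))
            return append(b, payload...)
        }
        data := append(rec("piece-a"), 0x00, 0x01, 0x02) // simulated corruption
        data = append(data, rec("piece-b")...)
        fmt.Printf("salvaged %d records\n", len(salvage(data))) // salvaged 2 records
    }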

As far as I understand, the node shouldn’t crash; the compaction process will log errors and continue with the next log file.
But I guess that file would then live on without compaction until all the pieces in it are deleted.

Hello - my nodes are slowly upgrading to 1.37.
I have not done the migration and will not bother, as I have quite a few nodes.

I would like to ask if anyone knows when the network will switch to hashstore? Will there be an announcement that “now we have switched”, or can I monitor it myself on the nodes?

Thanks in advance.
HG

I would only be guessing, but it seems like it will be at the end of this year.

I may have missed this in this thread, but something is consuming 100% of my disk IOPS.

I’m pretty sure it’s storj. I had migrated one of the smaller satellites to hashstore (like AP1 or EU1), but I don’t think it was actively migrating anything else.

I suspect it could be a compaction task, but my limited log searching made it look like compaction was finished.

What can I look for?

    INFO hashstore finished compaction {"Process": "storagenode", "satellite": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "store": "s0", "duration": "54h47m15.852349863s"}

:face_with_peeking_eye: Failed after 54h. I killed the node. (zfs)

Many of these:

    INFO piecemigrate:chore couldn't migrate {"Process": "storagenode", "error": "opening the old reader: pieces error: invalid piece file for storage format version 1: too small for header (0 < 512)", "errorVerbose": "opening the old reader: pieces error: invalid piece file for storage format version 1: too small for header (0 < 512)\n\tstorj.io/storj/storagenode/piecemigrate.(*Chore).migrateOne:335\n\tstorj.io/storj/storagenode/piecemigrate.(*Chore).processQueue:277\n\tstorj.io/storj/storagenode/piecemigrate.(*Chore).Run.func2:184\n\tstorj.io/common/errs2.(*Group).Go.func1:23", "sat": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "id":

    find /mnt/storagenode/blobs -type f -size 0 -delete

Available tools depend on your host OS and filesystem. On FreeBSD you would use top -m io, gstat, zpool iostat -v, zpool iostat -r, etc. On Linux? No idea.
