Hashstore rollout commencing!

So far I haven’t seen anyone reporting better success rates or more data stored on nodes with hashstore, only suppositions that it is better… is there any proof? All I see are reports of bad success rates and less data stored.

I’ve fully converted half of my nodes so far. My average download/upload rate has doubled. I don’t track success rates. Disk usage has started climbing slowly back up.

3 Likes

The proof so far seems to be the Select network, where they implemented this:

However, it is probably safe to assume that the Select and global networks have different characteristics:

I am not seeing such effects yet, but I have nodes that are still migrating. What I can say is that at this point none of them has a better upload success rate than before. Maybe more pieces are being thrown at them now? I don’t know. But hashstore should “absorb” that load, according to what was posted. I’ll have to wait a few more days until the migrations have completed, then restart the nodes and see.

Can’t compare Select and global. I don’t believe the race to store a piece is the same. Success rate seems to be somewhat strange, and maybe less representative of node performance, so let’s ignore that.
All that matters is the data stored at the end of the month. I will switch 2 more nodes to hashstore, ones that are not bottlenecked by hardware, and then I will have a better comparison within my own farm. I see many SNOs switching all of their nodes instead of just a few, leaving themselves no way to compare them.

1 Like

Almost all of my nodes finished migration. Performance has not changed.

2 Likes

Regarding the rollout schedule, maybe you should take IPs into account and not activate the migration on more than one node at a time for a particular IP. Many multi-node setups run on the same system, and starting 2 or more migrations at the same time might make the system too busy…?
Or this could be the operator’s job: opt out of migration on every other node and wait for the first one to finish…

My ancient 6700K with 32 GB RAM did 24 migrations in parallel. No problems.

6 Likes

Should be fine, as the bottleneck is the disk IO.

We recommend using one storagenode instance per physical hard disk. It’s not recommended to run multiple instances on the same disk.

(With a dedicated disk it’s even easier: you can use the faster disk usage calculation.)

We tested the migration on nodes with 36 disks (36 storagenode instances) without any problem.

But true: if somebody has over-provisioned disks, it’s better to closely monitor the system state.
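
For reference, the dedicated-disk mode mentioned above is enabled in the node configuration. A minimal sketch follows; the option name is written from memory here, so please verify it against your own config.yaml before using it:

    # config.yaml (option name assumed; verify against your node's config)
    # Tells the node the whole disk belongs to it, enabling the faster
    # disk usage calculation mentioned above.
    storage2.monitor.dedicated-disk: true

    # The same setting as an environment variable for docker setups:
    STORJ_STORAGE2_MONITOR_DEDICATED_DISK=true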

2 Likes

Is it? It seems the disk IO during migration does not exceed 60 IOPS, and the CPU is not bottlenecked either. It could go much faster, but it doesn’t. I assume that’s a conscious decision?

Is it possible to have it go as fast as possible?

Defaults suggest going full speed though…

1 Like

I’m going to set up a few new nodes… I’m lazy and I’m not going to modify the meta files for hashstore. I’m counting on you to automate everything quickly :slight_smile:

1 Like

That’s the sort of spirit I can get behind!

(But truthfully, I also just set up a new node, and I went ahead and made it full hashstore just to avoid a later migration.)

Do I get this right: when all data is stored in 1 GB files, I basically don’t need a special device on ZFS anymore (for the very small files)?

Sounds about right. For hashstore an SSD doesn’t add much, but RAM does.

For hashstore, is more RAM preferable?

I just happened to get 256 GB cheap for my server.
Do you know if I need to do anything for nodes on Windows to allow them to use more RAM?
So when hashstore comes out, more RAM is better?

It should work with any amount of RAM, but from what I have read, yes, and not only for hashstore: it’s also useful for almost any filesystem operation, independent of the backend used.
You could migrate one of your nodes to hashstore and measure memory usage before and after. That could shed more light on this topic.

More RAM is always better. If nothing else, it’s a better feeling to have it :smiley:

In addition to what Alexey wrote (the OS usually utilizes RAM for read/write caching), you can utilize it directly with hashstore:

  1. Either you can use memtbl (keeping an index of the table records in memory). This needs roughly 1.3 GB of RAM per TB of stored data.
  2. Or you can enable MMAP (map the hashtbl into memory). A configuration sketch for both options follows below.

We usually use SSD for the hashtbl (metadata) and HDD for the logs (raw binary) on servers.

We use memtbl only on selected instances (we don’t have 1.3 GB of RAM per TB everywhere).
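
To make the two options concrete, here is a minimal sketch of how they can be selected via environment variables. The memtbl variable names also appear later in this thread; the hashtbl MMAP variable is assumed by analogy with STORJ_HASHSTORE_MEMTBL_MMAP, so verify it against the storagenode config before relying on it:

    # Option 1: memtbl, keeping the table index in RAM
    # (budget roughly 1.3 GB of RAM per TB of stored data)
    STORJ_HASHSTORE_TABLE_DEFAULT_KIND=memtbl

    # Option 2: stay on hashtbl, but map the table file into memory
    # (variable name assumed by analogy; double-check before use)
    STORJ_HASHSTORE_TABLE_DEFAULT_KIND=hashtbl
    STORJ_HASHSTORE_HASHTBL_MMAP=true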

2 Likes

What happens if I enable memtbl without having 1.3 GB of RAM per TB? Could this cause issues if someone makes a mistake?

1 Like

I have STORJ_HASHSTORE_TABLE_DEFAULT_KIND=memtbl set. I’m not sure how STORJ_HASHSTORE_MEMTBL_MMAP fits into this. Should the MMAP variable be set to true or false for the best performance with memtbl? What is the default?
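
To illustrate, on a docker node this is roughly how it gets passed (an excerpt only, other flags and mounts omitted); the commented-out variable at the end is the one I’m unsure about:

    docker run -d --name storagenode \
      -e STORJ_HASHSTORE_TABLE_DEFAULT_KIND=memtbl \
      storjlabs/storagenode:latest

    # the open question, once I know which value memtbl prefers:
    #   -e STORJ_HASHSTORE_MEMTBL_MMAP=true|false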