Wait, what? You are given 3 different backends to pick from (piecestore, hashstore, memtable). And you complain that the additional backends force you to change your hardware? How so?
How do I enable memtbl on Synology?
Not sure what it is in this context, on Synology.
How much RAM do you have there, and how much used space? You need around 1.3 GB of RAM per TB of used space.
32 GB RAM. The Node has 3.8 TB and is fully converted to hashstore.
I wish I could set this option in config.yaml.
They seem to be working on it:
Alright guys, I can't find it: how do I configure memtable vs hashtable? (I'm using docker compose)
To use memtable
environment:
# Hashstore MemTable
- STORJ_HASHSTORE_TABLE_DEFAULT_KIND=memtbl
Comment this line out to use hashtable
These are my complete hashstore settings, incl. memtbl, from the environment: section of the compose file for a node:
- STORJ_HASHSTORE_COMPACTION_PROBABILITY_POWER=2
- STORJ_HASHSTORE_COMPACTION_REWRITE_MULTIPLE=10
- STORJ_HASHSTORE_MEMTABLE_MAX_SIZE=128MiB
- STORJ_HASHSTORE_SYNC_LIFO=true
- STORJ_HASHSTORE_STORE_FLUSH_SEMAPHORE=1
- STORJ_HASHSTORE_TABLE_DEFAULT_KIND=memtbl
- STORJ_HASHSTORE_MEMTBL_MMAP=true
- STORJ_HASHSTORE_MEMTBL_MLOCK=true
Hope it can help shed some light on how you could configure it. Also, remember the settings do not take effect until the next full compaction cycle.
Thank you. Do we know what
STORJ_HASHSTORE_MEMTBL_MLOCK
means, or whether we should care?
I believe it causes the memtbl to stay in physical RAM and not get swapped out to the pagefile.
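If it helps, here is a rough sketch of what mlock means in general on Linux: pin a memory-mapped file in physical RAM so the kernel cannot page it out. This is just an illustration with a made-up file name, not the actual memtbl code:

```go
// Illustrative only: pinning a memory-mapped file in physical RAM.
// The file name is hypothetical; this is not Storj's memtbl code.
package main

import (
	"log"
	"os"

	"golang.org/x/sys/unix"
)

func main() {
	f, err := os.Open("memtbl.dat") // hypothetical table file
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	fi, err := f.Stat()
	if err != nil {
		log.Fatal(err)
	}

	// Map the whole file into the process address space (read-only).
	data, err := unix.Mmap(int(f.Fd()), 0, int(fi.Size()),
		unix.PROT_READ, unix.MAP_SHARED)
	if err != nil {
		log.Fatal(err)
	}
	defer unix.Munmap(data)

	// Mlock pins the mapped pages in RAM so the kernel will not swap
	// them out; lookups in the table then never have to wait for disk.
	if err := unix.Mlock(data); err != nil {
		log.Fatal(err)
	}
	defer unix.Munlock(data)

	// ... serve lookups from the locked mapping ...
	_ = data
}
```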
I wouldn't recommend using LIFO / SEMAPHORE. We turned them off because they can cause problems on your node during high load:
I stole some explanation from @jtolio (with some edits):
SEM_FLUSH=1 has a mutex where only one disk write is allowed through to the kernel at a time. The problem with SEM_FLUSH=1 on a loaded node is that the Go runtime has to make sure at least one disk write is active. We did a couple of goroutine dumps where the Go scheduler simply had not yet had the opportunity to wake up a waiting disk write, and no disk activity was happening at all. We thought we could do a better job of scheduling writes than the kernel, but this is probably not true.
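To picture the failure mode, here is a toy sketch (my own, not the storagenode code) of a one-slot flush semaphore: every writer has to take the single slot before it may touch the disk, so if the scheduler is slow to wake the next waiter, the disk sits idle even though writes are queued up:

```go
// Toy sketch of a flush semaphore with a single slot; illustrative
// only, not the actual storagenode implementation.
package main

import (
	"fmt"
	"os"
	"sync"
)

func main() {
	flushSem := make(chan struct{}, 1) // semaphore with exactly one slot
	var wg sync.WaitGroup

	writePiece := func(id int, data []byte) {
		defer wg.Done()
		flushSem <- struct{}{}        // acquire: blocks while another write is in flight
		defer func() { <-flushSem }() // release the slot

		// The actual disk write only happens while holding the slot.
		f, err := os.CreateTemp("", fmt.Sprintf("piece-%d-", id))
		if err != nil {
			return
		}
		defer os.Remove(f.Name())
		defer f.Close()
		_, _ = f.Write(data)
		_ = f.Sync()
	}

	for i := 0; i < 8; i++ {
		wg.Add(1)
		go writePiece(i, []byte("piece payload"))
	}
	wg.Wait()

	// If the Go scheduler is slow to wake the goroutine blocked on the
	// channel, the disk sits idle even though writes are queued -- the
	// failure mode described above.
}
```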
LIFO changes the order in which we process incoming pieces. The expected thing is FIFO: we prioritize the oldest outstanding request. This is a "fair"-feeling scheduling, where we handle requests in the order they come in. LIFO makes the following observation: due to long-tail canceling, it would be better for the majority of piece uploads to just happen immediately, and the ones that land on overloaded nodes are going to be discarded anyway. So instead of a slow node trying to churn through its queue in order, it just tries to do the most recently received request first, expecting that very old requests are going to be rejected anyway. This actually does work great when the network is unsaturated.
But when the network as a whole got very saturated… it can cause problems.
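For anyone trying to visualize the two orderings, a toy sketch (not the real scheduler): FIFO pops the oldest pending request, LIFO pops the newest and lets the older ones sit in the queue on the assumption that they will be long-tail-cancelled anyway:

```go
// Toy comparison of FIFO vs LIFO dispatch of pending upload requests;
// illustrative only, not the storagenode scheduler.
package main

import "fmt"

func main() {
	pending := []int{1, 2, 3, 4} // request IDs in arrival order

	// FIFO: serve the oldest outstanding request first.
	fifo := append([]int(nil), pending...)
	for len(fifo) > 0 {
		next := fifo[0]
		fifo = fifo[1:]
		fmt.Println("FIFO serving request", next)
	}

	// LIFO: serve the most recent request first; older entries stay
	// queued (and keep using memory) until the node catches up or the
	// uplink gives up on them.
	lifo := append([]int(nil), pending...)
	for len(lifo) > 0 {
		next := lifo[len(lifo)-1]
		lifo = lifo[:len(lifo)-1]
		fmt.Println("LIFO serving request", next)
	}
}
```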
I would argue that LIFO handling is the rational choice for node operators, especially those not in low-latency locations or those with close-to-saturated bandwidth, exactly because there's a bigger chance of winning a race for freshly uploaded pieces, especially given that the success ratio now impacts how many further chances at receiving uploads a node gets.
I do recall that the Storj whitepaper explicitly assumes no altruistic nodes.
It might be a language barrier that's preventing my understanding, but what is an altruistic node in this context?
I'd guess a node that wants to win races for itself and not for others.
Per Storj Whitepaper:
We adopt the Byzantine, Altruistic, Rational (BAR) model [11] to discuss participants in the network.
- Byzantine nodes may deviate arbitrarily from the suggested protocol for any reason. Some examples include nodes that are broken or nodes that are actively trying to sabotage the protocol. In general, a Byzantine node is a bad actor, or one that optimizes for a utility function that is independent of the one given for the suggested protocol.
- Inevitable hardware failures aside, Altruistic nodes are good actors and participate in a proposed protocol even if the rational choice is to deviate.
- Rational nodes are neutral actors and participate or deviate only when it is in their net best interest.
Some distributed storage systems (e.g. datacenter-based cloud object storage systems) operate in an environment where all nodes are considered altruistic. For example, absent hardware failure or security breaches, Amazon's storage nodes will not do anything besides what they were explicitly programmed to do, because Amazon owns and runs all of them.
In contrast, Storj operates in an environment where every node is managed by its own independent operator. In this environment, we can expect that a majority of storage nodes are rational and a minority are Byzantine. Storj assumes no altruistic nodes.
We must include incentives that encourage the network to ensure that the rational nodes on the network (the majority of operators) behave as similarly as possible to the expected behavior of altruistic nodes. Likewise, the effects of Byzantine behavior must be minimized or eliminated.
I don't think so.
It might be rational to discard (!) old pending requests and focus on serving new requests.
But the current LIFO implementation can crash your node (it doesn't discard old requests, it just keeps them in memory, which can make your node unavailable / OOM-killed…).
So in case of high load (when the node can really get lots of new data), with LIFO you can be out of the game.
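Just to illustrate the distinction (a toy sketch with a made-up cap, not the actual upload queue): a LIFO that really discarded old requests would cap the pending list and drop the oldest entries, while the behavior described above just keeps them all in memory:

```go
// Toy sketch of a capped pending list vs an unbounded one;
// illustrative only, not how the storagenode actually queues uploads.
package main

import "fmt"

const maxPending = 3 // made-up cap, just for illustration

// push adds a request ID and discards the oldest one once the cap is
// exceeded; the unbounded variant would skip the discard and grow forever.
func push(stack []int, id int) []int {
	stack = append(stack, id)
	if len(stack) > maxPending {
		stack = stack[1:]
	}
	return stack
}

func main() {
	var stack []int
	for id := 1; id <= 6; id++ {
		stack = push(stack, id)
	}
	fmt.Println("still pending (newest last):", stack) // [4 5 6]
}
```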
Ah, that wasn't stated in the earlier explanation. Still, that's just a quality-of-implementation issue which, per this conversation, could be fixed, even if it's of rather low priority for Storj.
I believe this happened to me when I had backend storage problems.
I have not experienced this issue, but then again I have plenty of RAM and a low-latency, non-saturated connection.
I think I might have introduced the discussion of these additional "non-standard" settings by sharing what I was using in my configuration.
So… I just wanted to give some context as to why I have been using them.
Originally I was looking for ways to optimise and use memtbl after completing my migration to hashtable. These settings were mentioned previously in this thread by @littleskunk ([Tech Preview] Hashstore backend for storage nodes - #467 by littleskunk), and when reading through the function and description of each of them, it made sense in my mind to enable them.
I have, however, based on @elek's newly shared details, removed them again, and to be honest I do not see any change in success rate, memory usage, or disk load.
Perhaps my setup (ZFS, offloading of databases, orders, logs, etc., plus low CPU load and more than enough free RAM) simply does not gain any benefit from them.