[Tech Preview] Hashstore backend for storage nodes

I’m still not seeing any of my nodes with these symptoms, but they are all running from docker compose - if you are running them by “hand” it could make a difference, since compose removes the container on “docker compose stop” and recreates it with “docker compose up -d”?

Could be a difference worth taking into consideration when trying to figure this out.

If you use docker compose stop rather than docker compose down, the container will not be removed; it will only be stopped, the same as with docker stop.

It could just be a timing issue: if you modify files before restarting the container, the restart may reset the filesystem state, because a bind mount does not expect modifications from outside the container.
So doing only docker stop, then modifying, then docker start would probably be enough to flush the filesystem before the modification. However, I prefer not only to stop the container but also to remove it, to make sure that everything is flushed and reset and to avoid any unexpected behavior.
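For reference, a minimal sketch of that workflow; the container name storagenode and the use of Docker Compose for recreation are assumptions, adjust them to your setup:

docker stop storagenode       # flush and stop the container
docker rm storagenode         # remove it so no cached bind-mount state survives
# ... modify the files in the storage location outside of the container here ...
docker compose up -d          # recreate the container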

1 Like

The question is why the code keeps re-writing / modifying the file every hour at all; the timestamp makes it obvious that it does. Why is that required?

I believe it’s not the code, but docker.
You confirmed that:

By the way, maybe the database reset happened for the same reason. So you need to not only stop the container but also remove it before touching the storage location outside of the container.

But why is it not consistent then?
Only the .migrate_chore files have their timestamps rewritten every hour. The .migrate files in the same directory, which had also been changed, did not.

I do not know why the caching system of the Docker layer filesystem picks these particular files. What I do know is that bind mounts in Docker containers sometimes produce this kind of behavior.
So it’s recommended to stop and remove the container before modifying files in the storage location.

I have seen some questions around moving the hashtable to an SSD.

I want to start with a fair warning.

  1. No safeguards. The node might start with an empty hashtable. Disqualification would be the end result. Even if you detect this issue in time, there is no way to merge hashtables. Please add extra steps to minimize this risk.
  2. The performance gain is overrated. Use memtable when possible; memtable has better performance.

OK, now that this is out of the way, here is my way to do it:

First, let’s take some additional safety steps and stop all incoming uploads.
sed -i 's/storage.allocated-disk-space: .*/storage.allocated-disk-space: 0 B/g' /mnt/sn1/storagenode1/storagenode/config.yaml
sudo systemctl restart storagenode1
This should set the allocated space to 0 bytes. Wait until the node has contacted all satellites to tell them that it is full and doesn’t want any further uploads, and until upload activity dies down.
If you see an error message about allocated space and don’t know how to avoid it, you might want to stop here. If you do want to continue, I will give you only one hint: storage2.monitor.minimum-disk-space: 0 B
Still unable to get past the allocated space safety check? Please don’t continue. Seriously, this is a high-risk operation and you’d better practice on something else first.
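A quick way to double-check the change and watch the uploads die down; just a sketch, assuming the node logs to the systemd journal (tail your log file instead if it doesn’t) and that upload log lines contain the word "upload":

grep 'allocated-disk-space' /mnt/sn1/storagenode1/storagenode/config.yaml    # should now show 0 B
journalctl -u storagenode1 -f | grep -i upload                               # watch until upload activity stops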

You might also want to disable any piecestore-to-hashstore migration if you haven’t finished the migration yet. This step is not included in my script.

Danger zone. Do not continue unless you understand the risks. Start with a small test node first.

sudo systemctl stop storagenode1

mkdir /home/storagenode/ssd/sn1/hashstore
mkdir /home/storagenode/ssd/sn1/hashstore/121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6
mkdir /home/storagenode/ssd/sn1/hashstore/121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6/s0
mkdir /home/storagenode/ssd/sn1/hashstore/121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6/s1
mkdir /home/storagenode/ssd/sn1/hashstore/12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S
mkdir /home/storagenode/ssd/sn1/hashstore/12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S/s0
mkdir /home/storagenode/ssd/sn1/hashstore/12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S/s1
mkdir /home/storagenode/ssd/sn1/hashstore/12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs
mkdir /home/storagenode/ssd/sn1/hashstore/12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs/s0
mkdir /home/storagenode/ssd/sn1/hashstore/12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs/s1
mkdir /home/storagenode/ssd/sn1/hashstore/1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE
mkdir /home/storagenode/ssd/sn1/hashstore/1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE/s0
mkdir /home/storagenode/ssd/sn1/hashstore/1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE/s1

cp -r /mnt/sn1/storagenode1/storagenode/storage/hashstore/121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6/s0/meta /home/storagenode/ssd/sn1/hashstore/121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6/s0
cp -r /mnt/sn1/storagenode1/storagenode/storage/hashstore/121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6/s1/meta /home/storagenode/ssd/sn1/hashstore/121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6/s1
cp -r /mnt/sn1/storagenode1/storagenode/storage/hashstore/12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S/s0/meta /home/storagenode/ssd/sn1/hashstore/12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S/s0
cp -r /mnt/sn1/storagenode1/storagenode/storage/hashstore/12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S/s1/meta /home/storagenode/ssd/sn1/hashstore/12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S/s1
cp -r /mnt/sn1/storagenode1/storagenode/storage/hashstore/12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs/s0/meta /home/storagenode/ssd/sn1/hashstore/12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs/s0
cp -r /mnt/sn1/storagenode1/storagenode/storage/hashstore/12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs/s1/meta /home/storagenode/ssd/sn1/hashstore/12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs/s1
cp -r /mnt/sn1/storagenode1/storagenode/storage/hashstore/1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE/s0/meta /home/storagenode/ssd/sn1/hashstore/1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE/s0
cp -r /mnt/sn1/storagenode1/storagenode/storage/hashstore/1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE/s1/meta /home/storagenode/ssd/sn1/hashstore/1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE/s1
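The twelve mkdir and eight cp commands above can also be expressed as a loop; a sketch under the same path and satellite ID assumptions:

for sat in 121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6 12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S 12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs 1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE; do
  for s in s0 s1; do
    # create the per-satellite directory on the SSD and copy only the meta folder
    mkdir -p /home/storagenode/ssd/sn1/hashstore/$sat/$s
    cp -r /mnt/sn1/storagenode1/storagenode/storage/hashstore/$sat/$s/meta /home/storagenode/ssd/sn1/hashstore/$sat/$s
  done
done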

echo 'hashstore.table-path: /home/storagenode/ssd/sn1/hashstore' >> /mnt/sn1/storagenode1/storagenode/config.yaml
sudo systemctl start storagenode1

Run your safety checks. I believe I double checked all 8 directories to make sure all the hashtable metadata had been copied successfully. Last but not least, I deleted the s0/meta and s1/meta folders from the HDD and restarted the node one more time to see if downloads were still working. In case of any mistake I would stop the node as quickly as possible to avoid disqualification and take my time to find out what my mistake was. If there is no sign of any mistake and the node looks healthy, I would set the used-space value back, restart the node, and check for upload errors.
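The “all 8 directories” check can be scripted as well; a minimal sketch that only confirms each copied meta folder exists and is non-empty on the SSD (run it before deleting anything from the HDD), assuming the same paths as above:

for sat in 121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6 12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S 12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs 1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE; do
  for s in s0 s1; do
    # flag any meta directory that is missing or empty on the SSD
    if [ -d /home/storagenode/ssd/sn1/hashstore/$sat/$s/meta ] && [ -n "$(ls -A /home/storagenode/ssd/sn1/hashstore/$sat/$s/meta)" ]; then
      echo "$sat/$s OK"
    else
      echo "$sat/$s MISSING OR EMPTY"
    fi
  done
done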

I was running one test node like this for a few days, just to find out that there is no performance difference to my other memtable nodes. In the end I moved this node back to the HDD. Keeping everything on the HDD also allows me to swap out HDDs and get them back online on a different machine; with the hashtable on an SSD that would get a bit more complicated. Why risk it if memtable on the HDD already gives me the best performance?

1 Like

Just 2 days ago I had a power outage and the SSD with all the DBs became unreadable, so I lost the databases of all 18 nodes. If the hashtables had been there as well, I would have lost all the nodes.

There is a way to rebuild it. I am just too lazy to try that because memtable gets the job done for me.

1 Like

In my case with 18 nodes, I need around 128 GB of RAM to run memtbl normally. I hope I will get there soon.

During the migration I had a lot of wrong-node-size problems: it showed overused space while the HDD still had 500 GB free. After the migration ends it is easy to fix, or it even fixes itself, but that is 2 weeks without ingress.

What filesystem are you using for your nodes?

What are the best parameters for nodes on zfs? Memtbl or hashtbl?

I wonder whether my node has enough RAM if I activate ‘memtbl’.

There are other tasks and services running on my server, so if at any given moment those other services use more RAM, I suppose what memtbl holds in RAM gets dropped?

Does this mean that everything has to be rebuilt or put back into ‘memtbl’?

How does this work? Is it done gradually? Are there parts that remain in the RAM, those that are read more often?

Wouldn’t it be more useful to write what is in RAM to the hard drive, and thus use a hybrid system that writes to RAM and, if there is not enough RAM available, switches to using hashtbl, if that is not already the case?

My node is currently using 16 GB of RAM for 65 TB used space.

In my case the linux kernel would start killing processes in order to free up some RAM and keep the system running.

On every startup memtable will rebuild. It looks like this in the logs:

2025-08-26T11:12:43+02:00       INFO    hashstore       hashstore opened successfully   {"Process": "storagenode", "satellite": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6", "open_time": "621.832683ms"}
2025-08-26T11:12:51+02:00       INFO    hashstore       hashstore opened successfully   {"Process": "storagenode", "satellite": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "open_time": "8.389304306s"}
2025-08-26T11:12:53+02:00       INFO    hashstore       hashstore opened successfully   {"Process": "storagenode", "satellite": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "open_time": "2.138755037s"}
2025-08-26T11:12:53+02:00       INFO    hashstore       hashstore opened successfully   {"Process": "storagenode", "satellite": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE", "open_time": "2.877847ms"}
2 Likes

On every startup memtable will rebuild. It looks like this in the logs […]

Does that make memtable more resilient against power failures?

Interesting, @jtolio said that around 1.3 GB of RAM per TB used is recommended. So it actually uses only about 0.25 GB per TB (16 GB for 65 TB); why is there such a big difference?

2 Likes

I don’t know. I haven’t managed to kill my node. Looks robust to me.

1 Like

memtbl is a confusing name; it’s not in-memory only. It’s a hybrid system where you have an on-disk structure and an in-memory index. The disk access pattern is more sequential, as the in-memory structure works like an index and helps to find the interesting part of the persisted data.

hashtbl, on the other hand, is more like a fully persisted version, which means each update requires a disk seek to the right place. (Also: the same bytes can be rewritten multiple times, which is less SSD-friendly on paper. We monitor our SSDs and haven’t noticed any problems or quick degradation so far.)

4 Likes

So now we are building systems for storagenodes? :unamused_face:
SSD, RAM, filesystem optimisations… isn’t it supposed to run on whatever we have, on the systems we already use daily for our usual activity?

No one said that you can’t run it on what you have. But now you can optimize it if you want.

1 Like