[Tech Preview] Hashstore backend for storage nodes

So… hashstore makes certain types of failures more difficult to recover from, and it doesn’t help public nodes, because they need uptime more than performance (now that we’re 3+ months into low steady-state uploads).

If it helps Select SNOs that’s still good though. Paid usage is paid usage! :money_mouth_face:

A couple of nodes have updated to version 1.20.4. In the release notes it says:

d00177b storagenode/piecemigrate: don’t log when there’s no work done

but in the logs I have this:

2025-01-17T18:26:12Z INFO piecemigrate:chore all enqueued for migration; will sleep before next pooling {"Process": "storagenode", "active": {}, "interval": "10m0s"}

is this expected behavior?

… and this

2025-01-17T18:38:20Z INFO hashstore hashstore opened successfully {"Process": "storagenode", "satellite": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "open_time": "89.062556ms"}

When is it planned to migrate everything to hashstore? Or are there no plans to make hashstore a "must have" yet? Or will there be an automatic migration at some point?

I thought hashstore was optional for Select, because they had far fewer nodes so needed all the performance they could get… but there was no plan to make it the default for the public network. Although now I can’t find a link to back that up…

Sorry, I didn't specify that I haven't started the migration.

Current plan is to enable the passive migration at some point. It wouldn't affect old nodes, because the current version creates a config that will still send uploads to the old piecestore backend. In a few versions we can change the default, and any new node that joins the network would create the same config file with the hashstore backend enabled. We would be running both backends side by side.
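To illustrate the idea (the names below are hypothetical, not the actual storagenode config options): the backend for new uploads is recorded when the config is first created, existing nodes keep what they already have, and a later release only changes the default for freshly created configs.

// Hypothetical sketch of "flip the default, keep existing configs"; these
// names are illustrative only and not the real storagenode options.
package main

import "fmt"

type pieceBackendConfig struct {
	// Backend used for new uploads: "piecestore" or "hashstore".
	UploadBackend string
}

// newConfig is what a node writes the first time it starts. Nodes that
// already have a config on disk keep their value, so changing this default
// later only affects freshly joined nodes.
func newConfig(defaultBackend string) pieceBackendConfig {
	return pieceBackendConfig{UploadBackend: defaultBackend}
}

func main() {
	today := newConfig("piecestore") // current releases: old backend stays the default
	later := newConfig("hashstore")  // a future release: only new nodes pick this up
	fmt.Println(today.UploadBackend, later.UploadBackend)
	// Reads consult whichever backend holds a given piece, so in practice the
	// node runs both backends side by side.
}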

That is the plan for the near future. Ideally we don't have to force a migration and the benefits of hashstore will speak for themselves. Think about the problems we had a few months back with garbage collection and so on. Next time we get into a similar situation, the solution might be hashstore for everyone.


It's easy to convince people to move. There will be bugs (or features) that affect both backends, but obviously Storj isn't looking to pay 2x for dev work. Just make the change only in the hashstore backend… then tell the community "if you want X, you need to migrate to hashstore". Done!

I wonder how much space per piece is needed for the hashtables? So far my test node is showing about 1.3 GB of hashtables per TB of log files. This leads to the question of how much performance could suffer when the hashtables don't fit into RAM.
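For a rough sense of scale, here is a back-of-the-envelope sketch, assuming the ~1.3 GB per TB ratio I'm seeing scales linearly (the real per-node numbers may differ):

package main

import "fmt"

func main() {
	// Observed on my test node: roughly 1.3 GB of hashtable data per TB of pieces.
	// Assumption: the ratio scales linearly with the amount of stored data.
	const hashtableGBPerTB = 1.3

	for _, tb := range []float64{1, 10, 20} {
		fmt.Printf("%5.0f TB stored -> ~%5.1f GB of hashtables\n", tb, tb*hashtableGBPerTB)
	}
	// At 20 TB that would be ~26 GB of hashtable data, which is exactly why
	// I'm asking what happens when it no longer fits into RAM.
}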

This is a bit old school. Piecestore and hashstore are just backends. They are both built as independent modules that can be replaced. You could even write your own backend that writes the data into an SQL database if you want to. The reason this is split into modules is to avoid the outcome you describe. If there is a bug outside the backend, just fix the corresponding module and don't worry about the rest. If there is a bug in the piecestore backend, it wouldn't affect the other backends.

Splitting everything into individual modules makes this a lot easier. Sure, sometimes the code isn't split perfectly and a bugfix or new feature requires breaking up a bigger module into smaller sub-modules. You can't cut these modules perfectly on day one and they will evolve over time. But you can be sure that the moment the developers have to maintain 2 backends side by side, they will think this part through, so there isn't going to be a bug that requires 2x dev work.
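A rough sketch of what "both backends are just modules behind one seam" can look like; the names are hypothetical and not the actual storj interfaces:

// Hypothetical sketch of a pluggable piece backend; not the actual storj
// interfaces, just an illustration of why a bug in one implementation does
// not leak into the others.
package pieces

import (
	"context"
	"io"
)

// PieceBackend is the seam between the node and its on-disk storage format.
type PieceBackend interface {
	Write(ctx context.Context, satellite, pieceID string, data io.Reader) error
	Read(ctx context.Context, satellite, pieceID string) (io.ReadCloser, error)
	Delete(ctx context.Context, satellite, pieceID string) error
}

// The rest of the node only talks to PieceBackend, so piecestore, hashstore,
// or even an SQL-backed implementation can be swapped in without touching the
// upload, download, and garbage collection code that sits above this seam.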


We have a thread here about inode caching for the old piecestore backend. If I remember correctly, the reason was slow deletes: it took a while to iterate over all the inodes, and we reached the point at which the filesystem cache couldn't handle it. Do you really expect hashstore to perform worse than that?

My feeling is that we have used piecestore for such a long time that we just got used to all the issues around it, to the point that we call them normal. Now we have a hard time imagining any alternative backend, and we make the mistake of thinking piecestore is great. Sure, it has worked so far, but it is time to open our minds to alternatives.


Yes, these log lines are expected. With active (as opposed to passive) migration, the logs look more like this:

2024-12-17T17:39:27+01:00       INFO    piecemigrate:chore      processed a bunch of pieces     {"Process": "storagenode", "successes": 10000, "size": 2321288448}
2024-12-17T17:49:26+01:00       INFO    piecemigrate:chore      processed a bunch of pieces     {"Process": "storagenode", "successes": 20000, "size": 4652626944}
2024-12-17T18:00:04+01:00       INFO    piecemigrate:chore      processed a bunch of pieces     {"Process": "storagenode", "successes": 30000, "size": 6843575296}
2024-12-17T18:11:13+01:00       INFO    piecemigrate:chore      processed a bunch of pieces     {"Process": "storagenode", "successes": 40000, "size": 9183236608}

The first node is old-style filestore with 10 TB; the second one is hashstore with 450 GB.

[screenshot: filestore vs. hashstore]

So 40 GB RAM is needed for a 20 TB HDD? :thinking:

Only Storj Inc. can use it on the network efficiently though. Everyone else is bound to use what Storj Inc. provides if they want to get any money from the main network.

Well, piecestore can be made to require less than 1.3 GB of metadata per 1 TB of pieces, which is what @alpharabbit observes in the post above.

I did describe how to ballpark the required amount; I hope you'll find it useful.
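For reference, here is one way to ballpark it; the average piece size and the per-file cache footprint below are assumptions for illustration, not measured values:

package main

import "fmt"

func main() {
	// Assumptions for illustration only; plug in your node's real numbers.
	const (
		storedTB      = 1.0   // TB of pieces on the node
		avgPieceBytes = 512e3 // assumed average piece size (~512 KB)
		bytesPerEntry = 300.0 // assumed cached inode + dentry footprint per file
	)

	pieces := storedTB * 1e12 / avgPieceBytes
	cacheBytes := pieces * bytesPerEntry
	fmt.Printf("~%.1f million pieces per TB -> ~%.2f GB of filesystem metadata to cache\n",
		pieces/1e6, cacheBytes/1e9)
	// With these numbers that is roughly 0.6 GB per TB, comfortably below the
	// ~1.3 GB/TB hashtable figure mentioned earlier in the thread.
}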

If you change the storage node backend to something else, that is totally up to you, and you would still get paid by the satellite. The satellite doesn't know whether you are using piecestore, hashstore, or something different.

The old filestore still works if there isn't enough RAM to cache all the metadata, but how about hashstore? What happens if there is not enough RAM to load all the hashtables into memory?

Migration seems to be a one-way ticket at the moment, so SNOs should know the hardware requirements.

I don’t understand. If it doesn’t fit into RAM the filesystem will read the content from disk.

The node T&C tells me to only use Storj-released code.

And even if I disregard that (because yes, I am running a patched binary anyway), even my little monitoring patches keep conflicting with some change introduced by Storj every two releases. I cannot imagine how difficult it would be to keep a whole backend updated every release. Even the small move of the call that stores the expiration date into the backend managed to somehow conflict with one of my mon.Meter calls.

I do not see it as practical to maintain a backend outside of Storj.

Honestly, I am close to just patching the build scripts to report whatever version is the current minimum.

So the hashstore doesn’t read the entire hashtable into a RAM buffer? Why is my small hashstore node using that much RAM then?

The filesystem cache is responsible for keeping whatever it wants in memory. We tried different tricks to outsmart the filesystem, like buffering the hashtable ourselves, but it turns out the filesystem does that on its own anyway. No need for us to do anything special: just write the hashtable to disk and let the filesystem make it fast.
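A minimal sketch of that approach, assuming a flat file of fixed-size records (the record size and file name are made up, not the real hashstore layout): the code issues a positioned read and relies on the OS page cache instead of keeping its own copy of the table.

// Hypothetical illustration: read one fixed-size record from a hashtable file
// with a positioned read and let the OS page cache decide what stays in RAM.
// No user-space buffering of the whole table.
package main

import (
	"fmt"
	"os"
)

const recordSize = 64 // assumed record size, for illustration only

func readRecord(f *os.File, index int64) ([]byte, error) {
	buf := make([]byte, recordSize)
	// ReadAt goes through the page cache: frequently hit records stay in
	// memory, cold ones cost a disk read, and the kernel does the bookkeeping.
	if _, err := f.ReadAt(buf, index*recordSize); err != nil {
		return nil, err
	}
	return buf, nil
}

func main() {
	f, err := os.Open("hashtbl") // hypothetical file name
	if err != nil {
		fmt.Println("open:", err)
		return
	}
	defer f.Close()

	if rec, err := readRecord(f, 0); err == nil {
		fmt.Printf("first record: % x\n", rec)
	}
}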

No idea. On my Pi 5 / Debian / ext4 system it consumes less RAM than the old piecestore backend. Works really great.


That's interesting… My Windows hashstore node consumes about 4x the RAM compared to the filestore node, while it holds only 5% of the data. Maybe a bug in the Windows release?