Cleanup before Hashstore and migration to it

Alexey · August 30, 2025, 7:40am

Yes, it should be. But others have reported that if you restart, your node will likely double-count the used storage, so it may report to the satellites that it is full, which could stop ingress.
The workaround is known - delete the prefixes database and restart.
It’s not fully confirmed though.

If we can confirm it, we can submit a bug report.

jammerdan · August 30, 2025, 9:27am

Maybe some just don’t notice the issue because they have so much free and assigned space that temporary overusage does not stop ingress completely.

Alexey · August 30, 2025, 10:08am

Maybe. I do not know exactly, and you likely don’t either. We need exact steps to reproduce it. Then it will be fixed immediately.

RecklessD · August 30, 2025, 10:16am

For me:
Node slowly grew in size whilst conversion was running.
A restart doubled the reported size. A subsequent restart added the original size of the node again. Ingress stopped when the node size exceeded the max node size.
Stop node. Delete used_by_prefix database. Start node. Node size returned to its physical size.
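The "delete the database" step above can be sketched as a move-to-backup instead of a hard delete. This is only a sketch under assumptions: the file name `used_space_per_prefix.db` is the one mentioned later in this thread, the database is assumed to be an SQLite file (so `-wal` and `-shm` companions may sit next to it), and `db_dir` and `backup_dir` are hypothetical paths you have to supply yourself. The node must be stopped before running it and started again afterwards.

```python
import shutil
from pathlib import Path


def set_aside_prefix_db(db_dir, backup_dir, db_name="used_space_per_prefix.db"):
    """Move the prefix database (and SQLite side files) into backup_dir.

    Run this only while the node is stopped. Returns the list of
    backup paths that were actually created.
    """
    db_dir = Path(db_dir)
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)

    moved = []
    # Assumption: SQLite in WAL mode may leave "-wal" and "-shm" files.
    for suffix in ("", "-wal", "-shm"):
        src = db_dir / (db_name + suffix)
        if src.exists():
            dst = backup_dir / src.name
            shutil.move(str(src), str(dst))
            moved.append(dst)
    return moved
```

Keeping the old file around makes it easy to compare it with the recreated one, which may help narrow down a reproducible path for a bug report.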

Walter1 · August 30, 2025, 11:27am

After migration, do you still recommend these settings?

The lazy filewalker: # run garbage collection and used-space calculation filewalkers as a separate subprocess with lower IO priority
pieces.enable-lazy-filewalker: true
The old startup scan: # if set to true, all pieces disk usage is recalculated on startup
storage2.piece-scan-on-startup: true

Should both be set to true? Does the startup scan also go through all the hashstore logs and restore the databases from them?
And does the lazy filewalker already work with the hashstore logs?

Walter1 · August 30, 2025, 11:36am

Unfortunately, I also see a lot of trash on another Storj node that has not been migrated. I have now deleted used_space_per_prefix.db (after backing it up to another folder, of course), and it has been recreated. I am also running the startup scan and the lazy filewalker on it.

Alexey · August 31, 2025, 2:33am

Both can be in any state or commented out to have their defaults.

What do you call trash? The value on the dashboard or the folder on the disk?
Could you please compare the size of the trash folder with the value on the dashboard?

Walter1 · August 31, 2025, 7:57am

The trash on the dashboard. I am now checking the folder sizes on the disk with
du -h --max-depth=1
but it takes some time… I will come back soon.

mike · August 31, 2025, 12:12pm

Remember that after migration, trash is not stored in the trash folder but is part of the hashstore files. So measuring that path will likely give you only part of the total space used by trash.

Walter1 · August 31, 2025, 12:42pm

So the trash folder would not be needed anymore either. Do you know if piece_expirations is still being used? It should also be obsolete, right?

mike · August 31, 2025, 1:25pm

I assume it would become obsolete once migration is completed. Hashstore handles expiration by grouping pieces with similar expiration dates together.

jammerdan · September 1, 2025, 8:07am

Here is another one who noticed it, which can be investigated:

Walter1 · September 1, 2025, 11:45am

This new hashstore is super great. I noticed it right from the start.

You can even track the progress of the active migration by running "du -h --max-depth=1 /mnt/STORJ-1/storage/hashstore/", which reports the disk usage of each subfolder.
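If you want the same per-subfolder view from a script (for example to log it over time), a rough Python counterpart of that du call could look like the sketch below. Note the assumptions: the function names are made up for this example, and it sums apparent file sizes, so the numbers can differ somewhat from what du reports as disk usage.

```python
import os


def tree_size(path):
    """Sum the apparent sizes (bytes) of all files below path."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(path):
        for name in filenames:
            try:
                total += os.lstat(os.path.join(dirpath, name)).st_size
            except OSError:
                # A file may vanish while the node is writing or compacting.
                continue
    return total


def size_per_subfolder(root):
    """Return {subfolder name: size in bytes}, similar to du --max-depth=1."""
    sizes = {}
    with os.scandir(root) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                sizes[entry.name] = tree_size(entry.path)
    return sizes
```

Calling `size_per_subfolder("/mnt/STORJ-1/storage/hashstore/")` once a day and comparing the results gives a simple picture of how fast the hashstore grows.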

I have now switched all my nodes to passive hashstore. Some are both passive and active and are rewriting the files. Unfortunately it takes months… But it is what it is, and there is no other way.

Walter1 · September 1, 2025, 7:41pm

A nice practical tip for the migration: don’t delete the blobs folder itself, only its contents. Otherwise you currently get an error message:

But the solution is also easy, just manually recreate the blobs folder in node/storage.

mike · September 1, 2025, 8:08pm

Yes.. you did go a little overboard on the spring-cleaning

Walter1 · September 10, 2025, 8:55am

Unfortunately I’m also seeing a higher cancel-rate on the hashstore nodes.

Here are two new nodes with piecestore.

Here are the results:

2367
572
2
3

5125
3616
1
6

Basically a 0% cancel rate. Now two newer nodes, fully on hashstore:

3008
742
71
23

2179
1188
64
33

As you can see, it’s 2% to 3%. Most of my nodes are at 2% to 3%; I haven’t seen 10% to 30% yet.
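For anyone who wants to check the percentages, here is a small sketch of the arithmetic. It rests on an assumption about the figures above, which are posted without labels: in each group of four, the first two numbers are taken as successful transfers of two kinds and the last two as the matching canceled transfers. The cancel rate is then canceled / (successful + canceled).

```python
def cancel_rate(successful, canceled):
    """Return the cancel rate in percent; 0.0 if there were no transfers."""
    total = successful + canceled
    return 100.0 * canceled / total if total else 0.0


# (successful, canceled) pairs, read from the figures above under the
# ASSUMPTION that each group is: success A, success B, cancel A, cancel B.
samples = {
    "piecestore node 1": [(2367, 2), (572, 3)],
    "piecestore node 2": [(5125, 1), (3616, 6)],
    "hashstore node 1": [(3008, 71), (742, 23)],
    "hashstore node 2": [(2179, 64), (1188, 33)],
}

for node, pairs in samples.items():
    rates = ", ".join(f"{cancel_rate(ok, bad):.2f}%" for ok, bad in pairs)
    print(f"{node}: {rates}")
```

Under that reading, the piecestore nodes stay well below 1% and the hashstore nodes land between 2% and about 3%, which matches the numbers quoted in the post.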

All four new nodes are on SSDs, so it’s not related to poor HDD I/O performance.

Now I want to activate the memtbl.

Just put this into the config file:
hashstore.table-default-kind=memtbl
hashstore.memtbl.mmap=true
hashstore.memtbl.max-size=128MiB
hashstore.compaction.rewrite-multiple=10
hashstore.compaction.probability-power=2

Can someone explain the parameters? It should take about 1.3 GB of RAM per TB.

Maybe memtbl fixes the cancel rate. But I doubt it, as the SSD I/O performance is very high and the SSDs are mostly idle in terms of I/O.

Walter1 · September 10, 2025, 11:57am

Somehow I’m getting READ FPDMA QUEUED error messages in the journal:

Should I disable NCQ? According to ChatGPT, it’s likely related to that.

Mitsos · September 10, 2025, 12:30pm

That’s why you don’t rely on AI for actual troubleshooting.

Your disk returned an IO error (second to last line). Your disk and/or cable is failing to return proper data. Post your complete SMART so we can troubleshoot it.

Walter1 · September 10, 2025, 12:46pm

AI is super great for gaining knowledge and troubleshooting. Hopefully it stays and keeps advancing, even though its running cost for the provider must be very high.

The SMART values look fine:

Mitsos · September 10, 2025, 2:09pm

smartctl -x /dev/sdb