In my observation, even the operating system (Windows) doesn't show any new free space after TTL data is deleted. This makes me wonder whether the TTL data is really being deleted at all, since we only see the "count" line in the log, not the individual deleted expired pieces.
It doesn't delete everything at once; it goes step by step, in the same order the data was uploaded.
I saw multiple lines with counts around 600k, which should have freed at least tens of GB. I had no ingress, yet the used-space numbers didn't change at all. I need to investigate further (for example by pulling out the collector lines, as sketched after the log excerpt below).
2024-06-29T17:04:56Z INFO collector collect {"Process": "storagenode", "count": 664118}
2024-06-29T22:52:07Z INFO collector collect {"Process": "storagenode", "count": 666053}
2024-06-30T05:29:22Z INFO collector collect {"Process": "storagenode", "count": 667016}
2024-06-30T09:20:18Z INFO collector collect {"Process": "storagenode", "count": 667057}
2024-06-30T11:11:36Z INFO collector collect {"Process": "storagenode", "count": 638589}
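For the record, I pull those lines out with something like this (just a sketch for a docker setup; the container name is an example, use whatever yours is called):

# show the collector runs, including the per-run count of removed expired pieces
docker logs storagenode 2>&1 | grep "collector"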
According to this issue, I think it's not updating the databases with the new free space.
So the workaround would be the same: restart the node with the used-space filewalker enabled, until the new version is rolled out.
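If it helps, a minimal sketch of that workaround for a docker node (the option and container names are the typical defaults as I remember them; double-check them against your own setup):

# config.yaml: make sure the startup piece scan (used-space filewalker) is not disabled
storage2.piece-scan-on-startup: true
# then restart the node so the scan runs and the used-space databases get refreshed
docker restart -t 300 storagenode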
It's interesting. It should be deleted, just like @Vadim said, at the same time intervals as it was uploaded (assuming the TTL was the same, though).
What's the point? The TTL data is removed daily, while the startup piece scan runs only at restart and takes days. Should we keep restarting the nodes every time the filewalker finishes? It's pointless. Just fix the bug and release those updates quicker. THIS SHOULD BE PRIORITY NUMBER ONE!
This could explain why, 12 days after reducing the allocated space (from several TB to 1 TB, because I want to migrate the SN to another disk faster), the used space is still the same.
I do not think that you need to restart it daily. However, if your node is reported as full but you actually have free space, you may restart it to update the databases.
And I do not suggest it as a fix; it's a workaround until the fix is rolled out.
I think you have to do some redesigns, Alexey. Too many issues for everyone.
I know, and there is work going on in the team, but not much of it is visible right now.
Like what?
Storj itself is relatively lightweight, even with the current ingress. But seemingly running the used-space filewalker 24x7, at 100% IOPS all the time, does get tiring. Good thing I'm a patient man.
Devs, do you recommend updating to 1.107.3? Or should we wait for something?
Wait! Because if you recreate the container, it could go down a version, and the databases are incompatible. You will end up either updating it again or deleting the databases.
The version server URL is not hard-coded. This can be used to avoid such downgrades or to let it install any version you want.
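For example (just a sketch: the option name is from config.yaml as I remember it, and the URL is a made-up placeholder for your own version server, which you would have to run yourself):

# config.yaml: point the node/updater at your own version server instead of the Storj-operated default
version.server-address: https://version.example.local/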
We have started the normal rollout procedure. Please be patient and follow the rollout process. If we all upgraded our nodes by hand, we would risk losing customer data. The rollout procedure is designed to mitigate that risk. If you feel the urge to install it faster, please don't do it on all of your nodes at once. Better to pick a single node first, let it run for a few hours to a day, then upgrade about 50% of your nodes, wait again, and last but not least do the remaining nodes.
Are you running some crazy benchmarks right now? For the last 2 days all my nodes have been restarting every few hours.
Find out why they restart and fix it. It's in no way normal.
I think having 300 GB of daily repair egress isn't normal…
I would say no crazier than on other days. It should still be the same load.
What has changed is the TTL cleanup that now comes on top. Maybe that is too much for your storage node?
Probably… I will monitor it for a few days, and if it keeps going I'm just going to change the config so the drives report as full, so I don't get more ingress. That should fix it temporarily.
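What I mean by that is setting the allocation at or below the space already used, so the node reports itself as full and stops accepting ingress. A sketch only, with a made-up size and the option name as I remember it from config.yaml:

# config.yaml: allocate no more than what the node already stores, e.g.
storage.allocated-disk-space: 4.00 TB
# (for docker setups the -e STORAGE="4.00TB" parameter does the same,
#  but changing it means recreating the container)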