Is there a safe mechanism regardless of what the local dashboard says?

If piece-scan-on-startup is set to false, a node can keep operating and even have a good success score while the dashboard thinks it has 5TB free when in fact only 900GB is left.

Is there a safe mechanism regardless of what the local dashboard says?
Will the node stop filling up before the real space runs out?
Or will data “fall off the cliff” and the node get disqualified?

(If you want, you can make this public in the right thread; I don’t know where to post it.)
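For reference, this is the option I mean in the node’s config.yaml (as far as I know this is its current name, please double-check it in your own config):

```yaml
# the scenario above assumes the startup used-space scan is turned off
storage2.piece-scan-on-startup: false
```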

The node should report itself as full as soon as there is no more than 5GB left (earlier this threshold was 500MB).

I haven’t seen that exact discrepancy on my nodes. When things fall out of sync (data on dashboard != actual data on disk) it’s always the opposite: the node thinks it has less free space than what is actually on the disk. In this case the node behaves as @Alexey said: traffic stops when the node approaches ~5GB free (I’ve seen it as low as 3.5GB) and the disk is left with free space that can’t get filled.

A run of the used-space filewalker fixes it, of course.
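In case it helps anyone: as far as I know the used-space filewalker is just the startup scan, so re-enabling it in config.yaml and restarting the node once should trigger it. Treat this snippet as a sketch and verify the option name in your version:

```yaml
# re-enable the startup used-space scan, then restart the node;
# it can be turned off again after the scan has finished successfully
storage2.piece-scan-on-startup: true
```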

If it runs and finishes successfully…

Hm, I meant something different: it will report to the satellites that it’s full when it has 5GB or less left in the allocation/on the disk.

My bad on the wording. Yes, that’s what I meant: when it’s at 5GB free it reports that to the satellites. Sometimes this overshoots (understandably) and it goes as low as 3.5GB.

Will edit the values :sweat_smile:

Oh, I see. I hope it really keeps track of the real disk space left and reports it; I’m going to check for myself whether that’s true, I guess.

Is this 5GB configurable? It seems much more reliable than the 5–6 day long filewalker in some cases.

No, it’s hardcoded.
And I don’t get how it is related to the filewalker?

For me it took about 6 days for the lazy filewalker to finish, and before that I restarted the node for a config change. With the restart I lost more than a TB of used space on the dashboard: even though I set the maximum usable space in the config to 7.6TB, my reported usage dropped from 6.8TB to about 5.7TB.

With that much difference from the real usage, I hit the real free-space limit about 2–3 days before the node finished its filewalk.

So the hardcoded 5GB limit worked flawlessly, while the filewalker is unreliable when we get TBs of ingress per day. With the usual 30–60GB basically nobody cares, and because that ingress also causes less load, it finishes faster.

The filewalker may not succeed if you also have database-related errors in your logs.

Maybe, but it is still unreliable/slow under heavy load, and the physical free-space check seems to be the better failsafe in my opinion, especially if the disk is dedicated to Storj and the DBs are moved to something like an SSD.

You may disable the lazy mode; then it will finish faster (however, it would likely affect the node’s success rate).
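If you want to try it, I believe it is this option in config.yaml (please verify it exists in your version before relying on it):

```yaml
# run the filewalkers with normal (non-lazy) IO priority;
# faster, but it competes with uploads/downloads for disk time
pieces.enable-lazy-filewalker: false
```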

Luckily, Storj has become aware of this too: Storage node performance for filewalker is still extremely slow on large nodes · Issue #6998 · storj/storj · GitHub
So there is hope they are working something out.

In my opinion we need to get rid of filewalkers and databases as much as possible.

Yes, but nobody has suggested a better solution for that so far.
Databases are used as a cache (so the node does not have to scan constantly), while the filewalkers have to collect pieces and garbage, empty the trash and remove TTL-based pieces.
There is also the used-space filewalker to bring the databases up to date if there were issues with them. But it, too, requires being able to update the databases. The loop is closed…

We are investigating the usage of badger in the hope that it can solve the performance issues with the plain storage… But nobody wants to drop the databases (because they speed things up a lot)…
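If I remember correctly, the experimental badger cache can already be tried on recent versions with an option along these lines; I am not certain about the exact key name, so treat it as an assumption and check your version’s config.yaml first:

```yaml
# experimental: cache piece metadata in a badger store so the
# used-space filewalker does not have to stat every single file
# (key name from memory, it may differ between versions)
pieces.file-stat-cache: badger
```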

I am not a developer, so I cannot make suggestions on how this could be achieved. But the flaws and conceptual problems of the current implementation are getting more obvious every day.

We have at least seen ideas that try to reduce or optimize the pressure on the storage.

One of the best solutions seems to be not using the databases at all. It has been suggested many times to make them optional, or at least as many of them as possible, since an unknown number of SNOs have no use for the data they hold.

I believe that is a suggestion.

Another suggestion is to move as much of it as possible into RAM. Running the databases not on the slowest storage but on the fastest available seems logical to me.
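This part at least is already possible today: moving the databases off the data disk is a single config.yaml option (the path below is only an example):

```yaml
# keep the sqlite databases on a faster device than the piece storage
storage2.database-dir: /mnt/ssd/storagenode-db   # example path
```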

Another suggestion is to separate current data from history, to keep the working databases small and to limit data loss in case of corruption.

And if Storj is running out of ideas, they should talk to the developers of SQLite:

Maybe they have additional ideas or can implement something that helps the Storj use case.

Better suggestions for the filewalkers have also been made: do not run them on every restart, and make it configurable when to run them. Saving their state has been suggested as well, which of course helps when they do not have to restart from scratch every time.
Some of this is already getting implemented, but for some reason not rolled out as fast as it should be.

But I believe that you run your node on RAM-constrained hardware?

this is actually a great one. Do you have a post for that to share with the team?

I think it’s in the works already, but not quite as described… more like resuming the interrupted process. It’s not merged so far.

I am running more than one node. Some are on constrained hardware like the Odroid HC2, others are not.
It would be a blessing to run the databases fully in RAM, with only occasional backups to disk, like every 10 minutes.
Together with the separation of history data, I would lose only the last few minutes in case of power loss or corruption.

No, sorry, this is an idea I came up with and have voiced here on the forum in the past already. How it could be implemented, I don’t know. But of course it would make sense to move old data out of the way into some kind of “archive” database, because access to it is not needed frequently and the data will not change.

Unfortunately not, right…

Perhaps more: the interval is 1h by default.