Updates on Test Data

Yeah, I get a lot of data now too :+1:t2:

1 Like

Thank you for the continued open communication, friend.

Would I love even more detailed and even more frequent information? Yes.
Is this already more than I can ask for? Also yes.

Keep up the good work

Is the deletion of old data working properly?
The earnings.py script for my node shows 29.65TB as "uncollected garbage". It seems that while the virtual disk stays full or near full, the amount of data that the satellite thinks my node should have is going down as the test data expires.
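For context, here is a minimal sketch (in Python, with illustrative numbers, and not the actual earnings.py logic) of what that "uncollected garbage" figure conceptually represents: the gap between what is physically on disk and what the satellites report the node should be holding.

```python
# Hypothetical illustration, not the real earnings.py code: "uncollected
# garbage" is roughly the data sitting on disk that the satellites no longer
# account for, i.e. pieces waiting for garbage collection.

def uncollected_garbage_tb(disk_used_tb: float, satellite_reported_tb: float) -> float:
    """Estimate garbage awaiting collection, in TB."""
    return max(0.0, disk_used_tb - satellite_reported_tb)

# Example with made-up numbers in the same ballpark as the post above:
print(uncollected_garbage_tb(disk_used_tb=44.0, satellite_reported_tb=14.35))  # ~29.65 TB
```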

1 Like

Sounds like you have the same problem as me: https://forum.storj.io/t/how-do-you-solve-slow-file-deletion-on-ext4/27260

Having the same thing happen over here.

If you are still on v1.105.4, then this is likely to occur. The collector, when deleting, does not update the used space on that version. iirc it was fixed in v1.107.

To correct it, you'll need to manually update to the latest version (or wait for the rollout) and then trigger a used-space calculation.
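If you're not sure which version a node is on, something like this can check it before you assume the bug applies (a hedged sketch assuming the default local dashboard API on port 14002; the endpoint and port may differ in your setup):

```python
# Sketch: query the node's local dashboard API for its version.
# Assumes the default port 14002 and the /api/sno/ endpoint.
import json
import urllib.request

with urllib.request.urlopen("http://localhost:14002/api/sno/") as resp:
    info = json.load(resp)

print("Node version:", info.get("version"))  # e.g. 1.105.x would still be affected
```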

1 Like

That does not seem to be the case for me, unless TTL-expired deletes are different from normal deletes. I remember when Storj deleted a lot of test data from Saltlake (in preparation for the current tests) it went smoothly; the disk space was freed quickly.
I restarted my node to let the used-space filewalker run. Hopefully it will find some space, but it is still concerning that the satellite reports my node has way less data than there is on the disk.

Yes, I believe that this issue is not solved yet. My nodes have reports from the satellites saying they should hold about 4.42TB out of 6.78TB used.
But they are still running GC.

1 Like

To my amazement: it looks like your team is squeezing out even a bit more speed over the last few hours. You must be reaching the limits of many SNOs' internet connections: congrats!

Certainly you’ve exceeded potential-customer expectations: and now you’re just showing off :wink: . Perhaps a new sales whitepaper soon showing how Storj’s S3 speed kicks Amazon in the :peanuts: ?

None of this is a complaint: just a comment: we’ll reserve all the capacity you can afford :money_mouth_face:

1 Like

It’s actually been quite "meh" for me.
Maybe I have too many full nodes now…

I would rather see them reaching the limits of my disk space. This test data is going out as fast as it comes in, so it's a lot of wear for no gain.

5 Likes

Please pay attention to the warning most likely displayed underneath that top overview. Recently it has been very common for the last used-space report from the satellites to be incomplete, which unfortunately means that the uncollected-garbage calculation is likely inaccurate. I hope that issue gets resolved soon, but I can only report what the node knows, and if it gets incomplete data from the satellites, that report is just going to be wrong. Maybe it's not as bad as it looks. That said, I have quite a few nodes for which that graph visibly shows a drop even on days where it is complete. It seems to me that expiration deletion is for some reason not catching everything, and GC is still way behind on Saltlake.

4 Likes

I was thinking about that recently. Let's see if I can get the thought out of my head in a way that makes sense…

It seems like 'old' SNOs were able to fill a disk then park it on a port… and start a new disk… then when full park it on a port, etc… basically always having one node growing… and the others idling. That worked because data tended to hang around longer… so idle/full nodes didn't delete much: so it was no big deal if they shared an IP with the growing node. They didn't need much ingress to stay full.

Now though… full nodes that are switched to share an IP… are still going to lose data at the full TTL rate: but not refill it as fast (as they're now sharing a /24). So would newly-filled nodes 'leak' a little the first month they share that IP? I guess it would balance out eventually.

But… since underneath all the TTL data, there's still some natural growth of long-term data… would that mean 'old' nodes are even more valuable? Like they'd be filling a larger and larger percentage of their space with that long-term data… so a freshly-filled node may be 90% TTL data… but a 3-year-old full node may only be 10% TTL data?

If that’s true… then it’s more important now to make your nodes a bit more durable. Because if you lose them it’s not just refilling (which may be quick with TTL data)… it’s refilling with that long-term data (which reduces the TTL churn the node has to deal with).

Does that sound right?
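To put some (entirely assumed) numbers on that idea, here's a back-of-the-envelope Python sketch: a full node slowly accumulates long-term data, and whatever capacity is left keeps churning as 30-day TTL data.

```python
# Toy model, not measured Storj figures: assume a 10TB node accumulates
# long-term data at a steady rate while the rest of its capacity is TTL churn.

def ttl_share(node_age_months: int,
              capacity_tb: float = 10.0,
              long_term_tb_per_month: float = 0.25) -> float:
    """Fraction of a full node's capacity that is short-lived TTL data."""
    long_term = min(capacity_tb, long_term_tb_per_month * node_age_months)
    return (capacity_tb - long_term) / capacity_tb

for months in (1, 12, 36):
    print(f"{months:2d} months old: ~{ttl_share(months):.0%} TTL data")
# With these assumed rates: ~97% at 1 month, ~70% at 1 year, ~10% at 3 years.
```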

5 Likes

I saw the warning; that's why I wrote here to ask others about the problem. It may or may not be accurate, but if the test data was uploaded with a 30-day TTL then I think it is at least somewhat accurate.

My node got almost zero ingress in the last two weeks, because it was full. So, if the test data was uploaded with a 30-day TTL, then about half of it should be expired by now.
However, the used disk space does not reflect that (the graph is of the actual used space, like what you get with df):

From the peak of 48TB it’s now down to 44TB (and the node still thinks that it’s full because of the other bug that can be temporarily fixed by restarting the node and letting it run the filewalker).
So, I think that the "uncollected garbage" value is more likely to be correct than not, especially since it's growing.
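As a rough sanity check (a toy calculation under assumed, simplified conditions): if the TTL data was uploaded at a uniform rate while the node filled up and ingress then stopped, the expected expired fraction does line up with "about half".

```python
# Toy model: pieces uploaded uniformly over `fill_days`, node then sits full
# for `days_since_full` with no new ingress; a piece expires once its age
# reaches the TTL.

def fraction_expired(fill_days: float, days_since_full: float, ttl_days: float = 30) -> float:
    """Fraction of the uploaded TTL data that should have expired by now."""
    oldest_age = fill_days + days_since_full
    if oldest_age <= ttl_days:
        return 0.0
    return min(1.0, (oldest_age - ttl_days) / fill_days)

# Example: filled over ~30 days, then full for 14 days -> ~47% should be gone.
print(f"{fraction_expired(fill_days=30, days_since_full=14):.0%}")
```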

How do you get the value of "amount of data the satellite thinks my node has"? From the database or the API? I could create a graph for it.

It’s directly from the database. The API already does some calculation to summarize per day, which can compound issues. Feel free to have a look at the code in the script. I have to do some calculation since each report just shows bytehours over a variable period.
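For anyone wanting to reproduce that, a minimal sketch of the kind of conversion involved (field names here are illustrative, not the actual storagenode database schema): each report covers a variable interval, so the byte-hours have to be divided by the interval length to get an average stored amount.

```python
# Convert a satellite report of byte-hours over a reporting interval into an
# average amount of stored data. Illustrative only.
from datetime import datetime

def average_stored_bytes(byte_hours: float, start: datetime, end: datetime) -> float:
    """Average bytes stored over the reporting interval."""
    hours = (end - start).total_seconds() / 3600
    return byte_hours / hours if hours > 0 else 0.0

TB = 1e12
# Example: 240 TB-hours reported over a 24-hour period -> 10 TB average stored.
print(average_stored_bytes(240 * TB, datetime(2024, 7, 1), datetime(2024, 7, 2)) / TB)
```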

With the new traffic pattern and node selection, the number of IPs is not as important as it was in the past. Calculate how much download traffic your internet connection can handle per month to get a rough estimate of how much storage you can reach. There might still be some long-living data, but I can also see a lot of non-TTL data not even living 30 days.
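A rough way to do that estimate (a sketch assuming, hypothetically, that all ingress carries a 30-day TTL and expires as fast as it arrives):

```python
# Steady-state storage ~= daily ingress x TTL in days, under the assumption
# that everything stored has a 30-day TTL. All numbers are illustrative.

def steady_state_storage_tb(link_mbps: float,
                            utilization: float = 0.5,
                            ttl_days: int = 30) -> float:
    """Rough storage ceiling in TB for a given link speed and assumed utilization."""
    daily_ingress_tb = link_mbps * utilization / 8 / 1e6 * 86400  # Mbit/s -> TB/day
    return daily_ingress_tb * ttl_days

# Example: a 100 Mbit/s line used at 50% for ingress tops out around 16 TB.
print(f"{steady_state_storage_tb(100):.1f} TB")
```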

2 Likes

If this is a single node, I would guess it needs a 100MB+ bloom filter from US1 to have a decent false-positive rate. What is the max size you have received so far?
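For reference, the standard Bloom filter sizing formula (generic math; Storj's actual implementation and target false-positive rate may differ) gives a feel for why large nodes need large filters:

```python
# Required bits for a Bloom filter: m = -n * ln(p) / (ln 2)^2,
# for n elements (pieces) and target false-positive rate p.
import math

def bloom_filter_size_mb(num_pieces: int, false_positive_rate: float) -> float:
    """Approximate Bloom filter size in megabytes."""
    bits = -num_pieces * math.log(false_positive_rate) / (math.log(2) ** 2)
    return bits / 8 / 1e6

# Example: 100 million pieces at a 10% false-positive rate needs roughly 60 MB.
print(f"{bloom_filter_size_mb(100_000_000, 0.1):.0f} MB")
```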

Where can I read about this new node selection?

Search for 'node selection' in this topic, and check out the posts by littleskunk. They've been updating us on the different selection criteria being tested by the devs.

1 Like

The same bug also affects any kind of script you are running. It is not the satellite side that is wrong; the storagenode has incorrect numbers for what it believes it has on disk. In your case, the node is doing all the TTL cleanup but not updating the numbers. On disk your node is shrinking, while the scripts suggest you have uncollected garbage.
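A quick way to see that divergence for yourself (a hedged sketch assuming the default dashboard API on port 14002 and that diskSpace.used is reported in bytes; the mount path is just an example):

```python
# Compare what is actually on disk with what the node believes it is storing.
import json
import shutil
import urllib.request

actual_used = shutil.disk_usage("/mnt/storagenode").used  # example mount path

with urllib.request.urlopen("http://localhost:14002/api/sno/") as resp:
    node_believes = json.load(resp)["diskSpace"]["used"]

print(f"On disk:       {actual_used / 1e12:.2f} TB")
print(f"Node believes: {node_believes / 1e12:.2f} TB")
# If the node's number is noticeably higher, its internal accounting is stale
# and a used-space filewalker run should bring it back in line.
```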

1 Like