Short-lived data

Gasp0de · May 29, 2022, 10:07am

Hi there, I have a relatively new node (02/2022) and the amount of data on my node is stagnating because all the data that I get is relatively short lived. Is anyone experiencing something similar?

For comparison: I am seeing 10-15GB of ingress traffic every day, but my stored data only increases by 15-20GB/week because every week about 10% of the data stored on my node is moved to trash. So only 10-20% of the data that goes to my node is actually stored for a longer time.

Maybe other people with newish nodes (2022) can post data to compare, I guess older nodes still have a lot of test data which is stored but not deleted.

BrightSilence · May 29, 2022, 10:26pm

That is not normal behavior. I’ve been testing a few things with newer nodes (mostly vetting progress and how egress and deletes relate to recently uploaded data). Here is a screenshot of 2 nodes that started about 2 weeks ago.

You can clearly see that almost all ingressed data still remains on the nodes. Now this is slightly better than what I’ve seen in the past. And generally I have found that about 10% of data ingress ends up being deleted in the same month (most probably shortly after uploading). This is based on observing several nodes over recent months at different locations (though not globally spread).

Did you run the successrate scripts to see if you have a lot of cancellations and/or failures?

deathlessdd · May 29, 2022, 10:50pm

That dashboard doesn’t show how much trash a node has though, so it isn’t a really good representation of how long the data lives. Plus those nodes are still vetting.

BrightSilence · May 30, 2022, 6:42am

Fair point on the trash. With that in mind, you lose about 20-25%. Still nowhere near the 80-90% suggested by @Gasp0de. That really isn’t normal.
Yes these nodes are still in vetting, but that really doesn’t matter. They get the same data, just less of it. And since I’m talking percentages, those will apply to older nodes as well. Older nodes will also have about 1% of long term data deleted each month though, making it a lot harder to differentiate between what new data is being deleted and what old data is being deleted. This was the whole point for my tests for which I created these new nodes.

Gasp0de · May 30, 2022, 7:41am

I think you misunderstood me, it’s not necessarily new data that is getting deleted (but all my data is newish since the node has only been running since February). I have 480GB stored and am getting 10-15GB ingress per day, so 70-105GB per week. My trash is 55GB, so within the last week 55GB has been deleted while only 70-105GB has been added.
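The arithmetic in this post can be sketched in a few lines of Python. The variable names are illustrative and not taken from any Storj tool, and treating the trash folder as exactly one week of deletes is an assumption based on the description above.

```python
# Figures quoted in the post above; names are illustrative only.
stored_gb = 480
ingress_per_day_gb = (10, 15)  # low and high daily ingress
trash_gb = 55                  # assumed to hold roughly one week of deletes

# Weekly ingress range: 7 days of daily ingress.
weekly_ingress_gb = tuple(7 * d for d in ingress_per_day_gb)

# Net weekly growth if everything in trash was deleted within that week.
net_growth_gb = tuple(w - trash_gb for w in weekly_ingress_gb)

# Weekly deletes as a share of stored data.
trash_share = trash_gb / stored_gb

print(weekly_ingress_gb)            # (70, 105)
print(net_growth_gb)                # (15, 50)
print(round(trash_share * 100, 1))  # 11.5
```

The low end of the net growth (15GB) lines up with the 15-20GB/week increase mentioned in the opening post, and the trash share is close to the "about 10%" quoted there.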

BrightSilence · May 30, 2022, 8:34am

Right, so you’ve included deletes from data already stored. But that should only be about 0.8-1% per month on average. With 480GB stored, that should be around 4-5GB per month, or roughly 1GB per week, so that doesn’t explain the difference either.
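As a rough check of that estimate, here is a minimal sketch. The 0.8-1% monthly rate is the figure from this post; converting months to weeks with 52/12 is an assumption made only for the illustration.

```python
# Background deletes expected from data that is already stored.
stored_gb = 480
monthly_delete_rate = (0.008, 0.01)  # 0.8% to 1% per month, as quoted above
weeks_per_month = 52 / 12            # assumption: average weeks in a month

monthly_gb = [stored_gb * r for r in monthly_delete_rate]
weekly_gb = [m / weeks_per_month for m in monthly_gb]

print([round(m, 2) for m in monthly_gb])  # [3.84, 4.8]
print([round(w, 2) for w in weekly_gb])   # [0.89, 1.11]
```

Either way, the expected background deletes are on the order of 1GB per week, far below the 55GB sitting in trash.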

Gasp0de · May 30, 2022, 9:09am

The success rate script shows success rates above 99% for everything, 100% for audits, so that shouldn’t have anything to do with it. Probably I just get data from people who don’t keep it long? I also have comparatively high egress traffic, about 20% of stored data per month.

My only explanation is that there is a lot of onboarding with new people testing the service and then deleting their data again.

BrightSilence · May 30, 2022, 10:40am

I wish that were true… but I’m really doubting this and have been for a while. Since every upload starts 110 pieces but only keeps 80, you would expect 80/110 ≈ 73% to be the average success rate we see. But everyone has scores higher than 95% and almost everyone has scores above 99%. So something really isn’t adding up there.
What I think is actually happening is that the transfer finishes on the node end, but the piece still isn’t part of the fastest 80. Meaning it won’t end up in metadata on the satellite end and will be cleaned up during garbage collection.
We’ve kind of known this for a while but assumed that when comparing numbers it would still give some indication. But at this point I’m really starting to doubt whether it still has any use at all.
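The mismatch described above can be put into numbers. This is only a sketch: it assumes every node is equally likely to end up among the fastest 80, and it assumes the hypothesis in this post is right, meaning pieces that a node logs as successful but that miss the fastest 80 are later garbage collected. The 99% reported rate is the figure quoted in the thread; the variable names are made up for the example.

```python
pieces_started = 110  # pieces started per upload, as stated above
pieces_kept = 80      # pieces the satellite keeps

# Success rate you would expect to see if logs reflected what is kept.
expected_success = pieces_kept / pieces_started
print(f"{expected_success:.1%}")  # 72.7%

# Reported success rate from the thread (assumption: applies to uploads).
reported_success = 0.99

# Share of "successful" uploads that would be discarded later
# if the gap is explained entirely by losing the race.
discarded_share = 1 - expected_success / reported_success
print(f"{discarded_share:.1%}")  # 26.5%
```

Under those assumptions, roughly a quarter of the uploads a node counts as successful would never make it into satellite metadata, which is in the same range as the 20-25% ingress loss mentioned earlier in the thread.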

I’ve looked at my oldest and largest node. That node still grew by around 500GB this month. And with 313GB normal ingress and 353GB repair ingress, that again matches with a 25% loss based on ingress and 0.8% loss based on static storage.

0.8% * 17TB stored = 136GB loss
25% * 313GB ingress = 78GB loss

313GB + 353GB - 136GB - 78GB = 452GB growth
This is even slightly lower than actual growth.
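The same estimate as a small Python function, in case anyone wants to plug in their own numbers. The function name and parameters are hypothetical. The default rates are the 0.8% and 25% figures from this post, and, as in the calculation above, the 25% loss is applied to normal ingress only, not to repair ingress.

```python
def estimate_growth_gb(stored_gb, normal_ingress_gb, repair_ingress_gb,
                       stored_loss_rate=0.008, ingress_loss_rate=0.25):
    """Estimate monthly growth from ingress minus the two kinds of loss."""
    stored_loss = stored_gb * stored_loss_rate            # deletes of old data
    ingress_loss = normal_ingress_gb * ingress_loss_rate  # deletes of new data
    return normal_ingress_gb + repair_ingress_gb - stored_loss - ingress_loss


# 17TB is taken as 17,000GB, which matches the 136GB figure above.
growth = estimate_growth_gb(17_000, 313, 353)
print(round(growth))  # 452
```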

Now we should keep in mind that nodes don’t all behave the same anymore now that geographic restrictions can be used. But these differences still seem way too big to me.

Toyoo · May 30, 2022, 5:05pm

I’ve recently noticed that more than half of all my stored chunks fit in a single logical block (4kB, i.e. the smallest possible in terms of blocks filled on ext4), and even more of them are only slightly bigger. I suspect that it would be quite unusual for a chunk of this size to still be somehow cancelled, as it might fit a single UDP packet. I hence suspect that there must be a lot of uploaded messages that end up still discarded by the system.

Alexey · May 30, 2022, 6:51pm

It’s already here. We have implemented server-side move and server-side copy, including in GatewayMT. So there are no actual client-side transfers in such cases and no re-uploads.
see Changelog v1.54.2

SGC · June 2, 2022, 10:07am

we also have to keep in mind that the amounts of deletes and such can vary wildly…

sometimes i’ve seen almost 50% of all ingress being lost, while in other periods it seems like what stays is near the max possible of lets say 80-90% on avg…

ofc this is a bit of a tricky thing, because if you get 10GB ingress in a day with 50GB stored and 20GB being deleted, your ingress is 10GB but your capacity increase is negative 10GB
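That example is simple enough to write down directly. A minimal sketch, with a made-up helper name, using the numbers from the paragraph above:

```python
def net_change_gb(ingress_gb, deleted_gb):
    """Change in used capacity over a period: what came in minus what was deleted."""
    return ingress_gb - deleted_gb


stored_before_gb = 50
change = net_change_gb(ingress_gb=10, deleted_gb=20)

print(change)                     # -10
print(stored_before_gb + change)  # 40
```

The point is that ingress alone says nothing about growth unless the deletes in the same period are tracked next to it.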

and this even seems to apply on larger scales, most likely because customers will delete chunks of data, like say a big enterprise backup at one time, which might affect the results over longer periods…

like say many enterprise customers might have a monthly or quarterly backup schedule, meaning around dates like today (start of Q3), we might see slow rises in capacity used on nodes, even with high ingress going in.

due to the fact that customers get rid of old backups, however if they upload a backup of similar size, then long term we just see deviations over time… like say an increase in stored data leading up to early Q3 and then when those are done, they will start deleting the old backups, thus greatly reducing the perceived, stored ingress…

i don’t really track my deletions or deletion ratio… i know i should, it sure would give me a better view of what’s going on…

but i know that how my actual capacity used increases in relation to actual ingress seems very erratic, tho often i only see that over longer periods of weeks or even months, and that might just be an informational bias because i don’t really track the day to day activity much any more.