with such garbage and prices. Let's see what inflation tells us in 2025.
My node can delete all files instantly.
My trash housekeeping can be instant; it's just that the landlord forbids me to do so.
Sure, at first we are helping Storj grow, but over time this has to be addressed.
Because there is no free lunch; someone always has to pay for it.
That's the whole point of our argument (we being the side that says something needs to be done about trash): we aren't getting compensated for 100% of what we provide.
Example: I start a node and data keeps piling up.
1st month: 1TB total data stored, 100GB trash
2nd month: 2TB total data stored, 200GB trash
3rd month: 3TB total data stored, 300GB trash
…
8th month: 8TB total data stored, 800GB trash
9th month: 9TB total data stored, 800GB trash
10th month: 10TB total data stored, 800GB trash.
See the discrepancy? From month 9 onward the bloom filter isn't big enough to cover the extra data. Our compensation starts dropping from 100% down to 99%, then 98%, and so on (data the client deleted still sits on the node, stored but not paid for).
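To make the bloom filter part of that concrete, here's a back-of-the-envelope sketch in Python. The ~10% false-positive target, ~2M pieces per TB, and the textbook sizing formula are my assumptions, not official numbers. The filter lists the pieces to *keep*, so a false positive means a garbage piece survives that GC pass; once the node outgrows the size the filter was built (or capped) for, a growing share of trash survives every run:

```python
import math

def filter_bits(designed_pieces: int, fpr: float = 0.10) -> int:
    """Textbook optimal bloom filter size (in bits) for the designed piece count."""
    return math.ceil(-designed_pieces * math.log(fpr) / math.log(2) ** 2)

def surviving_garbage(bits: int, actual_pieces: int, hashes: int) -> float:
    """False-positive rate: the share of garbage that matches the 'keep'
    filter and therefore survives a GC pass."""
    return (1 - math.exp(-hashes * actual_pieces / bits)) ** hashes

PIECES_PER_TB = 2_000_000  # assumption, also used later in the thread

# Suppose the filter is sized (or capped) for 8 TB worth of pieces...
designed = 8 * PIECES_PER_TB
m = filter_bits(designed)
k = max(1, round(math.log(2) * m / designed))  # optimal hash count at design size
print(f"filter sized for {designed / 1e6:.0f}M pieces: {m / 8 / 1e6:.1f} MB, k={k}")

# ...while the node keeps growing, as in the monthly example above:
for tb in (8, 9, 10, 12):
    p = surviving_garbage(m, tb * PIECES_PER_TB, k)
    print(f"{tb:>2} TB of pieces -> ~{p:.0%} of garbage survives each GC pass")
```

In other words, collection doesn't stop abruptly; the undersized filter just leaves more and more garbage behind on each pass, which is exactly the slow 100% to 99% to 98% drift described above.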
That's the exact behavior I described above for my nodes. When the bloom filters grew, trash began piling up. Understandable: the free-tier data was deleted, OK, cool. The next bloom filter comes and the collected trash is almost as much as the first time. Old inactive accounts were removed, OK, cool. The next bloom filter after that, the trash is still the same as blooms #1 and #2. And the next bloom filter, and the next one after that as well. That means more than 10% of the node's reported “used” space is occupied by trash. The bigger and older the node, the more severe this gets. I can't understand 500GB per bloom filter for Saltlake, for example. I could understand 1TB for the first bloom filter and 100GB for each one after that (that's the 10%, which should actually be going down each time, but I digress).
All of this isn't just guessing or waiting for planets to align: the actual payout (what arrived in the wallet) got smaller the more data I stored, because uncollected data keeps piling up.
Personally I don't advocate getting paid for storing trash. But something needs to change about the current trash situation, and IMNSHO it shouldn't be answered with “it's not a priority right now”. Either get the satellites to send out bloom filters more often (since they are too small for all the data), increase the bloom filter size, or migrate to a different system for maintaining trash.
I thought they had, hadn’t they?
They are still too small.
You are getting compensated 100%. There may be bugs in the trash system, and there will always be ways to improve it. But what counts as customer data you get paid for has always been determined by the satellites, never by what's on a node's disk. The millisecond a customer deletes a file, Storj stops getting paid for it, and so do SNOs. It instantly becomes useless data, and it's the job of the housekeeping routines (bloom filters, garbage collection) to eventually get rid of it. Which, yes, could take over a week.
And the entire trash system, and the space it takes, has always been just part of running a node. You don't get paid $1.5/TB/m for the node installer footprint. You don't get paid $1.5/TB/m for whatever the database files use. You don't get $1.5/TB/m for Docker logs. And you don't get paid for trash. All of that is just part of running the software.
I agree trash handling still needs to be improved. I've seen dramatic improvements with 1.104.5, but it doesn't seem to be running like clockwork yet. And I did go looking for some old trash files that were hanging around (and found some). But I certainly appreciate the improvements so far!
If you are storing 10TB and 8TB of that is uncollected trash, how much are you getting paid for, 100% or 20%?
The satellite is tracking 2TB of paid data. Storj is getting paid for 100% of 2TB. I’m getting paid 100% for storing those 2TB…
… and I may cry in the forums about trash bugs, or my potato node, or that I’m running a weird config
I agree with you. Fast-forward 6 months. The situation is still the same: 10TB stored, 8TB uncollected trash. Your disk is 10TB, so we agree that for the past 6 months you only got paid for 20% of your allocated space, correct?
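Just to put numbers on that (a trivial sketch; $1.5/TB/m is the storage rate mentioned above):

```python
STORAGE_RATE = 1.5   # $/TB/month, the rate quoted earlier in the thread

disk_tb  = 10        # allocated capacity
paid_tb  = 2         # what the satellite accounts (and pays) for
trash_tb = 8         # uncollected garbage occupying the rest

monthly_payout = paid_tb * STORAGE_RATE
effective_rate = monthly_payout / disk_tb

print(f"payout:    ${monthly_payout:.2f}/month")       # $3.00
print(f"effective: ${effective_rate:.2f}/TB of disk")  # $0.30, i.e. 20% of the rate
```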
I got paid for 100% of the 2TB of non-trash data, yes. And, guessing your next question… if six months later I'm still storing 2TB, then I'm still getting paid for the 2TB.
And I’m probably in the forums, asking when the trash bugs will get fixed. Or having Alexey tell me not to use virtualization or something.
I may be unhappy with trash handling. I may even choose to uninstall. But for every second that passed the satellite knew I had 2TB of data that needed to be paid for. And my node held that data, and passed audits, and was paid for every byte. I was paid for 100% of the value I provided.
In that case the 8TB of uncollected data isn't my problem, and I will not be adding any more capacity. If Storj wants to increase the space available to clients, they can figure out a way to reclaim those 8TB.
Actually, all this talk is cute. We are only burning keyboards.
Because now, files with TTL delete themselves pretty much instantly, by Storj standards.
And I'm finding that my VMs weren't the problem; I just managed to get the filewalker to finish in 1 hour instead of 10 days.
100% agree: that’s the only sane response!
Conversely, one should also understand that trash affects payments in the reverse, even more so, considering the entire concept of TB/m has a distinct advantage for Storj versus node accounting. Storj cuts its cost off instantly once data is sent to trash, but the node keeps carrying the TB/m cost. It has to be fixed, or equalized to be fair; if it's not treated as a cost of doing business, management will continue to abuse it. Plus, a rolling TB/m accounting cost isn't going to dramatically affect their bottom line. If a node doesn't respond to the first bloom filter, due to unavailability or whatever, then Storj simply wouldn't account for the additional stored trash, thereby preventing abuse. Even if the system takes days to enact full deletion, that delay would be the node operator's fault… the key is the date/time of the bloom filter; the operator can suffer the 10% inaccuracy, as that's 'the system in place.'
Good for TTL. I'm talking about the non-TTL data. What if the GC filewalker finishes instantly and still doesn't collect what it should collect?
That example is invalid. The bloom filter size depends on the data you are getting paid for. So if the satellite believes there is only 2TB of data on your node, the bloom filter will wipe the other 8TB within a few days.
So you need to increase the numbers here to reach the maximum bloom filter size. And that means you will get paid the corresponding amount.
What is the maximum bloom filter size that covers the recommended node size per the documentation? (Was it a 24TB max node size?)
There is no 'node accounting'. There is only satellite accounting, plus the occasional flag from a node indicating whether it still has space. The entire audit system exists because nothing a node says about the data it stores can be trusted.
But I get what you're saying. It should be made more obvious to SNOs that the single source of truth (which determines their monthly payout) is based only on what the satellite believes should be paid for… and not on what the SNO sees on their filesystem.
I remain optimistic that the worst of trash handling is behind us. Does anyone have a feel for what trash percentage a properly working network might hover around? Maybe 5%?
It's easy to ignore this when the average lifespan of a piece is many months, or even years. It's more difficult to accept with pieces whose TTL is less than a month.
It depends on the number of pieces you keep, not on the total size. And it gets trickier to estimate as the RS coefficients change. However, assuming 2M pieces per TB, the optimal bloom filter would be 27 MB for a 24 TB node.
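For what it's worth, the textbook bloom filter sizing formula reproduces that figure if you assume a ~10% false-positive target (my assumption; only the 2M pieces/TB comes from the post above):

```python
import math

def optimal_filter_mib(pieces: int, fpr: float = 0.10) -> float:
    """Optimal bloom filter size in MiB for `pieces` at the target FPR:
    bits = -n * ln(p) / (ln 2)^2."""
    bits = -pieces * math.log(fpr) / math.log(2) ** 2
    return bits / 8 / 2**20

# 24 TB node at ~2M pieces per TB:
print(f"{optimal_filter_mib(24 * 2_000_000):.1f} MiB")  # ~27.4 MiB
```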
I know.
Are we anywhere near that value? We were at 4MB not so long ago, and last time I checked we were at ~14MB.