When will "Uncollected Garbage" be deleted?

Ambifacient · August 3, 2024, 4:11pm

If you scroll up, click “choose your satellite”, then select US1, I think you’ll see 0 reported disk space for that satellite on Aug 1.

Either way, without reliable reported disk space from the satellite, it is hard to determine why you may or may not have so much uncollected garbage.

Personally, I’m leaning towards a satellite reporting issue. I have more confidence that the bloom filters are properly retaining the “actual data”.

donald.m.motsinger · August 3, 2024, 4:11pm

SL reported 1.64TB on 1st of August and it’s safe to assume that it’s roughly the same amount today. But the blobs folder has 12 TB of pieces in there.

BrightSilence · August 3, 2024, 4:16pm

The difference is because for the graph it calculates daily numbers and averages those. I just calculate based on the reported days and divide by the range covered in those reports. So if a satellite didn’t report about the last two days, those won’t be included. It’s more accurate, but completely missing historic time ranges, as well as incomplete reports, still make my method unreliable too. Again, not much I can do about bad input data for the tool.
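To make the difference between the two averaging approaches concrete, here is a minimal sketch. The record layout `(interval_start, interval_end, byte_hours)` and both function names are assumptions for illustration, not the node's actual schema or the calculator's code, and treating unreported days as zero in the graph-style method is also an assumption.

```python
from collections import defaultdict
from datetime import datetime


def _hours(start, end):
    """Length of a report interval in hours."""
    return (end - start).total_seconds() / 3600


def average_of_daily_averages(reports, days_in_range):
    """Graph-style (assumed): average per calendar day, then average the days.

    Days without any report contribute zero, which drags the result down.
    """
    byte_hours_per_day = defaultdict(float)
    hours_per_day = defaultdict(float)
    for start, end, byte_hours in reports:
        byte_hours_per_day[start.date()] += byte_hours
        hours_per_day[start.date()] += _hours(start, end)
    daily = [byte_hours_per_day[d] / hours_per_day[d] for d in byte_hours_per_day]
    return sum(daily) / days_in_range


def average_over_reported_range(reports):
    """Calculator-style (assumed): total byte-hours divided by the hours
    actually covered by reports, so unreported days are simply left out."""
    total_byte_hours = sum(bh for _, _, bh in reports)
    total_hours = sum(_hours(s, e) for s, e, _ in reports)
    return total_byte_hours / total_hours


# Hypothetical data: 1 TB stored, but only 2 of 4 days were reported.
TB = 10**12
reports = [
    (datetime(2024, 7, 29), datetime(2024, 7, 30), 24 * TB),
    (datetime(2024, 7, 30), datetime(2024, 7, 31), 24 * TB),
]
graph_style = average_of_daily_averages(reports, days_in_range=4)
range_style = average_over_reported_range(reports)
```

With two missing days, the graph-style number comes out at half the stored amount, while the range-based number stays at the full 1 TB. Neither method can recover from a report whose numbers are themselves wrong.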

donald.m.motsinger · August 3, 2024, 4:18pm

See my reply above.

There are no “unreliable reports”. When a satellite reports, then it reports correct data.

[Screenshot: Selection_345]

This is the report from saltlake. You can see 1.64TB on 1st of August. Stop gaslighting me!

Ambifacient · August 3, 2024, 4:24pm

You don’t actually know this is true.

donald.m.motsinger · August 3, 2024, 4:36pm

I’m pretty sure that @Alexey would have said so in the thread mentioned earlier if that were the case.

Also see: Two weeks working for free in the waste storage business :-( - #218 by Alexey

pasatmalo · August 3, 2024, 4:48pm

I don’t believe this is actually the case, at least not always. I have seen plenty of usage reports received by my node that are way off (e.g. US1 reports a 99% drop in one day) and that usually (not always) get corrected later on by other reports. Note that these are not missing reports, but rather reports with incorrect numbers.

While I have no idea what the cause is (maybe the task in charge of doing the tally stops early for some reason), it is clear to me that reports can be unreliable.
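A rough way to spot reports like the 99% drop described above is to compare each day's reported usage with the previous day's. This is only a sketch: the function name, the input format (one reported byte count per day) and the 50% threshold are all assumptions, not anything the node or the satellite actually does.

```python
def flag_suspicious_drops(daily_bytes, max_drop=0.5):
    """Return the indexes of days whose reported usage fell by more than
    `max_drop` (as a fraction) compared with the previous day's report.

    `daily_bytes` is a hypothetical list of reported usage, one value per day.
    """
    flagged = []
    for i in range(1, len(daily_bytes)):
        prev, cur = daily_bytes[i - 1], daily_bytes[i]
        # Skip comparisons against a zero baseline to avoid dividing by zero.
        if prev > 0 and (prev - cur) / prev > max_drop:
            flagged.append(i)
    return flagged


# Hypothetical series: the third report is 99% lower, then usage "recovers".
suspicious = flag_suspicious_drops([1000, 1000, 10, 1000])
```

A flagged day is not proof of a bad report, since a large legitimate deletion would look the same. It only marks reports worth double-checking against later ones.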

pangolin · August 3, 2024, 4:56pm

If I do the same for the “Payout Information” page, there is also no payment for storage.

Roxor · August 3, 2024, 8:41pm

So when this thread started I saw responses that essentially said “when I fill my disk I’ll expand”…

…but now many SNOs have filled a disk… but their node doesn’t think it’s full… and the satellite doesn’t think it’s full… so they’re not being paid as if it’s full…

…but the OS says it is full? Nice

ACarneiro · August 3, 2024, 8:54pm

It’s all a bit of a mess right now. Makes it very difficult to make any proper decisions regarding resource allocation.
Seems like a bit of a game of whack-a-mole.

Oh, well. I’m sure better days will come.

Mitsos · August 3, 2024, 9:39pm

Nope.

The disks aren’t getting full. On the contrary (and that is what I’ve been saying all along) they are emptying. Less and less actual paid data is being stored. Not uncollected garbage, the data coming in is less than the data that is being deleted. After what, three months of huge ingress, I still have about 20TB waiting to be filled (as reported by the OS, not any pretty graph, nor any script).

Which brings us back to the uncollected garbage that everyone is complaining about: if the node thinks that it is full, that is what is being reported to the satellite (=do not send me any more data, I can’t store it). Some SNOs have worked around this by setting their allocations to an extreme number and just restarting their node when they need more space. Those (ten or so) of us who are still playing by the rules and have actual allocations that leave a little “reserved” space (=10% extra space, which isn’t workable on large disks, because we are talking about a couple of TB just sitting there) are constantly limited (=no new data can be received) because our nodes have been reporting as full (=the near hysteria when “3000 nodes were filled in a couple of weeks”), hence we aren’t getting the ingress we were supposed to be getting.

There are two different issues that have surfaced:

Uncollected garbage (=missed by blooms, not deleted by expiry)
Low ingress because even though we were promised that the “floodgates from Hell would be opened and that there wouldn’t be enough disks being produced to cover our needs”, ingress just barely covers the deleted data.

Both of those result in the following TL;DR: disks aren’t getting filled. No need to add more disks.

Disclaimer: This reply may contain traces of sarcasm and hyperbole. Reader discretion is advised.

BrightSilence · August 3, 2024, 10:07pm

This is not true. I’ve seen incomplete reports in the past.

Edit: Since I like bringing receipts for my claims, see: EU1 sudden drop in storage usage reported by satellite (Not just the graph display issue this time, database shows unusual low data usage in the last report from the satellite)

kocoten1992 · August 4, 2024, 12:13am

Do you know how the satellite calculates “Uncollected Garbage”? I tried looking for it in storj/satellite at main · storj/storj · GitHub but don’t have a clue yet. I’m interested in how it works internally so I can try to improve it.

Roxor · August 4, 2024, 12:18am

I think the satellite only tracks paid data (so can create the average, most days)… and that the script compares that to the space-used on disk. Then anything it can’t account for it guesses is ‘uncollected garbage’?

BrightSilence · August 4, 2024, 12:40am

Satellites only report space usage based on pieces your node should have. The earnings calculator compares it to what the node says it is holding in blobs storage. Both numbers can be inaccurate. Satellite reports can be incomplete, and the node’s own accounting of local storage can be wrong due to bugs in the software and/or disabled used-space file walkers.

It’s not a guess. If Storj fixes the issues with reporting data on both the satellite and node end, what remains is by definition uncollected garbage. Give or take a small mismatch between when the last storage usage was calculated on the satellite and the time the calculator runs. This should at most be a day (IF everything works as it should). This is why it can show a small negative number of uncollected garbage.

It’s not rocket science. The satellite says “You’re supposed to have x amount, which is what you’re paid for”. The node says “I’m storing y amount in blobs”. y - x = uncollected garbage.
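As a minimal sketch of that subtraction (the function name is made up for illustration and this is not the calculator's actual code), using the saltlake figures quoted earlier in the thread:

```python
def uncollected_garbage(blobs_bytes, satellite_bytes):
    """y - x: what the node holds in blobs minus what the satellite
    says the node should be holding (and is paid for)."""
    return blobs_bytes - satellite_bytes


TB = 10**12

# Figures quoted earlier in the thread: 12 TB in blobs, 1.64 TB reported.
garbage = uncollected_garbage(12 * TB, 1_640_000_000_000)

# A satellite tally that is newer than the node's own count can push
# the result slightly below zero.
slightly_negative = uncollected_garbage(5 * TB, 5 * TB + 10**9)
```

The first result is 10.36 TB. The arithmetic is trivial; the result is only as trustworthy as the two inputs.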

Unfortunately recently both the node and the satellite have frequent issues with tracking and reporting that information reliably. That’s not my problem to solve. I’d rather keep showing the calculation so those bugs are obvious. If that means more SNOs complaining, all the more reason for them to fix the reporting. I’ve included several warnings when there are signs that data may not be reliable. That’s the best I can do.

kocoten1992 · August 4, 2024, 12:50am

Hi there, how does the satellite calculate the report? I mean, does it take a snapshot of the database and then run “select sum(*)”? I’m trying to dig as deep as I can, in the hope that even if I can’t solve it, it at least reveals how the satellite’s reporting works internally.

kocoten1992 · August 4, 2024, 1:45am

No kidding storj/satellite/satellitedb/storagenodeaccounting.go at 6f4de2f66e9e060dd2bb70e3b8108be0efa6f534 · storj/storj · GitHub

Alexey · August 4, 2024, 4:05am

I never said that the reports from the satellites are always correct. Only the payout is always correct, for the reason you quoted.
By the way, I suspect that the report from SL on Aug 1 is incorrect, because the blob folder for it on my nodes is much larger than what is reported, and the almost daily BF didn’t clean much from it.

Alexey · August 4, 2024, 4:10am

What you can see on the Payout information page for the current period is an estimate based on the existing local data, not a payout. Payout data is not provided by the satellites daily, only monthly when the payout is completed (roughly after the first two weeks of every month).
You can see the actual payout for a previous period and in the payout history.

jammerdan · August 4, 2024, 5:05am

SNOs need reliable and accurate numbers for proper decisions. Is anybody surprised?