Bloom doesn't work, or why is my trash empty?

Was it limited at the start of the previous month (hence I’m assuming that no data was added for a whole month) or was it limited yesterday? If it stays limited, you’ll see the two values gravitate towards each other (but NEVER match, I have explained why) as I have shown above.

It was limited the whole month.
Yes, there are small deletions by the collector, about 100MB per hour. But these are filled up again within a few minutes.

1 Like

Then next month’s average will go up.

That is what everybody told me last month… Still no change.

We just had about 10% of the entire used space deleted. If your node showed 4.35TB used on March 1st and one month later, on April 1st, it still showed 4.35TB, it would be the exception to the rule :smile:. You (and I, and all of us) lost 10%. 10% of 4.35TB is 0.435TB. Your average from this alone should be 4.35 - 0.44 (let's round up) = 3.91TBm. Of course the 10% wasn't the only data that was deleted in the previous month; other data was deleted as well. I just say 10% because on bigger nodes it is very noticeable when trash goes from 1TB to 10TB overnight (yes, I have nodes that had that; I went up 15TB in trash overnight). I'll add, out of my own pocket, a couple hundred GB more to your trash.

That brings us to about 3.5TBm. You can see (from my screenshot) that unless your node has been full for months (at least two) the average doesn’t even come close to the used.
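
To make that arithmetic concrete, here's a minimal sketch (my own illustration, nothing from the storagenode code) of what the 10% loss does to the month's figure:

```python
# Rough sketch of the estimate above: ~10% of 4.35TB lands in trash right at
# the month boundary, so every daily value for the new month is lower by that
# amount, and so is the month's average.

used_tb = 4.35
deleted_tb = 0.44                         # ~10% of 4.35TB, rounded up as in the post
print(f"{used_tb - deleted_tb:.2f} TBm")  # 3.91 TBm if nothing else changes all month
```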

If next month’s average doesn’t tick up (which I know it will because even though you haven’t posted a screenshot, I know that there is an upwards trend on your graph), then I’ll eat my words :slight_smile:

1 Like

Unfortunately there has been no upward trend for two months now, because of the limit that I reached, I think, somewhere around the end of December. More or less, I would say there has been a downward trend for the last two months.
I have now limited the node to 2TB purely for scientific reasons and will keep it online for one more month just to see how it reacts.

1 Like

I mean wow, what a drama in a glass of water all that caused. I'm guilty too.
Overall, we have to remember:

We are still the pioneers, and our feedback helps develop the very first working, unique project that links home users' data storage with commercial applications!

In fact, this V3 version of Storj has only been fully in production for around two or three years now. That means it is open for making money in a win-win scenario, but they are still polishing it by the week!
I'm glad I'm part of this and have been able to provide feedback since 2019!

As far as I understand, the payout calculations were right all along; the node just wasn't able to delete all the trash, and that accumulated a little bit. See: Current situation with garbage collection

So the situation should get better from week to week; there should be less and less discrepancy from now on.
We can't be angry or anything like that; these are normal things that happen during growth and evolution.

4 Likes

Since this node is not full, that 5TB would normally be free space and you would not get paid for it either. So there is no real loss. Only if the node were full could you have been storing real paid data in those 5TB.

This does explain how an average calculation works generally, but not where the values come from or why they differ so much.
I’ll try with a new actual image from the node dashboard for the same node that I have posted above:

What you see here is the following:

The graph itself shows the storage space used on a daily basis.
In my case this is 7.3TB on the 1st, 7.22TB on the 2nd, and 3.84TB on the 3rd.
The low value for today is because only 2 satellites have reported their values yet. It will change as soon as there is data from all satellites.
The result is exactly as shown: (7.3+7.22+3.84) / 3 = 6.12TB which is shown as avg of the month so far.
So the question is not really how it is calculated. The question is where the values 7.3, 7.22 and 3.84 stem from. This is where the right side comes into play, because the node counts 10.84TB in use.
I don’t have exact values from this node for day 1 and 2 but it is clear that when the month started, the calculated used space was above 10TB already.
So the calculation should be something like (10.5 + 10.5 + 10.84)/3 = 10.61TB as average.
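
Just to show how simple the math is, here is a small sketch (mine, not the actual dashboard code) that reproduces both numbers:

```python
# Month-to-date average = mean of the per-day values shown in the graph.

def avg(values):
    return sum(values) / len(values)

reported = [7.30, 7.22, 3.84]    # per-day values the graph shows (left side)
local    = [10.5, 10.5, 10.84]   # what the node itself counts (right side)

print(f"{avg(reported):.2f} TB")  # 6.12 TB -> the "avg of the month so far"
print(f"{avg(local):.2f} TB")     # 10.61 TB -> what I would expect it to show
```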

I have no idea where the 7.XTB comes from. If you look at the raw values from the API, it is the value of atRestTotalBytes from /api/sno/satellites, which is the sum of the same value from each satellite. For example, this value tells me that I had stored on the 2nd:

  • Saltlake: 312759119717
  • AP-1: 284760358716
  • US-1: 3508374303820
  • EU-1: 3109759534599

which sums up to 7215653316852, which is 7.2156 TB, i.e. the 7.22 TB shown on the graph for the 2nd. But the sum should not be 7.22TB; it should be somewhere around 10.5TB according to the right side.
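
For anyone who wants to check, here is a tiny sketch that just sums the four atRestTotalBytes values listed above (the satellite names are the labels I used, not API field names):

```python
# Sum the per-satellite atRestTotalBytes values and convert to decimal TB.

at_rest_total_bytes = {
    "Saltlake": 312_759_119_717,
    "AP-1":     284_760_358_716,
    "US-1":     3_508_374_303_820,
    "EU-1":     3_109_759_534_599,
}

total = sum(at_rest_total_bytes.values())
print(total)                      # 7215653316852 bytes
print(f"{total / 1e12:.2f} TB")   # 7.22 TB -> exactly what the graph shows for the 2nd
```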

And this is the issue we are seeing currently.

Maybe 7.22TB is correct for payout, because this is what the node should be storing. Might be. But then the question is why it is in fact storing 10.84TB of data. And this is used space: it is not trash, not databases, not temp files, as the filewalker does not count those.

3 Likes

@jammerdan and the estimated payout? To me it doesn’t match either the left or right screen

Which one?
When I take the value from currentMonthExpectations and divide it by $1.5 as a rough estimate (egress is rather negligible), I get around 7.4TB, which seems consistent with the left-side values in both images I have posted.

My estimate on the multinode dashboard is slightly lower than what both charts say

I would expect an estimate of around 10.5TB * $1.5 = $15.75 based on the values from the right side.
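
To spell out the rough math (the $11.10 below is just the ~7.4TB from my earlier post multiplied back out, not a number taken from the dashboard):

```python
# Rough payout estimate used above: storage paid at about $1.5 per TBm,
# egress ignored as negligible.

RATE = 1.5  # $ per TB-month

estimated_payout = 11.10                     # hypothetical currentMonthExpectations, in $
print(f"{estimated_payout / RATE:.1f} TBm")  # ~7.4 TBm -> matches the left side

print(f"${10.5 * RATE:.2f}")                 # $15.75 -> what I'd expect from the right side
```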

There were a few versions that showed a dip either on the 2nd of the month or on the 3rd. I've stopped paying attention to it; as far as I understand, it has something to do with the way totals get calculated. Someone else needs to fill this gap.

Used space: this is NOT what the satellite thinks you have, and hence not what you should be paid for. It is purely a local calculation that depends entirely on the used-space filewalker completing correctly. Everything on the right: total = used + trash + free - overused. If the used-space filewalker updates those values, they get updated; otherwise they can stay out of whack for a long time. I run a node with newly created databases (after a corruption) on an SMR disk (note to the reader: don't comment, I know) that never completes the used-space filewalker. That node shows completely irrelevant values (i.e. zero). It can't even start properly, because it thinks the allocated space is less than the available space (which technically it is, since it doesn't know how much data it has stored), unless I manually edit the minimum limit in the config. I don't expect this node to ever display correct right-side values unless we get filewalker progress saving.
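
(For reference: the minimum limit I edit is, if I remember the option name correctly, storage2.monitor.minimum-disk-space in config.yaml; lowering it lets the node pass the startup space check even while its own used/free numbers are nonsense.)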

Back to the topic in question: as I have already explained, the same way it averages up (i.e. storing 10TB for half a month = 5TBm) is exactly the same way it averages down: storing 10TB for 0.75 of the month (or 3 weeks) = 7.5TBm.
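
As a minimal sketch of that principle (my own illustration, not satellite code), the contribution of a piece of data is just its size times the fraction of the month it was actually stored:

```python
# TB-months: size multiplied by the fraction of the month the data was stored.

def tb_month(size_tb, days_stored, days_in_month=30):
    return size_tb * days_stored / days_in_month

print(tb_month(10, 15))    # 10TB for half a month   -> 5.0 TBm
print(tb_month(10, 22.5))  # 10TB for 3/4 of a month -> 7.5 TBm
```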

Your values are correct (as shown by the API). The left side is calculated satellite-side; it is the "paid out value" after all. Imagine if Storj depended on everyone telling the satellite "hey, I need to be paid this much".

You are storing 10TB (assuming your used-space filewalker has run correctly at least recently-ish). The thing is (and this is what I've been explaining) that you haven't been storing it for a month. Yes, the value yesterday was 10TB. Yes, the value today is 10TB. That doesn't mean that 9TB wasn't deleted yesterday and put back overnight. That does not equal 10TBm. That equals 10TB stored for 2 days (since it was deleted on the 2nd of the month) = 0.66TBm (assuming 30 days/month). If that 9TB of data that was put back last night gets deleted, the average will IN FACT GO DOWN. The average for that data is 0.6 (9TB stored for 1 day, assuming 30 days/month). YOUR NEW AVERAGE IS THE AVERAGE BETWEEN 0.66 AND 0.6, WHICH IS THE VALUE IN THE MIDDLE OF THEM, IE 0.63TBm. If that same 9TB that will be deleted tonight gets replaced tomorrow with new data, you will wonder why the right side shows 10TB for three days straight while the left side shows 0.66TBm.

Again: the calculations work as expected, as far as I can see. The two values will NEVER be equal. They will NOT even come close to each other UNLESS the node has seen ZERO new data in TWO months and ZERO data has been deleted from it for the past TWO months. Since this is in fact impossible (not just theoretically, but mathematically provably impossible), the left value is what you pay attention to for getting paid, and the right value is what you use to decide whether the node needs to be expanded or a new node needs to be added. You never compare the two values against each other.

1 Like

That’s why it needs to save its progress. If it completes 2% each week, that means in one year it will complete and the right values can display something instead of zero.

I would imagine that when it is time for a filewalker to spawn, it would first check whether another one is already running and delay if so.

Used-space filewalker is another story: it only comes into the picture when the node restarts. You can’t have two of them running at the same time.

I mean, if GC (the trash man) can't finish in a week (7 days), then you have to fix that node and make it decent, bro, or whoever this may be. A trash deleter doesn't have much to do; even if there's 1TB to delete, that's nothing compared to a normal whole-disk filewalker scan.

That SMR node does in fact run the used-space filewalker plus TWO GCs concurrently. I have to go in and manually kill the two GCs. SMR disks are a cancer to the industry; I don't know why nobody paid any attention to the Hitachi engineers when they said this should be banned and all research on it destroyed (circa 2015).

There is no need to kill them. Just run one GC at a time like this:

retain.concurrency: 1
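(That goes into config.yaml; as far as I know you can also pass it as a command-line/docker flag, --retain.concurrency=1, and then restart the node for it to take effect.)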
1 Like

You are just using SMR disks the wrong way =) They are a good and cheap solution for a large ZFS setup with multiple RAIDZ vdevs.