Disk usage discrepancy?

It’s not used for that. The usage is accounted for the whole time until the customer removes the data (or until it is removed automatically if it has an expiration date).
This interval is needed only for the storage node to build a smooth usage graph, so it does not show spikes as before.

Are you sure? Because it seems that it at least gets used for displaying the estimated payout.

Maybe I am wrong but here is what I am looking at:

For one node one satellite gives me:

{"atRestTotal":22413732222591.3,"atRestTotalBytes":1400858263911.9563,"intervalInHours":16,"intervalStart":"2024-03-10T00:00:00Z"}

16 x 1400858263911.9563 = 22413732222591.3008

So atRestTotal includes the interval.
If I add up atRestTotal for each day this gives me what gets displayed as storageSummary.

If I sum up all storage summaries from all satellites I get what is displayed as /api/sno/satellites storageSummary.
And this value divided by 720 is the diskSpace in estimated-payout that gets multiplied by $1.49 / TB / month, resulting in the diskSpacePayout.
So to me it looks like the interval is part of the calculation, at least for displaying the estimated payout. I don’t know if the real payout calculation differs from that.
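Expressed as a quick script, the chain I’m describing looks roughly like this (only a sketch: the rate and the field names are the ones quoted above, while the dashboard port and the TB conversion are my assumptions):

```python
import json
import urllib.request

DASHBOARD = "http://localhost:14002"  # assumed default dashboard address
RATE_PER_TB_MONTH = 1.49              # $ / TB / month, as quoted above
HOURS_PER_MONTH = 720

# Per-day sanity check from the JSON above:
# atRestTotal == intervalInHours * atRestTotalBytes
assert abs(16 * 1400858263911.9563 - 22413732222591.3) < 1

# storageSummary is the accumulated byte-hours across all satellites
# (the per-day atRestTotal values summed up), hence the division by 720.
sats = json.load(urllib.request.urlopen(f"{DASHBOARD}/api/sno/satellites"))
byte_hours = sats["storageSummary"]
avg_tb_months = byte_hours / HOURS_PER_MONTH / 1e12  # average TB stored this month

print(f"diskSpacePayout ~ ${avg_tb_months * RATE_PER_TB_MONTH:.2f}")
```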

You are correct, it’s likely used for the payout estimation too. In the end both calculations should match, but there could still be a small difference; the satellite calculates it more precisely.
You may compare it with data from the paystubs method (see Storage node dashboard API (v1.3.3)). Please note: it gets updated only when the satellites send receipts after the payout is done.
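For example, something like this could pull the paystubs for a period (the endpoint path and the field names are how I remember them from the linked doc, so please verify them against your node version):

```python
import json
import urllib.request

# Assumed endpoint from the linked dashboard API doc; the period format
# is YYYY-MM. Verify the path against your node version.
url = "http://localhost:14002/api/heldamount/paystubs/2024-02"
paystubs = json.load(urllib.request.urlopen(url))

# Satellite-confirmed numbers appear only after payout receipts are sent,
# so the most recent period may still be empty.
for stub in paystubs if isinstance(paystubs, list) else [paystubs]:
    print(stub.get("satelliteID"), stub.get("usageAtRest"), stub.get("paid"))
```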

Then it would be nice if those estimated values were updated after a while with the actual numbers from the satellite.
I don’t know how it is done on the satellite side. But let’s say, for example, that we are actually on the 5th and the satellite has actual values for the 1st and the 2nd; it would be terrific if the node received the actual data for those days.

I get a lot of deletion errors.

How do I get rid of them?

C:\Program Files\Storj1\Storage Node\storagenode.log:5133290:2024-02-24T23:12:47+01:00 WARN retain failed to delete piece {"Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Piece ID": "ZWU5KTATT5U565GPPDIESWIW2OEOQKIQRD5OCQESDJQ5CZZ7EFIQ", "error": "pieces error: filestore error: file does not exist", "errorVerbose": "pieces error: filestore error: file does not exist\n\tstorj.io/storj/storagenode/blobstore/filestore.(*blobStore).Stat:110\n\tstorj.io/storj/storagenode/pieces.(*BlobsUsageCache).pieceSizes:245\n\tstorj.io/storj/storagenode/pieces.(*BlobsUsageCache).Trash:290\n\tstorj.io/storj/storagenode/pieces.(*Store).Trash:404\n\tstorj.io/storj/storagenode/retain.(*Service).trash:373\n\tstorj.io/storj/storagenode/retain.(*Service).retainPieces:341\n\tstorj.io/storj/storagenode/retain.(*Service).Run.func2:221\n\tgolang.org/x/sync/errgroup.(*Group).Go.func1:75"}
C:\Program Files\Storj1\Storage Node\storagenode.log:5133431:2024-02-24T23:14:00+01:00 INFO retain Moved pieces to trash during retain {"Deleted pieces": 2384099, "Failed to delete": 2099, "Pieces failed to read": 0, "Pieces count": 30584389, "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Duration": "41h45m11.4073337s", "Retain Status": "enabled"}
C:\Program Files\Storj1\Storage Node\storagenode.log:5318350:2024-02-25T12:54:28+01:00 INFO retain Moved pieces to trash during retain {"Deleted pieces": 194287, "Failed to delete": 0, "Pieces failed to read": 0, "Pieces count": 2727255, "Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "Duration": "5h3m24.5908771s", "Retain Status": "enabled"}
C:\Program Files\Storj1\Storage Node\storagenode.log:6517523:2024-02-28T11:13:48+01:00 ERROR pieces:trash emptying trash failed {"error": "pieces error: filestore error: context canceled", "errorVerbose": "pieces error: filestore error: context canceled\n\tstorj.io/storj/storagenode/blobstore/filestore.(*blobStore).EmptyTrash:176\n\tstorj.io/storj/storagenode/pieces.(*BlobsUsageCache).EmptyTrash:316\n\tstorj.io/storj/storagenode/pieces.(*Store).EmptyTrash:416\n\tstorj.io/storj/storagenode/pieces.(*TrashChore).Run.func1.1:83\n\tstorj.io/common/sync2.(*Workplace).Start.func1:89"}
C:\Program Files\Storj1\Storage Node\storagenode.log:7104616:2024-02-29T20:55:10+01:00 INFO retain Moved pieces to trash during retain {"Deleted pieces": 10, "Failed to delete": 0, "Pieces failed to read": 0, "Pieces count": 54015, "Satellite ID": "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE", "Duration": "1m29.4936205s", "Retain Status": "enabled"}

It’s a warning, you can safely ignore it.

They are updated after payout, because the satellite calculates them for the entire month, not day by day like a node does. This is why they may differ (rounding errors on the node’s side), and this is why it’s called an estimation. Submitting orders also matters: orders have an expiration date, so if the node did not submit them for any reason (for example, Windows was reinstalled but the orders were not saved from the installation folder, or the node was offline, etc.), the estimation will differ even more.

P.S. It’s a heavy procedure for the satellite to perform this more frequently than once a month.

I see. So we have the nodes calculating “something”, the satellites calculating “something”, and the nodes do not really get updated with the actual data.

It’s not only where it’s called an estimation. It seems like all node data is only a kind of estimation, and the reliability and accuracy of the data the node displays is in jeopardy: the filewalker may not finish, bloom filters may be too small or not get processed, interval hours may be missing…
It seems that we do not have any accurate data from the satellites that an SNO can really rely on.

It’s usually close enough, and some data is accurate on a daily basis: the usage updated from the satellites. However, since it’s accumulated information, it becomes precise only for the day, not for the entire month.
You may see this discrepancy if you use the Earnings calculator (Update 2023-12-05: v13.1.0 - Now with support for different payouts per satellite - Detailed earnings info and health status of your node, including vetting progress) for previous periods.

Which data is it that comes from the satellites?

Well, if the satellites count all that, why force the node to exhaust itself with the non-lazy used-space filewalker counting, when it only slows the node down? Hmmm.

I observe my nodes; a few have the lazy filewalker on startup only, and they seem the healthiest.

I keep the full filewalker on all the other nodes out of fear that the usage will not be paid accurately. But if that counting doesn’t really matter, as you say, because the satellites count on their own regardless of what the node reports, then it seems it only matters to the satellites whether the node has any free space left or not.

Soooo, if I remember correctly: does the satellite command the node, on demand, from time to time, to run a used-space filewalker? Because if it does, it would be much better for the node if a satellite asked for this only when it’s needed, and only for one satellite at a time (and not for the node to count all satellites every time it restarts); there would be a much better chance it finishes the job.

Imagine one satellite needs a filewalker to kick off NOW, and the node has just restarted and is doing its full scan for all the satellites, and the one that actually requested it is third in the queue.

In that scenario, it would be much better for the node to never do it on its own, but only on request, like once per 30-60 days, just to check.

Shall we permanently disable the full filewalker at start then!? Hmm…

Because the node should report whether it has free space, to be selected for uploads or not. And you want to see correct numbers on the dashboard.
If the node reported that it actually doesn’t have free space only when it’s already selected, the customer would have failed uploads in the worst case, or a significant slowdown of uploads.

No. Your node sends signed orders to the satellite to be paid for the usage, and this usage is accounted on the satellite. It’s also accounted on the node, but as a delta from a previously known state. If the node doesn’t crash, losing the in-memory stats, it should be pretty accurate. However, even a normal restart can lead to the databases not being updated due to a timeout. This is where the filewalker closes the gap.
So, if your node is stable and the disk is fast enough, you likely should not have a discrepancy.
It could appear only in cases where the size of the Bloom filter is not enough for the amount of stored pieces.
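Conceptually, the delta accounting works like this (an illustration only, with made-up names, not the actual storagenode code):

```python
import os

class UsageCache:
    """Tracks used space as deltas from the last persisted state."""

    def __init__(self, persisted_total: int):
        # The last total that made it into the database before a restart.
        self.total = persisted_total

    def piece_added(self, size: int) -> None:
        self.total += size   # cheap in-memory delta, no disk scan

    def piece_removed(self, size: int) -> None:
        self.total -= size

    def rescan(self, blobs_dir: str) -> None:
        # The expensive path (the "filewalker"): rebuild the total from
        # scratch. Only needed to close the gap after a crash or a restart
        # where the latest deltas were not flushed in time.
        self.total = sum(
            os.path.getsize(os.path.join(root, name))
            for root, _, files in os.walk(blobs_dir)
            for name in files
        )
```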

No, the used-space-filewalker is started by the node on start only, if it’s not disabled.
The satellite sends a Bloom filter, and the node initiates a gc-filewalker, then retain for the collected garbage.
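Roughly, the Bloom filter part works like the sketch below (a simplified illustration, not the real retain code from storj.io/storj/storagenode/retain). It also shows why an undersized filter is a problem: false positives make garbage pieces look like kept pieces.

```python
from hashlib import sha256

class BloomFilter:
    """Answers "possibly kept" or "definitely garbage" for a piece ID."""

    def __init__(self, size_bits: int = 1 << 20, hashes: int = 3):
        self.size = size_bits
        self.hashes = hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, piece_id: bytes):
        for i in range(self.hashes):
            h = int.from_bytes(sha256(bytes([i]) + piece_id).digest()[:8], "big")
            yield h % self.size

    def add(self, piece_id: bytes) -> None:
        for p in self._positions(piece_id):
            self.bits[p // 8] |= 1 << (p % 8)

    def may_contain(self, piece_id: bytes) -> bool:
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(piece_id))

def gc_walk(stored_piece_ids, kept: BloomFilter):
    # Pieces not in the filter are definitely unknown to the satellite and
    # go to trash; false positives stay until a later (bigger) filter.
    return [pid for pid in stored_piece_ids if not kept.may_contain(pid)]
```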

In this case it will lose the received Bloom filter, and the gc-filewalker will not be invoked, at least until this feature is implemented:

I do not quite understand what you mean by “full filewalker”. If the used-space-filewalker, then yes, you may disable it, if you do not have a discrepancy between the reported used/free space and the actual used/free space on the disk.
For a discrepancy between the used space and the “Average Disk Space Used This Month”, your node needs to have successfully passed the gc-filewalker for all satellites, retain for all satellites, the collector for all satellites, and piece:trash for all satellites.
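(If you do decide to disable it: as far as I remember, the option is `storage2.piece-scan-on-startup: false` in config.yaml, or `--storage2.piece-scan-on-startup=false` on the command line, but please verify the exact name against your node version; I may be misremembering it.)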

I am still trying to find out which data is accurate, reliable, and preferably comes directly from the satellite.
Which ones are these that you are referring to?

The “Average Disk Space Used This Month” and “Bandwidth Used This Month”. I still could not say which endpoint serves them or how to calculate them properly, but I think this one:

The node’s local calculations are available on the Payout Information page (SND) or Payouts (MND) in the top table.
I think this info is not available through the API, only from the local tables, but I may be wrong.

You are referencing this section of the dashboard, right?
So these must be the same ones that are found in the API, correct?

So when I check those, I still find that the online interval hours come into play here: I take the total storage summary and divide it by the average, and this results (in my case) in 417, which is the sum of the displayed interval hours.

So no, it does not seem like this comes directly from the satellites.
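Here is roughly how that check looks in code (the field names are those from the JSON quoted earlier; the endpoint path, port, and satellite ID are just my node’s example values):

```python
import json
import urllib.request

SAT_ID = "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S"
url = f"http://localhost:14002/api/sno/satellite/{SAT_ID}"
sat = json.load(urllib.request.urlopen(url))

byte_hours = sum(d["atRestTotal"] for d in sat["storageDaily"])   # summed byte-hours
hours = sum(d["intervalInHours"] for d in sat["storageDaily"])    # summed reported hours
average = byte_hours / hours                                      # what the graph shows

# If storageSummary is accumulated byte-hours, this prints the summed
# interval hours (417 in my case), not 720 for a full month online.
print(sat["storageSummary"] / average)
```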

Yes, I believe so.

There are some values which I cannot understand, and this is why I’m not sure. I never tried to use them to calculate usage.

And I would like to understand how accurate the information I see is, so I can detect problems.

If I make a rough overall estimation of the payout and it differs from the estimated payout that the nodes show me, I would like to understand where the difference comes from.

Usually the difference is between the locally calculated values, based on the used egress and the storage usage known to the storagenode (not necessarily confirmed by orders), and the usage confirmed by signed and sent orders.

As far as I can see, they are both based on the filewalker (see the Storagenode section of this wiki: Debugging space usage discrepancies). Also see this comment: storj/storagenode/pieces/store.go at 52881d2db3732c07e3975237ff987e9f7c01bdf2 · storj/storj · GitHub (this is an estimation!).

If you are interested in the space usage calculated by the satellite, I would recommend querying the SQLite database (see the mentioned wiki page).
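For example, something like this reads the satellite-reported usage from the node’s storage_usage.db (the table and column names are how I remember them from the wiki, so verify them against your schema, and open the database read-only while the node is running):

```python
import sqlite3

# Read-only open so a running node is not disturbed (the path is an example).
db = sqlite3.connect("file:storage_usage.db?mode=ro", uri=True)

# at_rest_total should be the byte-hours for the interval, matching the
# API's atRestTotal field.
rows = db.execute(
    "SELECT hex(satellite_id), at_rest_total, interval_start "
    "FROM storage_usage ORDER BY interval_start"
)
for sat_id, at_rest_total, interval_start in rows:
    print(interval_start, sat_id[:8], at_rest_total)
```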