In this topic I hinted at a discrepancy between the total amount of piece data stored on the node and the B*h of usage reported back by the satellites. Initially I thought it might be part of the same bug, but I later found out this predates the latest version and also shows up in my earnings calculator.
I didn't want to throw out accusations without digging deeper first, so I mentioned there that I would create a new topic after looking into this more. I have now collected the data required to make a more complete post around this issue, and I personally can't figure out what is causing this fairly significant difference. Here are my findings.
For this example node, the new disk space used graph shows the problem more clearly, so I will be using this recently updated node for most of the data.
This node is using 2.78TB of disk space, but the satellites report only 1.98TB*d for the most recent day.
I've made some adjustments to my earnings calculator to add numbers that quantify the discrepancy. Let me start with some definitions.
Disk Current: This is the sum of the total column in the piece_space_used table. It reflects the total amount of piece data stored on the node.
Disk Average So Far: This is the sum of at_rest_total (which is in B*h) from the storage_usage table, divided by the number of hours that have passed in the current month. It reflects the average amount of data my node stored according to what the satellites have reported back.
expected: My own calculation of what the average should be, based on Disk Current and the ingress and deletes so far this month, assuming roughly linear growth over the course of the month. The percentage shown is how much of this expected storage the satellites actually report.
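For reference, here is roughly how the first two numbers can be pulled from the node's databases. This is a simplified sketch and not my calculator's actual code: the database file names/paths and the timestamp format in the query are assumptions that may need adjusting for your setup.

```python
import sqlite3
from datetime import datetime, timezone

# Example paths; point these at your node's storage directory. The database
# file names are an assumption about where these tables live.
PIECE_DB = "/path/to/storage/piece_spaced_used.db"
USAGE_DB = "/path/to/storage/storage_usage.db"

TB = 1e12


def disk_current():
    """Disk Current: sum of `total` in the piece_space_used table (bytes)."""
    with sqlite3.connect(PIECE_DB) as con:
        (total,) = con.execute(
            "SELECT COALESCE(SUM(total), 0) FROM piece_space_used"
        ).fetchone()
    return total


def disk_average_so_far(now=None):
    """Disk Average So Far: sum of at_rest_total (B*h) reported this month,
    divided by the number of hours passed in the current month."""
    now = now or datetime.now(timezone.utc)
    month_start = now.replace(day=1, hour=0, minute=0, second=0, microsecond=0)
    hours_elapsed = (now - month_start).total_seconds() / 3600
    with sqlite3.connect(USAGE_DB) as con:
        (bh,) = con.execute(
            "SELECT COALESCE(SUM(at_rest_total), 0) FROM storage_usage "
            "WHERE interval_start >= ?",  # timestamp format is an assumption
            (month_start.strftime("%Y-%m-%d %H:%M:%S"),),
        ).fetchone()
    return bh / hours_elapsed


if __name__ == "__main__":
    current = disk_current()
    average = disk_average_so_far()
    print(f"Disk Current:        {current / TB:.2f} TB")
    print(f"Disk Average So Far: {average / TB:.2f} TB")
    print(f"Reported fraction:   {average / current:.0%}")
```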
Here's a screenshot from my adjusted calculator.
Only 68% of the actual space used is accounted for by the satellites. Sometimes the reporting is a day behind, but at the moment that would account for at most about 6% of the missing data, not the 32% I'm missing here. And from the graph on the dashboard it is clear that all data for the 17th is already there and the 18th is about 8 hours old (UTC at the time of the screenshots).
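To show where that ~6% comes from: if the satellites had simply not reported the most recent day yet, the missing share of this month's B*h would be roughly one day out of the days elapsed so far. A quick back-of-the-envelope check (hour counts approximate):

```python
# ~17 full days plus ~8 hours of the 18th have elapsed (UTC)
hours_elapsed = 17 * 24 + 8   # ~416 hours into the month
hours_missing = 24            # a full unreported day, worst case
print(f"max impact of a one-day lag: {hours_missing / hours_elapsed:.0%}")  # ~6%
```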
I've looked into possible explanations. Maybe trash is included, but that's only 30GB, accounting for only about 1%. There could also be a difference between the actual size of the pieces and their size on disk because of sector size, but looking into that I see less than a 0.5% difference there (a rough way to check this is sketched below the note).
Note: I've blacked out parts of paths/commands that aren't relevant.
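For anyone who wants to reproduce the sector-size check, something along these lines compares the apparent size of the pieces with the space they actually occupy on disk. The blobs path is an example, and st_blocks is only available on POSIX systems:

```python
import os

# Example path; point this at your node's blobs directory.
BLOBS_DIR = "/path/to/storage/blobs"

apparent = 0   # sum of file sizes as reported (st_size)
on_disk = 0    # sum of space actually allocated (st_blocks * 512)

for root, _dirs, files in os.walk(BLOBS_DIR):
    for name in files:
        st = os.stat(os.path.join(root, name))
        apparent += st.st_size
        on_disk += st.st_blocks * 512  # st_blocks is in 512-byte units

print(f"apparent size: {apparent / 1e12:.3f} TB")
print(f"size on disk:  {on_disk / 1e12:.3f} TB")
print(f"overhead:      {(on_disk - apparent) / apparent:.2%}")
```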
While the above node seems to be among the worst offenders, this is hitting other nodes as well. Some examples.
My largest node seems to be the least impacted for now, missing only about 7%, which could probably be partially explained by satellite reporting running a little behind, though not entirely.
One last example. This node has been full for years and doesn't really show the issue.
Note: I did not disable the file walker process, and all nodes have been restarted due to updates in the past few days, so the total stored should be up to date.
If anyone from Storj Labs would like to look into this, I can DM node IDs, though it doesn't seem isolated to specific nodes.