New SNO expectations

o1eal · June 1, 2021, 6:23pm

Hi, just wanted to report back. Your prediction was accurate, and much appreciated. I exceeded what the estimator predicted (7c instead of 5c), in spite of having far lower “Repair” egress (285 MB vs 1500 MB). I accumulated more data than expected (120 GB vs 100 GB). Now I have 7 shiny pennies; all I gotta do is wait till I exceed the payout minimum! Quite happy with how my Storj experiment is progressing.

One tiny bit of feedback. I found the spreadsheet more informative when I multiplied columns H and J by 1000, and divided columns I and K by 1000 (effectively, expressing “egress” and “repair” in terms of gigabytes rather than terabytes). With my 4 TB drive, this is a better representation, allows me to see an extra digit in some cases, and matches what I see on the dashboard better. (Of course, if I had a 10 or 20 TB drive, once it started to get full that might not work so well.) I’m not handy enough with spreadsheets to know how to do that to a whole column, but it helped to do it to a few cells.

One other oddity: In the last day or two of May, the “estimated earnings” field dropped below the net total! It was 5c, while the net total was 7c. Prior to that, it was always equal to or greater than the net total. No biggie, but idly curious if that’s some kind of known bug.

BrightSilence · June 1, 2021, 6:59pm

That’s awesome, thanks for the extensive feedback. I’m actually fairly happy with how close the results are. And even better, I updated the numbers a few days ago and they are even closer if you look at the latest version.

Let’s go through some differences.

Repair egress shows the biggest difference, and there is a reason for it. Young nodes don’t really have pieces for segments that need repair yet. I don’t currently correct for that, and I’m actually a little surprised you saw even this much repair. It regularly takes 2 to 3 months before repair starts to take off. Since it never really accounted for much of the income I never corrected for that, but I might add something to take it into account. Once the node has a little more data, the repair numbers start to fit better.

After corrections I made a few days ago, the prediction for stored data has shifted to EXACTLY 0.12TB. So it seems my corrections were spot on, thanks for confirming!

You didn’t mention this, but there is a slight overestimation on average storage over the month. This has to do with the non-linearity of receiving data. I don’t really correct for that, but after the first 2 months that difference kind of disappears anyway. So no need.
There is also an underestimation of egress. This is because recently uploaded data is more likely to be downloaded, and in the first few months all data is recently uploaded. This also corrects over time, and these 2 deviations seem to compensate for each other anyway, so no harm there.

I get your point, but I prefer to leave it as is for 2 reasons.

All official communication talks about compensation per TB, so these numbers relate directly to how you’re compensated.
Displaying GB would add extra digits that suggest an accuracy the predictions really can’t match.

It is. In fact, I posted a detailed report on GitHub 3 months ago pointing it out and suggesting a very specific solution: “Estimated earnings inaccurate due to use of full days only” (Issue #4056 · storj/storj · GitHub).

I noticed the issue at the time because it deviated from the prediction in my earnings calculator (Earnings calculator (Update 2023-12-05: v13.1.0 - Now with support for different payouts per satellite - Detailed earnings info and health status of your node, including vetting progress)) and I wanted to know what I did wrong. Turns out Storj Labs did it wrong. It’s a relatively small deviation most of the time, but it can lead to predicting double the income in the early days of the month near the end of the UTC day, and it can also cause the effect you were seeing. Give my calculator a try if you’re interested. It should show a more consistent prediction as well as a lot of additional numbers not shown on the dashboard. (It also shows vetting progress, node status, and payout history directly compared to your node’s own stats for previous months, etc. Check out the topic for more.)
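To make the “double the income” failure mode concrete, here is a hypothetical model of an estimator that extrapolates using completed days only, next to one that uses the exact elapsed hours. This is not Storj’s actual dashboard code; the function names, the flat earning rate, and the `max(..., 1)` guard for the first day are all assumptions for illustration.

```python
HOURS_PER_DAY = 24

def estimate_full_days(earned_so_far, hours_elapsed, days_in_month):
    # Hypothetical flawed estimator: divides by *completed* days only.
    # max(..., 1) is an assumed guard so day 1 doesn't divide by zero.
    full_days = max(hours_elapsed // HOURS_PER_DAY, 1)
    return earned_so_far / full_days * days_in_month

def estimate_exact(earned_so_far, hours_elapsed, days_in_month):
    # Extrapolate using the exact elapsed time instead.
    return earned_so_far / hours_elapsed * days_in_month * HOURS_PER_DAY

# Hypothetical node earning 1 unit per hour; it is 23:00 UTC on the 2nd (47 h elapsed).
earned, hours, days = 47, 47, 31
flawed = estimate_full_days(earned, hours, days)   # 1457.0
exact = estimate_exact(earned, hours, days)        # 744.0
print(round(flawed / exact, 2))                    # 1.96, almost double
```

Near the end of the second UTC day almost two days of earnings are divided by a single completed day, which is where the near-doubling comes from.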

o1eal · June 1, 2021, 7:37pm

And thank you for the detailed reply. Your points all make sense, and feed my growing understanding of how things fit together. Congrats on the accuracy of your updated spreadsheet! I’ll pull an updated version from yours and give it a look.

The reason I didn’t mention average storage, I suppose, is because that’s the statistic I’m still having trouble understanding. I don’t really understand how it’s calculated, or what it means in practical terms. If you have any insight to offer, I’d be curious. I did notice that in the relevant graph, it was highly periodic…while it trended up throughout the month, it would typically be up one day, and down the next. That seemed odd, but without a real understanding of what it reflects, I’m not really sure.

One final point I’m unsure of – is there any visibility into my vetting status? I don’t mind just waiting if this isn’t something made available to the end user…but if the info is hiding somewhere in the dashboard (or elsewhere) I’m curious.

BrightSilence · June 1, 2021, 9:38pm

Storj calculates this based on how many hours you store data. 1 GBh is 1 GB stored for 1 hour, or 2 GB stored for half an hour. The satellite keeps accurate track of how much data is stored for how long during the month. If you divide the total GBh for the entire month by the number of hours in that month, you get GBm, which comes down to how much your node has stored on average over the entire month.
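The arithmetic described above can be sketched in a few lines. The helper names are hypothetical and the storage pattern is made up; only the formula (sum of GB × hours, then divide by the hours in the month) comes from the explanation.

```python
def gb_hours(samples):
    """Sum GBh from (gigabytes_stored, hours_held) pairs."""
    return sum(gb * hours for gb, hours in samples)

def gb_months(total_gbh, hours_in_month):
    """Average storage over the month: total GBh divided by hours in the month."""
    return total_gbh / hours_in_month

# 1 GB for 1 hour and 2 GB for half an hour are both 1 GBh.
print(gb_hours([(1, 1)]), gb_hours([(2, 0.5)]))  # 1 1.0

# Hypothetical: 60 GB for the first half of a 720-hour month, 120 GB for the second half.
total = gb_hours([(60, 360), (120, 360)])
print(total, gb_months(total, 720))  # 64800 90.0
```

Note the node ends this made-up month with 120 GB but averaged only 90 GBm, because it held less for the first half.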

Now the estimator throws all of that out of the window and just looks at how much you had at the start of the month and how much at the end and just assumes the average across the month is halfway between those 2. It’s only an estimation after all and growth is usually fairly linear, in which case this would be very close to reality.
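A small sketch of why the midpoint shortcut works for linear growth but overshoots when a node fills slowly at first. The quadratic “slow start” curve is purely hypothetical and is not derived from any node’s real data; hourly sampling at each hour’s midpoint is also an assumption.

```python
HOURS = 720  # hypothetical 720-hour month, sampled once per hour

def midpoint_estimate(start_gb, end_gb):
    # The estimator's shortcut: average = halfway between start and end of month.
    return (start_gb + end_gb) / 2

def actual_average(hourly_gb):
    # What the satellite effectively measures: the mean of the hour-by-hour amounts.
    return sum(hourly_gb) / len(hourly_gb)

# Node grows from 0 GB to 120 GB; each hour is sampled at its midpoint.
linear = [120 * (h + 0.5) / HOURS for h in range(HOURS)]
backloaded = [120 * ((h + 0.5) / HOURS) ** 2 for h in range(HOURS)]  # hypothetical slow start

print(midpoint_estimate(0, 120))             # 60.0
print(round(actual_average(linear), 1))      # 60.0
print(round(actual_average(backloaded), 1))  # 40.0
```

With linear growth the shortcut is essentially exact; with back-loaded growth the true average lands well below the halfway point.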

This is because the satellite aggregates the GBh values roughly twice a day, but the periods can fluctuate. So one day’s value might cover 20 hours of storage and the next day’s 28 hours, for example. The total will always add up correctly, though.
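The up-one-day, down-the-next pattern can be reproduced with a toy model. The constant 120 GB and the strict 20/28-hour alternation are hypothetical simplifications; real aggregation windows vary less regularly.

```python
STORED_GB = 120  # hypothetical: node holds a constant 120 GB all month

# Hypothetical hours covered by each day's report; they wobble around 24.
window_hours = [20, 28] * 15  # 30 days, 720 hours in total

daily_gbh = [STORED_GB * h for h in window_hours]
print(daily_gbh[:4])         # [2400, 3360, 2400, 3360]
print(sum(daily_gbh))        # 86400
print(sum(daily_gbh) / 720)  # 120.0 GBm: the month still adds up
```

Each day’s bar swings up and down even though nothing changed on the node, yet the monthly sum is unaffected.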

I actually already mentioned this in my previous post in passing. But I can add that my earnings calculator as of now is the only way to get this data as it isn’t displayed on the dashboard and was removed from the storagenode API. You can try it here: Earnings calculator (Update 2023-12-05: v13.1.0 - Now with support for different payouts per satellite - Detailed earnings info and health status of your node, including vetting progress)

o1eal · June 1, 2021, 9:53pm

Sorry I’m a bit dense. Can’t really discern what the formula would be to determine GBh.

Sorry if I missed something. I actually tried running your script, but I’m currently trying to figure out what path to use. I thought it was simply /mnt/storj but that does not seem to contain the correct .db file.

BrightSilence · June 1, 2021, 10:04pm

Wrong unit there, it’s not GB/m, it’s GBm. Works just like kWh.

aaand you just edited your response, so I no longer know what I was responding to.

Scratch that, let’s just elaborate on that unit. First of all, the m stands for month, not minute.
And it’s not GB per month, but GB for a month. Just like when you use 1kW for 2 hours you used 2kWh. So if you store 120GB for a month, you would have 120GBm. But you didn’t, because you started the month storing nothing. So taking that into account and based on how long you’ve stored each byte of data, you’ve stored 40.86GBm. It’s a bit of a confusing unit, but usually the comparison with kWh helps.

Since you’re on Linux I assume this is a docker node. You can use the same path you set in your docker run command. Make sure the user has access to the data in that path, or alternatively run the command with sudo (which I would normally recommend you not do with random scripts from the internet, but since I built this one I know it’s safe. Up to you whether you trust me.)

o1eal · June 1, 2021, 10:59pm

Sorry, I realized almost immediately how poor my understanding was and that my formula was completely off base, and I thought I could edit it out before you saw it. You’re too quick for me! Sorry for the confusion. I think my earlier formula is best left out of it as a complete misunderstanding (as you wisely did with “Scratch that…”).

I think part of it just clicked for me. I thought the GBm figure was something being calculated off the other data that was being reported; but I just realized, it’s actually based on data that’s more granular than I’d see. A 500 MB chunk of data could be stored all month or for 10 minutes, and either way it might or might not be part of the total GB on my drive at the end of the month. So, it’s really “primary data” – I get to learn what each satellite reported, but that’s as far as it gets; I need to take it on faith that the satellites are reporting accurate GBh numbers, not try to replicate them with my own calculations.

That’s all fine if so; I just hadn’t understood, so I kept trying to figure out how to use the raw data at my disposal to calculate the GBm numbers. Your explanation here makes sense, and clarifying “month” vs “minute” helps immensely.

Yes, I’m using docker. Thanks for the explanation; the path was correct, but it was indeed a permissions issue. I’ve run it now, and I see that my vetting is:
22, 30, 27, 5, 7, and 84% on each of the six satellites. Cool! Nice to have a way to track that, thank you.

o1eal · June 1, 2021, 11:11pm

And, so now it clicks – and please correct me if I’m wrong.

Column F in your spreadsheet is the one that connects to the GBm figure expressed on the dashboard (except it’s in TBm). You estimate it as being half the total amount currently stored, so in this case you estimate 60 GB (because I have 120 GB on my drive). The actual amount is ~40 GB (close enough). I guess this is what I was most wanting…to figure out which column in your spreadsheet connects to which piece of data in the dashboard. That may be the final significant piece I was having trouble placing, but I think I’ve got it now.

o1eal · June 1, 2021, 11:18pm

And if I’m getting it right, the connection between the “Disk Space Used This Month” graph on the dashboard page and the “Payout Information” page is that adding each day’s value would produce a total that is what’s reported on “Payout Information.” Yes?

(I feel like I’m asking a lot of you…but if I can get this stuff right in my head, I think I’ll be able to parse a lot of what I’m reading online a lot better. So I very much appreciate all the feedback you give.)

o1eal · June 2, 2021, 6:03am

OK, last post on this for the day, promise. But something else just clicked. The main dashboard page talks about TBh (Terabyte-hours), while the Payouts page talks about GBm (Gigabyte-months). How odd, to use two different but very similar scales! If I’m not mistaken,

1 TBh = 1.34 GBm (in a 31 day month) or
1 TBh = 1.49 GBm (in a 28 day month)

So, if I have that right, I’m finally able to see the proper relationship between the two pages.

Perhaps the use of the different scales is because the value of a “GBm” varies from one month to the next, since a “month” may have 672, 696, 720, or 744 hours depending on whether it has 28-31 days. So, you can’t be entirely accurate in one context talking about GBm, and you can’t be entirely accurate in the other context talking about TBh. Within any one month, there would be an exact conversion formula; but from one month to the next, that formula would change.
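The per-month conversion reasoned through above can be checked quickly, under that post’s assumption that a “month” is its actual number of hours, and assuming decimal units (1 TB = 1000 GB) as the figures above do.

```python
def tbh_to_gbm(tbh, days_in_month):
    # 1 TB = 1000 GB (decimal units); a month of N days has N * 24 hours.
    return tbh * 1000 / (days_in_month * 24)

for days in (28, 29, 30, 31):
    print(days, days * 24, round(tbh_to_gbm(1, days), 3))
# 28 672 1.488
# 29 696 1.437
# 30 720 1.389
# 31 744 1.344
```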

Eesh. Complicated stuff.

BrightSilence · June 2, 2021, 7:44am

Looks like you got most of it. I’ll just respond where I might have something to add, you got the rest of it 100%.

After unit conversion from GBh to GBm, yes that is exactly what it does.

Storj used to only use GBh. I think I was actually the first to use GBm in the earnings calculator. My main reason was that the number would be more meaningful to node operators. If you store about 10TB at the end of the month and that number shows something around 9.7TBm, it makes sense that the difference is because you started the month with less. At the same time 6984TBh probably means nothing to you. I still include that number too, because Storj uses it in some places and it helps to compare, but it represents the exact same thing.

To be honest, in that graph I would personally choose to use GBd for the exact same reason I use GBm for monthly numbers. Those daily numbers would be instantly recognizable because if you store 10TB, the number would float around the 10TBd mark.
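A sketch of the GBd idea: dividing a day’s TBh value by 24 turns it into the average amount stored that day, which reads like the node’s size. The daily values are made up, and `tbh_to_tbd` is a hypothetical helper, not something the dashboard provides.

```python
def tbh_to_tbd(tbh_for_day):
    # A day has 24 hours, so the day's TBh divided by 24 is the average TB stored that day.
    return tbh_for_day / 24

# Hypothetical daily values for a node holding roughly 10 TB.
daily_tbh = [236, 240, 246]
print([round(tbh_to_tbd(v), 2) for v in daily_tbh])  # [9.83, 10.0, 10.25]
```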

As for the length of a month, Storj Labs always uses the average 720 hours. So that means on longer months you actually get paid a little more than on shorter months. So 1TB stored for a 31 day month is actually 1.033TBm. My earnings calculator also takes this into account. The estimator isn’t exact enough for this to matter, so it just assumes all months are 720 hours.
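The fixed 720-hour convention described above can be sketched as follows. The function names are hypothetical; the 720-hour month, the 6984 TBh / 9.7 TBm pair, and the 1.033 TBm figure come from the posts in this thread, and decimal units (1 TB = 1000 GB) are assumed.

```python
HOURS_PER_BILLING_MONTH = 720  # Storj Labs' fixed average month, per the explanation above

def tbh_to_tbm(tbh):
    return tbh / HOURS_PER_BILLING_MONTH

def tb_stored_all_month(tb, days_in_month):
    # Storage held for every hour of a calendar month, expressed in 720-hour TBm.
    return tbh_to_tbm(tb * days_in_month * 24)

print(round(tb_stored_all_month(1, 31), 3))  # 1.033 (longer month pays a bit more)
print(round(tb_stored_all_month(1, 28), 3))  # 0.933 (shorter month pays a bit less)
print(round(tbh_to_tbm(1) * 1000, 4))        # 1.3889 (1 TBh expressed in GBm)
print(tbh_to_tbm(6984))                      # 9.7
```

Because the divisor never changes, one TBh-to-GBm factor works for every month; only the number of hours you actually stored data varies.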

I know right… both the earnings calculator and earnings estimator started as such simple tools, but things got a lot more complicated (and to be fair, more accurate as a result) over time.

o1eal · June 2, 2021, 4:57pm

Many thanks again for the detailed reply.

So, is it accurate to say that 1 TBh = 1.39 GBm?

If so, is it really useful to have the interface use both? I’m wondering about small tweaks to the interface that could make it much easier to understand the links.

BrightSilence · June 2, 2021, 6:13pm

Depends on your sense of accurate. It’s 1.3888… GBm

In my opinion it is not. The most valuable numbers would be GBm when reporting on storage per month and GBd when reporting on storage per day. That would still be 2 units, but the resulting numbers would show pretty much the same thing most of the time, and they would show amounts the node operator would instantly recognize. The only reason my earnings calculator shows GBm and GBh is because… well, it reports on a monthly basis, so GBm makes the most sense. And Storj still uses GBh in some places. When they stop doing that, I’ll stop doing that. And since I don’t report anything on a daily basis, including GBd makes no sense for me. Of course, if Storj starts reporting that, I might switch from GBh to GBd.