12 Months of Storj, a Newbie’s Findings

Hi All,

Apologies for the post, but my 12-month period of providing a node to Storj has passed, and I wanted to share some information for people like me who want to make informed choices.

So, in short, please see below the payment cycles for my node, along with a high-level view of traffic distribution and hard disk usage.

Note: May21 - Node vetted on all satellites apart from EU North

INGRESS - Traffic into Node + Table key
Mth = Abbreviated month and year
S3 = Percentage of (UP GB) traffic originating from a Storj hosted S3 gateway
IN GB = Total GB of traffic received during the month
UP GB = Uploaded segment total size as part of (IN GB)
UP CNL = Uploaded segments total size cancelled - EOF / LTD etc
REP = Uploaded repair segments as part of (IN GB)
REP CNL = Uploaded repair cancelled
PHY GB = Actual storage used on disk by the node
$ GB/MTH = Amount of storage (in GB-months) used for the payment calculation

|  Mth  | S3   | IN GB | UP GB  | UP CNL | REP    | REP CNL | PHY GB | $ GB/MTH |
|-------|------|-------|--------|--------|--------|---------|--------|----------|
| Nov20 | Null | 50    | 46.64  | 0.06   | 3.30   | 0.00    | 37     | 3        |
| Dec20 | Null | 70    | 64.30  | 0.06   | 5.64   | 0.00    | 84     | 60       |
| Jan21 | Null | 165   | 151.82 | 0.23   | 12.94  | 0.00    | 218    | 150      |
| Feb21 | Null | 299   | 270.36 | 1.55   | 27.00  | 0.00    | 443    | 270      |
| Mar21 | Null | 292   | 262.13 | 0.53   | 29.35  | 0.00    | 602    | 470      |
| Apr21 | 20%  | 149   | 131.94 | 0.36   | 16.70  | 0.00    | 697    | 577      |
| May21 (vetted) | 47%  | 547   | 345.13 | 3.85   | 197.86 | 0.00    | 1106   | 840      |
| Jun21 | 54%  | 378   | 252.32 | 0.56   | 125.09 | 0.00    | 1393   | 1090     |
| Jul21 | 60%  | 349   | 233.50 | 0.27   | 115.16 | 0.00    | 1483   | 1200     |
| Aug21 | 66%  | 418   | 336.71 | 0.52   | 80.74  | 0.00    | 1594   | 1220     |
| Sep21 | 71%  | 361   | 339.44 | 1.09   | 20.46  | 0.00    | 1679   | 1340     |
| Oct21 | 79%  | 337   | 304.91 | 0.62   | 31.42  | 0.00    | 1749   | 1440     |
| Nov21 | 83%  | 412   | 357.51 | 4.26   | 50.18  | 0.00    | 1868   | 1420     |
| Dec21 | 89%  | 204   | 183.91 | 2.53   | 17.49  | 0.00    | 1810   | 1490     |

#END INGRESS
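The key says UP GB and REP are parts of IN GB. A quick Python sketch, with the rows typed in by hand from the table above, checks that reading: for every month, IN GB appears to be the rounded sum of UP GB, UP CNL, REP and REP CNL, which would mean repair and cancelled transfers are included in the ingress total. This is my interpretation of the numbers, not something the node dashboard documents; the function names are made up for illustration.

```python
# Rows typed in from the INGRESS table:
# (month, IN GB, UP GB, UP CNL, REP, REP CNL)
INGRESS = [
    ("Nov20", 50, 46.64, 0.06, 3.30, 0.00),
    ("Dec20", 70, 64.30, 0.06, 5.64, 0.00),
    ("Jan21", 165, 151.82, 0.23, 12.94, 0.00),
    ("Feb21", 299, 270.36, 1.55, 27.00, 0.00),
    ("Mar21", 292, 262.13, 0.53, 29.35, 0.00),
    ("Apr21", 149, 131.94, 0.36, 16.70, 0.00),
    ("May21", 547, 345.13, 3.85, 197.86, 0.00),
    ("Jun21", 378, 252.32, 0.56, 125.09, 0.00),
    ("Jul21", 349, 233.50, 0.27, 115.16, 0.00),
    ("Aug21", 418, 336.71, 0.52, 80.74, 0.00),
    ("Sep21", 361, 339.44, 1.09, 20.46, 0.00),
    ("Oct21", 337, 304.91, 0.62, 31.42, 0.00),
    ("Nov21", 412, 357.51, 4.26, 50.18, 0.00),
    ("Dec21", 204, 183.91, 2.53, 17.49, 0.00),
]


def mismatched_totals(rows):
    """Months where IN GB is NOT the rounded sum of the four component columns."""
    return [
        month
        for month, total, up, up_cnl, rep, rep_cnl in rows
        if round(up + up_cnl + rep + rep_cnl) != total
    ]


def repair_share(rows):
    """Fraction of each month's ingress that was repair traffic (REP / IN GB)."""
    return {month: rep / total for month, total, _up, _up_cnl, rep, _rep_cnl in rows}


if __name__ == "__main__":
    print("months where the columns don't add up:", mismatched_totals(INGRESS))
    for month, share in repair_share(INGRESS).items():
        print(f"{month}: {share:.0%} of ingress was repair")
```

The same check works for the EGRESS table (OUT GB against DOWN GB + DOWN CNL + REP + REP CNL) if you swap in those columns.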

EGRESS - Traffic out of Node + Payment + Table key
Mth = Abbreviated month and year
OUT GB = Total GB of traffic sent from node during the month
DOWN GB = Downloaded segment total size as part of (OUT GB)
DOWN CNL = Downloaded segments total size cancelled - EOF / LTD etc
REP = Downloaded repair segments as part of (OUT GB)
REP CNL = Downloaded repair cancelled
HELD $ = $ value held back by Storj that month; 50% of the total held amount is to be paid in month 16, the rest is held until graceful exit (GE).
PAID $ = $ value paid to the SNO wallet that month on L2, or held on L1 until the minimum payout threshold is reached

|  Mth  | OUT GB | DOWN GB | DOWN CNL | REP   | REP CNL | HELD $ | PAID $ |
|-------|--------|---------|----------|-------|---------|--------|--------|
| Nov20 | 0.28   | 0.27    | 0.00     | 0.01  | 0.00    | -0.01  | 0.02   |
| Dec20 | 8      | 7.68    | 0.12     | 0.20  | 0.00    | 0.16   | 0.05   |
| Jan21 | 19     | 18.10   | 0.25     | 0.65  | 0.00    | 0.43   | 0.14   |
| Feb21 | 33     | 31.47   | 0.54     | 0.99  | 0.00    | 0.51   | 0.5    |
| Mar21 | 27     | 25.47   | 0.45     | 1.08  | 0.00    | 0.59   | 0.57   |
| Apr21 | 36     | 33.47   | 0.64     | 1.89  | 0.00    | 0.78   | 0.78   |
| May21 | 112    | 101.85  | 1.22     | 8.92  | 0.00    | 0.84   | 2.51   |
| Jun21 | 171    | 124.93  | 2.09     | 43.97 | 0.00    | 1.19   | 3.56   |
| Jul21 | 147    | 64.53   | 4.74     | 77.72 | 0.00    | 1.04   | 3.11   |
| Aug21 | 170    | 95.95   | 4.94     | 69.04 | 0.00    | 0      | 4.60   |
| Sep21 | 151    | 101.70  | 4.07     | 45.17 | 0.00    | 0      | 4.39   |
| Oct21 | 118    | 75.94   | 5.54     | 36.51 | 0.00    | 0      | 4.08   |
| Nov21 | 92     | 56.40   | 4.66     | 30.94 | 0.00    | 0      | 3.60   |
| Dec21 | 110    | 60.78   | 13.20    | 36.03 | 0.00    | 0      | 3.98   |

#END EGRESS
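The HELD $ column follows a stepped schedule by node age. The sketch below encodes that schedule as I understand it from Storj’s held-amount rules (months 1-3: 75% held, 4-6: 50%, 7-9: 25%, 10 onward: 0%), which is an assumption worth re-checking against current Storj documentation, and compares it with HELD / (HELD + PAID) from the table, counting Nov20 as month 1. The observed ratios land within a couple of percentage points of the schedule every month.

```python
# Held-back share by node age in months. This is Storj's held-amount schedule
# as I understand it (months 1-3: 75%, 4-6: 50%, 7-9: 25%, 10+: 0%); re-check
# the current Storj documentation before relying on it.
def held_fraction(node_month: int) -> float:
    if node_month < 1:
        raise ValueError("node age starts at month 1")
    if node_month <= 3:
        return 0.75
    if node_month <= 6:
        return 0.50
    if node_month <= 9:
        return 0.25
    return 0.0


# (month, node age in months, HELD $, PAID $) from the EGRESS table.
# Nov20 (month 1) is skipped because its HELD value is a -0.01 adjustment.
PAYOUTS = [
    ("Dec20", 2, 0.16, 0.05),
    ("Jan21", 3, 0.43, 0.14),
    ("Feb21", 4, 0.51, 0.50),
    ("Mar21", 5, 0.59, 0.57),
    ("Apr21", 6, 0.78, 0.78),
    ("May21", 7, 0.84, 2.51),
    ("Jun21", 8, 1.19, 3.56),
    ("Jul21", 9, 1.04, 3.11),
    ("Aug21", 10, 0.00, 4.60),
    ("Sep21", 11, 0.00, 4.39),
    ("Oct21", 12, 0.00, 4.08),
    ("Nov21", 13, 0.00, 3.60),
    ("Dec21", 14, 0.00, 3.98),
]


def observed_vs_schedule(rows):
    """Map month -> (observed HELD / (HELD + PAID), scheduled held fraction)."""
    return {
        month: (held / (held + paid), held_fraction(age))
        for month, age, held, paid in rows
    }


if __name__ == "__main__":
    for month, (seen, expected) in observed_vs_schedule(PAYOUTS).items():
        print(f"{month}: held {seen:.1%} (schedule {expected:.0%})")
```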

A bit more info on my node

So, it runs in a Docker container and never really uses more than 512 MB of memory… at some points it has been as low as 200 MB! The node file system is XFS.

My internet link is variable, but I can achieve a good latency of under 20 ms to the Americas.

The backing storage (2.2 TB) is on my development Ceph cluster, which abuses 19 Raspberry Pi 4 8GB boards in custom 3D-printed enclosures. Each brick has one SSD (~512 GB to 1 TB) for the write log and database, plus one or two USB 3.1 3.5" consumer-grade hard drives. For those interested, I nearly posted on the other thread about power usage, but I worked it out and I run at about 3.2 W per TB stored :wink:
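To put the 3.2 W per TB figure in monthly terms, here is a minimal sketch assuming power scales linearly with TB stored and an average month of 730 hours. The 1.81 TB input is the Dec21 PHY GB value from the table; the electricity price is a hypothetical example, not the author’s tariff, so plug in your own.

```python
HOURS_PER_MONTH = 8760 / 12  # average month, 730 hours


def monthly_energy_kwh(tb_stored: float, watts_per_tb: float = 3.2) -> float:
    """Energy attributed to the stored data over an average month (kWh),
    assuming power scales linearly with TB stored."""
    return tb_stored * watts_per_tb * HOURS_PER_MONTH / 1000


def monthly_energy_cost(tb_stored: float, price_per_kwh: float,
                        watts_per_tb: float = 3.2) -> float:
    """Energy cost per month in whatever currency price_per_kwh is quoted in."""
    return monthly_energy_kwh(tb_stored, watts_per_tb) * price_per_kwh


if __name__ == "__main__":
    stored_tb = 1.81        # PHY GB for Dec21 (1810 GB), in decimal TB
    example_price = 0.30    # hypothetical electricity price per kWh
    kwh = monthly_energy_kwh(stored_tb)
    cost = monthly_energy_cost(stored_tb, example_price)
    print(f"{kwh:.2f} kWh per month, about {cost:.2f} per month at {example_price}/kWh")
```

At roughly 4.2 kWh a month, the comparison with a ~$4 monthly payout depends almost entirely on your local electricity price.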

Also, I have been really critical in the past of the realistic earnings estimator, as in the early days it was not accurate for me. However, I was wrong: it does give, to the best of its ability, a realistic estimation of earnings.

:heart: CP

#edit - added more detail to the table (ingress bandwidth and total disk usage at end of month), plus the vetted month… EU North took months…

#edit - broke the table down into INGRESS and EGRESS

The payouts seem low for the last few months. Looks strange. My node is in month 7 and has a payout this month of almost $5 (gross total $7), with 1.2 TB in use. Hmm.

@CutieePie thanks for sharing.

Interesting that my node (20 months old) added the same amount of stored data (~1450 GB) as your node, but only from Jun21 to Nov21. Do you know how long vetting took?

I’m getting a little discouraged. I’m at month 10 and only at 750 GB. I’ve had 99% uptime, I’m US-based, with a solid Internet connection and 16 TB of storage. But it just grows at a snail’s pace. I’m not expecting exponential growth, but I’d like to get closer to the 1.2-1.5 TB I see others reaching in this timeframe.

Things are painfully slow at the moment; let’s just hope traffic picks up next year. Maybe Storj will increase test traffic, but I’d rather have real customer data.

Have you checked that you don’t have a second node on your subnet?
http://storjnet.info/neighbors
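"Same subnet" here means the same /24: Storj treats nodes in one /24 as a single location when selecting nodes for uploads, so they share that subnet’s ingress. A minimal sketch using Python’s standard `ipaddress` module to find the /24 an address belongs to and compare two addresses; the helper names are hypothetical, and the example addresses come from the RFC 5737 documentation ranges, not real nodes.

```python
import ipaddress


def slash24(ip: str) -> ipaddress.IPv4Network:
    """Return the /24 network that an IPv4 address falls in."""
    return ipaddress.IPv4Network(f"{ip}/24", strict=False)


def same_slash24(ip_a: str, ip_b: str) -> bool:
    """True if both addresses are in the same /24 subnet."""
    return slash24(ip_a) == slash24(ip_b)


if __name__ == "__main__":
    # Example addresses from the RFC 5737 documentation ranges, not real nodes.
    print(slash24("203.0.113.77"))                       # 203.0.113.0/24
    print(same_slash24("203.0.113.5", "203.0.113.200"))  # True
    print(same_slash24("203.0.113.5", "198.51.100.5"))   # False
```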

I had no idea about this and, sure enough, there are 2 nodes on my public /24 subnet. Of all the public /24s in the world, I have to share mine with someone else lol.

Thank you for this info friend.

No problem mate. Unfortunately, there’s no easy fix for this that I know of.

There actually don’t seem to be that many subnets used by the public… the majority might be held by larger corporations like Microsoft Azure… which hold a huge portion of the IP addresses.

It seems 20-60% of all subnets are used for Storj storagenodes these days… at least it seems that way, and sometimes there are subnets with something like 5-12 nodes on them… if a node is started on one of those, it would take something like 2½ to 6 years just to get it vetted, something which usually happens around the 6-month mark.

as seen with CP’s node
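The 2½ to 6 year figure lines up with multiplying the ~6 months CP’s node needed (Nov20 to May21) by 5 and by 12 nodes. A sketch of that implied model follows; the even split of a subnet’s traffic between its nodes is an assumption made for illustration, and real vetting happens per satellite, so treat the output as a rough estimate rather than how Storj actually schedules vetting.

```python
def vetting_months(nodes_in_subnet: int, solo_vetting_months: float = 6.0) -> float:
    """Hypothetical linear model: the subnet's share of unvetted-node traffic is
    split evenly between its nodes, so vetting takes N times as long with N nodes."""
    if nodes_in_subnet < 1:
        raise ValueError("a subnet needs at least one node")
    return solo_vetting_months * nodes_in_subnet


if __name__ == "__main__":
    for n in (1, 5, 12):
        print(f"{n} node(s) in the /24: ~{vetting_months(n) / 12:.1f} years")
```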

While it’s true that the IPv4 space is smaller than you might think, your percentages seem to be based on nothing but a feeling.

There are 16777216 /24 ranges, but you’re correct that a lot is reserved for corporate use. In addition, there are ranges reserved for local use or other purposes.
Let’s assume for now that only 10% is available for residential IPs. That gives us 1677722. Let’s also assume that every node is in a separate subnet (definitely not true). With 13000 nodes, that is at most 13000 / 1677722 ≈ 0.77%. Hell, let’s say only 1% is used by residential ISPs. You still only get about 7.7%.
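Reproducing the arithmetic in this first estimate: 2^24 /24 ranges, a residential share of 10% or 1%, and 13000 nodes each in its own subnet. The 13000 node count and the residential shares are the thread’s assumptions, not measured values. The exact results are about 0.77% and 7.75%.

```python
TOTAL_SLASH24 = 2 ** 24  # 16,777,216 possible /24 ranges in IPv4


def max_share_of_subnets(nodes: int, residential_share: float) -> float:
    """Upper bound on the share of residential /24s hosting a node,
    assuming every node sits in its own subnet."""
    residential_subnets = TOTAL_SLASH24 * residential_share
    return nodes / residential_subnets


if __name__ == "__main__":
    for share in (0.10, 0.01):
        print(f"{share:.0%} residential: {max_share_of_subnets(13000, share):.2%}")
```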

Another way to think of it: every /24 subnet has 254 usable IP addresses. How likely is it that more than 1 out of 254 residential IP addresses is running Storj nodes? We can try to think of it as how many people in the world run storage nodes… let’s assume again that’s 13000… out of 7.9 billion people… but OK, many of those won’t have the chance to run a node, and quite a few live in areas where it isn’t common. So let’s just focus on the US and Europe. That’s about 1.1 billion people. Take out minors, the elderly, and some extra margin to boot: let’s say there are 500 million people who could potentially run a storage node on their connection. About 200 million households? Low enough for ya? Of those households, 0.0065% run storagenodes, assuming all storagenodes run in those households. Now, again assuming every node runs in a separate subnet, you would have 0.0065% * 254 = 1.65% of subnets running storagenodes.
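The household-based estimate as a short sketch, using the post’s assumptions (13000 nodes, 200 million households, 254 usable addresses per /24). The third function is my addition rather than part of the post: it treats each of the other 253 addresses as independently running a node with the same probability and asks how likely a new node is to have at least one neighbour. For rates this small it gives almost the same answer (about 1.63% versus 1.65%).

```python
def household_rate(nodes: float, households: float) -> float:
    """Share of households running a node, if every node is in a separate household."""
    return nodes / households


def expected_nodes_per_slash24(rate: float, usable_ips: int = 254) -> float:
    """The post's estimate: rate * 254 usable addresses per /24."""
    return rate * usable_ips


def chance_of_neighbour(rate: float, usable_ips: int = 254) -> float:
    """Binomial view (my addition): chance that at least one of the other
    usable_ips - 1 addresses in your /24 runs a node."""
    return 1 - (1 - rate) ** (usable_ips - 1)


if __name__ == "__main__":
    rate = household_rate(13000, 200e6)
    print(f"households running nodes: {rate:.4%}")
    print(f"subnets running nodes (approx.): {expected_nodes_per_slash24(rate):.2%}")
    print(f"chance a new node has a neighbour: {chance_of_neighbour(rate):.2%}")
```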

Both methods are inexact, but I gave you the benefit of the doubt a LOT and you’re still orders of magnitude off. But yes, it can happen.

PS: Please don’t quote these numbers. The calculation is inaccurate and intentionally skewed towards assuming there are a lot of collisions, to show that even then the numbers mentioned still don’t make sense. The real numbers are likely orders of magnitude lower than what I calculated here.

Now someone please explain to me why I keep spending way too much time debunking stats that others put no time into before throwing them out there? It’s a problem…

In my opinion, it’s just because most of the people who run storage nodes are in similar situations (geographic location, ISP with the best speed, etc.).
I think that’s probably why we hear of more and more people ending up in the same subnet as others.

You clearly have no idea what you are talking about. I can add that my numbers only apply to the RIPE segment; I have no knowledge of whether it’s easier to find empty subnets outside Europe.
It’s very common to find nodes on new IPs, I dunno why… like you say, there should be plenty of IP addresses for that not to be the case… at best the odds seem to be 10-30%,
but at some cloud providers it’s damn near impossible these days… like 60%.

Thanks friend, very kind.

But as you have provided no sources or information, and opted for ad hominem and unfounded assertions instead, I think this conversation is over.

Yeah, but that still means that more than 1 out of 254 households in such a cluster is running a storagenode. Sure, if you’re promoting storagenodes among your neighbors, that may happen. So there may be more saturated clusters where word of mouth has spread. That doesn’t really impact your chances of ending up in such a cluster randomly, though.

But if someone nearby pointed you to the option of running a Storj node, yeah, chances are higher that you end up using the same subnet.

I provided my experience using 7 different cloud providers inside the RIPE segment, along with a couple of different ISPs. Your math is basically just useless numbers, and though they are partly correct, many of the /24 subnets must be allocated for something else… or, as the mightygeek suggests, maybe it’s simply because people go to the same places…

Say, nobody would use Microsoft Azure because it’s simply too expensive… maybe many of the other IP ranges are similarly out of reach.
The forum also seems to tell the same tale: people quite often find their nodes sharing subnets, certainly way more often than the 2% of node operators your numbers suggest.

But if we take the 1.65% you arrived at and then factor in that IPs are segmented by geolocation, and that the address space is split into 5 such segments (RIPE being one of them),
the number would get a bit closer to my experience if it were in fact 5x.
Do we know how many nodes there actually are in Europe?

Don’t forget that in the earlier days of the internet it was not unknown for large companies to be allocated Class A and Class B networks. That gives them orders of magnitude more subnets to use compared to the situation today. I imagine Microsoft is in this situation, as one example.
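To see how large those old allocations are in /24 terms: a Class A network is a /8 and a Class B network is a /16, so each contains 2^(24 − prefix length) /24 subnets. A small sketch using Python’s `ipaddress` module; the CIDR blocks are private-range examples chosen only for their size, not anyone’s actual allocation.

```python
import ipaddress


def count_slash24(network: str) -> int:
    """Number of whole /24 subnets contained in an IPv4 network in CIDR notation."""
    net = ipaddress.ip_network(network)
    if net.prefixlen > 24:
        return 0  # smaller than a /24, so it holds no whole /24
    return 2 ** (24 - net.prefixlen)


if __name__ == "__main__":
    # Classful sizes: Class A = /8, Class B = /16, Class C = /24.
    examples = [("Class A", "10.0.0.0/8"), ("Class B", "172.16.0.0/16"),
                ("Class C", "192.168.1.0/24")]
    for name, cidr in examples:
        print(f"{name} {cidr}: {count_slash24(cidr):,} /24 subnets")
```

A single Class A block therefore holds 65,536 /24s, about 0.4% of the entire IPv4 /24 space.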

No, you don’t. You assert your experience and then don’t use it to point out where my calculation is wrong. Show your work, like I did mine. You can pick out any number and show me why that number is wrong. I made it very easy to prove me wrong, so please do so without appealing to “just trust me, I have experience”.

I’m very well aware that my rough calculation based on IP ranges has a lot of gaps, which is exactly why I provided an additional way of looking at this that you so far have completely ignored. You can’t get around the fact that you state that there is a 20-60% chance that any IP address you would be using has another node in it. I’ll again give you the benefit of the doubt and use your lowest estimate of 20%. I’ll also assume 13000 subnets in use (which we already know is a LOT lower). So your suggestion is that most of us are in a pool of 13000/20% = 65000 subnets. Out of a possible 16777216. So we node operators all pull from the same 0.4% of the IP range at most (it’s actually a lot less, but again, calculating in your favor). Maybe you can use your extensive knowledge to explain why only 0.4% is available to us node operators.

Then, once you’ve done that, explain how node operators end up in subnets where the chance of a node operator having the same range is > 20% * 1/254 = 0.08%. With about 290 million households with broadband in the US (about 120 million) and Europe (about 170 million), that would come down to 232000 node operators. Again, this is assuming 1 node per subnet with nodes… which you are already arguing is much higher on average. At best that is still nearly 20x too high. So my very rough estimate would be at most a 1% chance when finding Storj on your own. If you got into Storj because someone nearby suggested it, these numbers go out the door and chances may get higher.
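Both steps of this argument (the size of the implied subnet pool, and how many operators a 20% collision rate would require) as a sketch. The inputs (13000 subnets in use, a 20% collision rate, 290 million broadband households) are the thread’s assumptions. Using the unrounded 20%/254 ≈ 0.0787% instead of 0.08% gives about 228,000 operators rather than 232000, which is still roughly 18x the assumed 13000.

```python
TOTAL_SLASH24 = 2 ** 24


def implied_pool(subnets_in_use: float, collision_rate: float) -> float:
    """Subnets operators would be drawn from if collision_rate of them already host a node."""
    return subnets_in_use / collision_rate


def implied_operators(collision_rate: float, households: float,
                      usable_ips: int = 254) -> float:
    """Operators needed for a new node to hit an occupied /24 at collision_rate,
    with one node per occupied subnet."""
    per_address = collision_rate / usable_ips
    return per_address * households


if __name__ == "__main__":
    pool = implied_pool(13000, 0.20)
    print(f"pool: {pool:,.0f} subnets = {pool / TOTAL_SLASH24:.2%} of all /24s")
    operators = implied_operators(0.20, 290e6)
    print(f"implied operators: {operators:,.0f} ({operators / 13000:.1f}x the assumed 13000)")
```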

I’m well aware, which is why I included the calculation for only 1% being available. @Stob’s image helps show the impact of those large blocks being given out. But what’s unclear is how much of the IP space for Europe and the US is actually used for residential connections. I would find it pretty fair to assume that’s more than 1%… but I’m open to being proven wrong. You know, by sources and logic. Not by being told I know nothing or that someone else has experience. We’re all here to learn.

Again… an assertion that isn’t being backed up. How many complaints have you seen about this? How have you determined that the forum users are a good representation of all node operators? How have you corrected for people with problems being much more likely to post than those without? You’re claiming 20-60% of operators have this problem. That’s at least 2600. But that 13000 number is now working against you, so lets assume an average of 5 nodes per operator. So we would expect 520 people with this problem at the least. How many have you seen complaining about this?
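The expected-complaints arithmetic, written out: affected nodes are the claimed share of 13000 nodes, divided by an assumed 5 nodes per operator. Both inputs are the thread’s assumptions. At the low end of the claimed range that is 520 people; at the high end, 1560.

```python
def expected_affected_people(nodes: float, affected_share: float,
                             nodes_per_operator: float) -> float:
    """People who should be hitting the shared-subnet problem, given the claimed share."""
    return nodes * affected_share / nodes_per_operator


if __name__ == "__main__":
    for share in (0.20, 0.60):
        people = expected_affected_people(13000, share, 5)
        print(f"{share:.0%} affected: ~{people:.0f} people")
```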

Anyway, back on topic, and sorry for the distraction @CutieePie. The data you have provided is actually really valuable. I’ve already used it to correct the earnings estimator for a larger impact of vetting than it previously represented, and it’s showing some good data on an effect that I’ve seen for a while but never really had good data to quantify: ingress data from the same month being deleted right away. This seems to account for about 25% of ingress. I’ve been wanting to split up short-term deletes and long-term deletes for a while, as this makes a big difference to income early on. The downside is that predictions will go down, but the upside is that there is probably less of a limit on the maximum growth of a node. It just takes a long time to get there.
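One crude way to look for the same-month delete effect in the ingress table is to divide each month’s growth in PHY GB by that month’s IN GB. This is my own rough proxy, not the method used for the earnings estimator: it mixes same-month deletes with deletes of older data, and PHY GB may include trash. While the stored base is still small (Feb21, May21, Jun21), the ratio sits around 0.75, which is consistent with about 25% of ingress disappearing quickly. Later months fall much lower as older data gets deleted too.

```python
# (month, IN GB, PHY GB at end of month) from the INGRESS table
ROWS = [
    ("Nov20", 50, 37), ("Dec20", 70, 84), ("Jan21", 165, 218),
    ("Feb21", 299, 443), ("Mar21", 292, 602), ("Apr21", 149, 697),
    ("May21", 547, 1106), ("Jun21", 378, 1393), ("Jul21", 349, 1483),
    ("Aug21", 418, 1594), ("Sep21", 361, 1679), ("Oct21", 337, 1749),
    ("Nov21", 412, 1868), ("Dec21", 204, 1810),
]


def net_retention(rows):
    """Month -> (change in PHY GB) / IN GB. This mixes same-month deletes with
    deletes of older data, so it is only a rough proxy."""
    return {
        month: (phy - prev_phy) / ingress
        for (_, _, prev_phy), (month, ingress, phy) in zip(rows, rows[1:])
    }


if __name__ == "__main__":
    for month, ratio in net_retention(ROWS).items():
        print(f"{month}: {ratio:+.2f} of ingress stayed on disk")
```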

So thanks for providing this data. I’ll see if I can find some time to incorporate it in the earnings estimator so that it better reflects node growth.

I do have one question, do your ingress numbers include repair? Pretty certain it does, but I want to double check.

Some of the subnets have been mobile and have moved between regions.
e.g.:
Returned legacy IPv4 address space – APNIC

I’m just over 8 months in, with very similar numbers: accumulated 1.2 TB, and got $4.88 for this month.
