From what I understand, until a node is vetted on a satellite (100 successful audits) it receives about 5% of the ingress traffic. However, let’s say I have a 100TB hard drive that is finished vetting on all the satellites but is only 500GB full. Would it be worthwhile to start vetting a second node on a much smaller, spare hard drive right away, or wait until the 100TB node is closer to being filled up?
Reason being, I’m not sure if the ingress traffic for non-vetted and vetted nodes is split, so would the non-vetted node take ingress that would otherwise go to the vetted node? Or would the vetted node take most of the ingress traffic, leaving an abysmal percentage for the non-vetted node?
The reason I’m asking is that even if the 100TB hard drive is nowhere near full, my intuition says it’s a good idea to start vetting a second node in case the first one ever gets DQ’ed; at least then you’d have another fully vetted node, and once you get a replacement for the big hard drive, it can start filling up much sooner.
TL;DR
Does the same subnet share vetted and non-vetted ingress traffic? Or are those separate, and as such would it be most effective to always have a node vetting (if you have spare hard drives) to get the most out of a /24 subnet’s ingress?
Well first off, 100TB is way too big; it will never fill up. For the moment, it is not worth going above 20~30TB, which already takes years to fill at today’s rates.
Have a look at:
I believe so, but it would be a negligible amount of bandwidth, so it wouldn’t harm the vetted node’s revenue much. (Doesn’t seem exact - see @kevink’s answer below)
I think it’s a good idea to have a spare 550GB node on the side so it is vetted and ready when the time to expand storage comes (or to replace a failing node), as long as you don’t put much money into it.
Right, I used a stupidly high amount of storage (100TB) for a single node for the sake of an example. That being said, I just want to clarify: will the vetted node reduce the traffic that the non-vetted node gets? And if so, will it be a massive amount or a negligible amount? Like, will the vetting node now take a few weeks longer to get vetted, or several months longer?
Yes, but it’s more of the opposite: your vetting nodes will share 5% of what your vetted nodes receive. (Doesn’t seem exact - see @kevink’s answer below)
Vetting may take longer if you have several vetting nodes at the same time, but it shouldn’t have a massive impact.
From what I understand, vetting takes 1~2 months in all cases.
Your vetted nodes shouldn’t impact your vetting nodes much.
No. According to a comment from littleskunk (a Storj Labs developer), all vetted nodes (/24 subnets) are selected randomly for uploads from one pool of nodes, while all non-vetted nodes are in a different pool and get selected for ~5% of global uploads.
So non-vetted nodes don’t actually take the ingress of your vetted nodes in the same subnet.
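To picture it, here’s a toy simulation of that two-pool model (the pool sizes, node names, and the exact 5% split are assumptions for illustration, not the actual satellite code):

```python
import random
from collections import Counter

# Toy model of the two-pool selection described above: vetted /24 subnets
# are drawn from one pool, non-vetted subnets from a separate pool that
# receives ~5% of uploads. Pool sizes are made up for illustration.
random.seed(1)
vetted_pool = [f"vetted-{i}" for i in range(1000)]    # includes your vetted subnet
unvetted_pool = [f"unvetted-{i}" for i in range(50)]  # includes your vetting node

counts = Counter()
for _ in range(100_000):  # 100k piece uploads
    pool = unvetted_pool if random.random() < 0.05 else vetted_pool
    counts[random.choice(pool)] += 1

# The vetted subnet's expected share (~95% of uploads spread over 1000
# subnets) is the same whether or not "unvetted-0" exists, because the
# two pools never compete with each other.
print(counts["vetted-0"], counts["unvetted-0"])
```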
@kevink: Does this mean an SNO with vetted and non-vetted nodes gets more bandwidth (just a little bit more) than an SNO who only has vetted nodes? Are you sure?
But in any case the difference would be minimal, I think, so I believe we can safely say that vetted nodes do not reduce the traffic of non-vetted nodes.
@Pac
My numbers from recently creating additional nodes on the same subnet support this.
My two vetted nodes each get the lion’s share, while the amount taken by the unvetted node is very low.
I cannot say that it’s exactly 5%, but it wouldn’t surprise me… it’s certainly a number in that range, so if littleskunk says it’s 5%, then it’s most likely right.
The numbers were accurate enough that, when compared with others’, I could see that 1/3 was missing, which turned out to be a foreign node on the same subnet… so about 1/3 for each vetted node on the subnet, and then a tiny bit for my unvetted node.
Last time I checked the code, actually the opposite was true. For a given chunk of data, non-vetted nodes are selected first, and then vetted nodes. For a single chunk, each selected node must be in a different class C IP block, so if a non-vetted node was selected in a given block, no vetted nodes from that block would be selected. So you could say non-vetted nodes “steal” a few percent (based on some recent data) of the traffic from vetted nodes within the same class C block.
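If that is the order, the selection might look something like this sketch (the structure and counts are my guesses from the description above, not the real satellite code):

```python
import random

# Sketch of the selection order described above: a couple of non-vetted
# nodes are picked first, then vetted nodes fill the rest, with at most
# one node per class C (/24) block overall. Counts are assumptions.
def select_for_segment(unvetted, vetted, unvetted_count=2, total=80):
    """unvetted/vetted: lists of (node_id, subnet) pairs."""
    chosen, used_blocks = [], set()
    for node, subnet in random.sample(unvetted, min(unvetted_count, len(unvetted))):
        chosen.append(node)
        used_blocks.add(subnet)
    pool = list(vetted)
    random.shuffle(pool)
    for node, subnet in pool:
        if len(chosen) == total:
            break
        if subnet in used_blocks:  # a block "taken" by a non-vetted pick
            continue               # contributes no vetted node this round
        chosen.append(node)
        used_blocks.add(subnet)
    return chosen
```

Under that scheme, a non-vetted pick really would displace a vetted neighbour in the same block for that chunk.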
From testing, ingress is evenly split between /24 (class C) subnets, at least between vetted nodes… when we tested this, our precision was down to about a 1% difference between each subnet, no matter the number of nodes; I think we mainly had people with 1–4 nodes.
The distribution of ingress was insanely accurate. We didn’t have any unvetted nodes, but it would be interesting to run some tests on that… and since ingress distribution can be measured to 1% accuracy, we should be able to see fairly accurately the data taken by an unvetted node if it’s 5%…
When ingress picks up again, that is… I doubt it’s as granular at this low level of data transmission.
I doubt unvetted nodes get additional data allocated to a given subnet, because that could then be abused; however, my testing with unvetted nodes is fairly limited.
The best I can say from my own testing is that unvetted nodes get a low amount of data compared to vetted nodes.
So yeah, feel free to argue about it; it can be proven by testing, and if there is an advantage to having unvetted nodes, then it should be changed, so really… I don’t think it matters.
Also, exploiting bugs, while maybe not strictly enforced against at present, will not be accepted long term… so do try not to make the list.
Ok wow, way more discussion than I expected for a simple question. But the short of it seems to be: go ahead and get a second node vetting, there’s no harm in it, and if anything it’s nice to have a second, fully vetted backup node ready lol.
Looking at my nodes’ historical storage figures, the last month’s average extrapolated would be between 15 and 20 TB filled per year. 100 TB is not that unreasonable if you expect the drive to last 10 years. (I’d be more worried about making a mistake during maintenance and having to start over. Operating smaller nodes lessens the impact of mistakes.)
Unfortunately, it’s not linear. The more data you store, the more data gets deleted per month, and at some point you’ll reach an equilibrium, making it impossible to reach 100TB as of today.
Have a look at @BrightSilence’s simulator to have an idea:
There is a barrier though: the % of deletions per month vs ingress.
I think deletions have historically been about 5% of stored data per month, with ingress of 1.5–2TB per month or something like that, which would mean that somewhere around 30TB to 40TB (the monthly ingress divided by the delete rate) the deletions equal the ingress.
At that point the node is unlikely to grow any further… of course it depends on two factors which can change, and seemingly have changed in the last few months… so the max node size might be much smaller at present.
It’s partially why the recommendation is 24TB as the max node size… because beyond that it doesn’t really make much sense, but who knows how it will look in the future; it will very much depend on the use cases.
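Using those rough figures, the equilibrium falls out of a one-liner (the rates are the historical guesses above, nothing official):

```python
# Equilibrium: growth stops when monthly deletions (delete_rate * stored)
# equal monthly ingress, i.e. stored_eq = ingress / delete_rate.
delete_rate = 0.05            # ~5% of stored data deleted per month
for ingress_tb in (1.5, 2.0): # TB of ingress per month
    print(f"{ingress_tb} TB/mo -> equilibrium ~{ingress_tb / delete_rate:.0f} TB")
```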
But if multiple nodes behind the same /24 subnet are treated as one node for the purposes of ingress, does that mean that the maximum aggregate capacity you can have across all nodes at one location is 24TB? That seems rather low.
Yeah, exactly right, because the ratio of ingress vs deletes doesn’t change… no matter how many nodes…
24TB isn’t locked in though… it’s simply a matter of ingress vs deletes, and of course it only holds as long as the whole /24 ingress distribution is a thing.
But yeah, 24TB is a very reasonable number; I’m sure Storj spent a good deal of time coming up with that exact figure.
Now that I think about it, it would also mean that if one had a 24TB node and then someone else popped up on the subnet, that would over time… I would guesstimate in less than a year… reduce the node size to about 12TB.
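For what it’s worth, a toy month-by-month recurrence (assuming the ~5% delete rate from earlier, and that 24TB was exactly in equilibrium before the neighbour showed up) suggests the shrink would actually be slower than a year:

```python
# Toy recurrence for a 24TB node after its subnet ingress halves: each month
# 5% of stored data is deleted and the halved ingress is added. 1.2 TB/mo is
# the pre-split ingress that would hold 24TB in equilibrium (24 * 0.05).
stored, ingress, delete_rate = 24.0, 0.6, 0.05  # TB, TB/month, fraction/month
for month in range(1, 37):
    stored = stored * (1 - delete_rate) + ingress
    if month % 12 == 0:
        print(f"after {month} months: {stored:.1f} TB")
# ~18.5 TB after one year, ~15.5 after two; the halfway point toward the new
# 12TB equilibrium takes about 13-14 months under these assumed rates.
```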
Such long-term estimates are moot because nobody can tell how the network will look in more than a year. Additionally, the delete ratio at the moment is less than 5%, so you would get over 24TB.
Storj just needs a reasonable number to show people that an investment of more than 24TB makes absolutely no sense. People come here from Burst and other HDD-mining coins wanting to put 100TB to “good use”, so they need a reasonably low number to understand that their massive storage will have no advantage.