Question to operators of huge nodes: Should I buy new hardware?

Disclaimer: Please refrain from telling me that I generally shouldn’t buy new hardware for storj. Please also spare me your moral values. Thank you. <3

I’m thinking of upgrading the total storage available for nodes at one location to 250 or 500 TB net in total. I currently have 6 productive nodes at that location and 2 in the warmup process. I can use 4 different IP ranges at that location.

The metrics of the “more realistic earnings calculator” spreadsheet (where the stated 3 TB/month exactly matches the (simulated) ingress seen on my nodes per IP range) made me realize that there is a (mathematical) barrier limiting the storage space a system of storage nodes under one IP range can fill up. Even if I have unlimited network speed, I will be limited to 60 TB per IP range, if the other values of the “more realistic earnings calculator” spreadsheet are correct. This cap is mainly due to the deletion rate of files on a node.
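To make the barrier explicit: net growth stops once monthly deletes equal monthly ingress, so stored data converges towards ingress divided by the monthly delete fraction. A minimal sketch of that reasoning (the 5% monthly delete rate is simply the value implied by 3 TB/month of ingress and a 60 TB cap, not a number taken directly from the spreadsheet):

```python
# Steady state per IP range: growth stops when monthly ingress equals
# monthly deletes, i.e. S* = ingress / delete_rate.

monthly_ingress_tb = 3.0  # TB/month per IP range (from the estimator)
delete_rate = 0.05        # fraction of stored data deleted per month (implied value, assumption)

steady_state_tb = monthly_ingress_tb / delete_rate
print(f"storage converges towards ~{steady_state_tb:.0f} TB per IP range")
# -> ~60 TB, the barrier described above
```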

On the forums I saw people presenting their near-petabyte rigs, or other people saying that they run 20 (or whatever) nodes at home with way more than 60 TB in total capacity.

So my question is mainly for such people: Do your nodes fill up? If yes: How? Are the values of the “more realistic earnings calculator” spreadsheet not correct on the deletion rate? A set of very descriptive dashboard statistics where I can “see” your answer is also welcome.

Because: if your nodes don’t fill up, I will just buy 4×60 TB (≈250 TB) of storage for my nodes at that location. However, if you tell me that your nodes easily fill up, or if your answers suggest that it’s doable to fill up more with 4 IP ranges, then I may buy double the space.

4 Likes

I just :heart: this disclaimer :slight_smile: It proves you are not new and have read other posts.

Also, I would like to invite @vadim, who has 22 nodes in his setup.

Secondly you can go through this thread and see what others have done and contact them through DM.

Good luck with your setup :+1:

3 Likes

I agree that to make Storj worthwhile as a living, you need petabyte-class amounts of storage.
The problem is that even when things are going fast I can only get ~1 TB per week, so it is taking a long time to get to where the profits cover the power use of the larger nodes.

I have ~30TB storage but after 8 months I only have ~5TB in use.

Sure, you can make some great pocket money with less storage.

That’s why I built all my nodes step by step. That’s one advantage of this approach: when you have a lot of nodes, you can set them up as you acquire hardware, so there’s no need to build everything at the beginning. Also, performance is better than RAID. Power consumption grows as the array grows, so there’s no need for big power consumption at the beginning.

Due to the use of 4 IP ranges I get around 12 TB of ingress per month at that location, as long as Storj continues to simulate user traffic. However, I know that those nodes may take more than a year to fill up.

Well, this may fill up other nodes quicker but won’t solve the problem in the end, because other nodes will lose data.

Do you have an estimate of how much data will actually be deleted during the current test phase?

How much space do you have per IP range, and how long did it take to fill up in total?

Hey @pony, I found your post here: Post pictures of your storagenode rig(s)

Can you maybe answer my question?

As far as I understand, he rented the space in the rack. The whole rack doesn’t belong to him :slight_smile:

Oh, I just thought he had 80 nodes because of this:

I think that means 20 units with 4 nodes in each, to make 80 per rack.

I don’t know how much new information I can provide for you, as most of my learnings have been integrated into the more realistic earnings estimator you are already referring to. I can tell you that the delete % used there is based on the average amount of deletes per month since the last network wipe. My node has never been full, so it should represent any node that has never been full. However, the delete percentage differs a lot per month; it’s even less stable than ingress/egress. So you may go over the 60 TB max and then drop way below it in the next few months. This theoretical limit should not be considered a hard limit, but rather a rough indication of the level of storage your node may eventually fluctuate around.

Since none of these numbers are a guarantee, I would always advise building larger setups in such a way that you can upgrade over time. That’s what I do as well. HDD prices drop over time, so why buy HDDs you’re not going to fill up for a year or more (unless you can get significant bulk discounts)? If you’re running it solely for Storj, you may want to consider running a separate node on each HDD. It’s the most efficient use of space and on average will give you the best income per TB. Another upside is that you can very easily scale: just add a new HDD and a new node when the existing ones are starting to fill up. (Don’t wait too long; you want the new node to be vetted before the old ones fill up.)

From what I can tell now, 250 TB will be reasonable eventually if you have 4 IPs at your disposal. But keep in mind that it will take a while to fill up, because the more data you have, the more the impact of deletes slows down the net growth. So it’s going to take more than 2 years to get to that amount. You can use the more realistic earnings estimator to get a better indication of that.
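To illustrate why deletes stretch the fill-up time, here is a rough month-by-month sketch (assuming 12 TB/month of total ingress over the 4 IP ranges and a flat 5% monthly delete rate; both numbers fluctuate a lot in practice):

```python
# Rough net-growth simulation: fixed ingress comes in every month,
# while a fixed fraction of everything already stored gets deleted.

ingress_per_month_tb = 12.0  # total over 4 IP ranges (assumption)
delete_rate = 0.05           # fraction of stored data deleted per month (assumption)

stored_tb = 0.0
for month in range(1, 61):
    stored_tb = stored_tb + ingress_per_month_tb - stored_tb * delete_rate
    if month in (12, 24, 36, 60):
        print(f"after {month:2d} months: ~{stored_tb:5.0f} TB")

# The curve flattens as it approaches ingress / delete_rate = 240 TB,
# so the last tens of TB take far longer than the first.
```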

If you have any more questions regarding that estimator, let me know. I’d be happy to answer them.

Edit: Btw, 3TB ingress is a relatively high estimate that seems to be valid recently. But I wouldn’t count on that for now. 2TB is probably a more reliable estimate long term. That’s what I have set by default in the estimator right now. I’ll adjust this if the ingress stays as high as it has been recently.

3 Likes

The only immediate effect you’re going to see after buying double the drives is a bill twice as high. There’s no way to tell when, or if, you will see the initial 240 TB occupied and paid for.

Test data ingress may disappear tomorrow; it’s unreasonable to expect it to stay forever. Then at some point Storj will start deleting test data to make space for actual customer data. The net storage utilization rate can easily turn negative at that point.

So, since we’re talking about tens of drives, it makes no sense not to expand storage gradually.

1 Like

I agree that gradually increasing the capacity seems to make more sense (at least that’s how I would do it).
The obvious exception is if you can get a good bulk discount; then it would make sense to buy a few hard drives even if they don’t fill up straight away. And why not keep them in storage until you need the extra space, so that they don’t wear out.

I might be wrong, but I don’t think a 25-50 drive purchase warrants a substantial discount. At best you’ll get an Amazon sale price or something like that.

1 Like

If you had two nodes, would it not vet twice as fast with the other node full?

No, vetted nodes share traffic among themselves and unvetted nodes share traffic among themselves. An unvetted node is not impacted by other vetted nodes in the same subnet (other than the occasional deduplication if both an unvetted and a vetted node in the same subnet are selected for the same segment, which is negligible).

1 Like

I have one node on one IP. This is the used-space graph for it (yes, some data is missing, and there was also a period of a few days when the node thought it was full and I could not expand it).

1 Like

Try running multiple instances of storagenode on such huge spaces, and if you can, on multiple /24 IP subnets.
@GhostTyper

That’s the problem - I don’t have separate /24 subnets - not yet anyway.

Completely agreed. On a pure cost-per-disk basis, shucked consumer drives are still cheaper than any bulk order I’ve ever seen.

From a purely raw-disk-cost perspective, the cheapest way is to wait as long as possible and buy as little as possible, for rational definitions of “possible”. For whatever architecture you’re building, decide what your minimum increment of new drives is (can you add a single drive, or do you need to add in groups of 2 or 3 or 4, etc.), and when your array is full, buy that number of whatever drive has the lowest $/GB at that time. Repeat until you run out of drive bays, then re-evaluate whether it makes more sense to replace the now older/smaller drives, add more bays, or build a new node.
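As a trivial sketch of the “lowest $/GB at that time” rule (computed per TB here; the drive models and prices below are made-up placeholders):

```python
# Pick whichever drive currently has the lowest cost per TB.
# Capacities and prices are hypothetical examples.

drives = {
    "8 TB model A": (8, 140.0),   # (capacity in TB, price)
    "12 TB model B": (12, 190.0),
    "16 TB model C": (16, 280.0),
}

name, (tb, price) = min(drives.items(), key=lambda kv: kv[1][1] / kv[1][0])
print(f"cheapest per TB right now: {name} at {price / tb:.2f} per TB")
```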

Storj is a unique use case in that when it’s full, it doesn’t represent a fundamental problem. It will still work, it will still make money, and you will still get some uploads as data gets deleted. There is no benefit to being “ahead” of your actual Storj usage.

3 Likes