Revisiting the Question: Feasibility of Large-Scale Storj Nodes

S0ly · November 28, 2023, 10:00pm

OR the network have to be very used by customer or something like that ?

btw I have been rate limited ? like I need to wait 16 hours to reply to you ? ^^

Toyoo · November 28, 2023, 10:04pm

Or take part in the commercial node operator program, which does not have a /24 limit.

You can also run Storj, Chia and similar projects side by side. Chia pays less, but will fill any available space. You can first fill the space with Chia, then—as Storj takes more and more space on your node—remove Chia plots to make space for Storj.

Pentium100 · November 28, 2023, 10:12pm

Maybe the node operator has multiple servers each with some free space, so he runs more nodes.
Another reason is to avoid using RAID. There have been debates on this forum whether to use RAID or not and there is no reason to debate that again, but if you have some hard drives that you intend to use only with Storj, then your options are:

RAID with parity, one big node - this makes the node less likely to fail and there’s only one node to take care of, but it also wastes space for the parity.
RAID0 or equivalent, one big node - this is the worst option, as a single failure will make you lose the node and it will take a long time for your new node to get the amount of data your old node had.
Multiple nodes, each with one drive - this is somewhere in between, as you do not waste space, but if a drive fails, you only lose the data that was on that drive.

Another reason to have multiple nodes would be to make it easier to move the nodes around. If I wanted to move my 22TB node to another server, it would take a long time.

Yet another reason is to violate the TOS and have nodes with different /24 subnet IPs to have more data.

Yes. A segment is spit into 80 pieces and those pieces are uploaded to the nodes, but no more than one piece for a /24 subnet (whether there’s one node there or many). This effectively makes 20 nodes under the same /24 look like one big node to the network, for the purposes of uploads.

heunland · November 28, 2023, 11:47pm

Reminder that bypassing the /24 rule is against our Node Operator Terms and Conditions, doing so is only going to cause customers to potentially lose important files, and if that happens thanks to people who only are interested in short term profits, the reputation of the entire network will be compromised and this would in turn result in customers being less confident in using the Storj network.
This then will result in less data and traffic being distributed to all node operators. I.e. you are shooting yourself in the foot and in the process damaging the entire network. This is not a good way to start participating in this project.

BrightSilence · November 29, 2023, 1:59am

Just some notes from the person who made and maintains the earnings estimator.

I used to frequently report on the theoretical maximum where deletes and ingress balance out, but I stopped doing that (you can still find it using the calculator if you just fill in a large number for storage size). Reason being that at the moment it takes 10 years to get to 55TB last time I checked. So that 88TB number really isn’t all that relevant anymore. Furthermore, data on deletes is hard to collect as nodes don’t keep track of the amount of deleted data and it has to be derived. It’s an estimate that can wildly vary with slightly different network behavior.
22TB is what a node that has been well maintained since the start of the public network would have been able to collect so far. It’s not a maximum, and that number is still growing. Earlier years of network behavior are no longer representative and the earnings estimator is based on more recent network behavior.
If you want to know how long it takes to become profitable or to fill up your storage size while using 6 /24 IP ranges, just multiply the numbers in the estimator by 6.
Do not use servers with high power usage. Storj requires very little system resources. You should aim to optimize power usage, not performance.
As always, network behavior and node fill rate may change. I don’t have a crystal ball.

Alexey · November 29, 2023, 3:37am

The customers uploads data, then some other customers deletes their data (or it’s just expired automatically), for example - backups. At some point the amount of uploaded data and removed data become equal, and usage doesn’t grow anymore.

One node per location - yes. However, it’s only recommended if your hardware will be online anyway with Storj or without.
Keep in mind - the network filesystems are not supported, only iSCSI is working.

There is no limit for the size.

Yes, you are correct. This put customers’ data into danger - if you would shutdown these nodes, segments will be offline. With breaking rule you may store more than a one segment of the same file and this increases a risk to the customer to lost their data. Lost data = lost customers = lost payout = lost reputation = shutdown the network. So it’s shooting yourself in the foot.

there is no software limitation, neither 22TB nor 24TB. You may run the node with PBs of available space. But due to deleting data by the customers, it will stop growing long before.

this is circumvent of the rule. It’s an attempt to make a damage to the network.

no

yes. Try to post the one message instead of four

S0ly · November 29, 2023, 2:36pm

Thanks for Everyone’s answers.

So, I will conclude that forum in short: Large-scale nodes on the Storj network are mostly not feasible due to the way Storj works. The network balances data among nodes, but doesn’t allocate more data just because a node has more storage. This results in there not being enough data to fill large nodes, making investment in big nodes currently unprofitable due to limited data availability and long fill times. However, it’s possible to be part of the commercial node operator program, which doesn’t have a /24 limit. Also, running Storj alongside projects like Chia, which fills any available space, could be a strategy to utilize the space while waiting for Storj to use more of it.

But now the question is : can I be part of storj commercial program ? Im making my business but its still not finished, im just an operator that have the hand on a lot of storage and some advantages will Storj Commercial program accept me ? (just question because I dont even know if it would really be an advantage to not have de /24 limit)

S0ly · November 29, 2023, 2:55pm

I dont know what to say on define enough, I have a 144tb node currently and im just trying to fill this space up, but like I said storj seems to not be able to do that unfortunately

JWvdV · November 29, 2023, 11:17pm

As far as the current ToS is concerned, it’s strictly speaking not against the ToS (Node Operator Terms & Conditions) as it hasn’t seen any updates since we went over it last time (find it in the forum). But, although not being against the ToS, it’s not compatible with the concept of STORJ because multiple nodes on the same location can end up with copies of the same part of a file increasing the likelihood of failure of the concept. However, you can set up your own VPN with free resources from Oracle for instance.

Actually I’m seeing an ingress of 4TB a month divided over three IP-addresses. So this might be compatible with your observation.

Yeah, and starting many nodes after each other with low assigned space. Because as long as they are being vetted, they draw data from another pool and fill up much sooner than after being vetted (100 audits per satellite).
So, the node I started about 20 days ago grew 30GB/day for 10 days and now grows about 10GB/day like the other nodes on the same /24-subet. The latest node I started about a week ago, is still growing 30GB/day.

So, theoretically, at this moment I would be able to get about (3*10*30)+((20+10)*10)=1200GB per /24-subnet in a month (in your case probably even higher, because you’re not having already 3 nodes per subnet) by just starting three nodes in sequence with 10days in between. At this moment I have in total 11.5TB in less than four months using three IP-addresses, of which one was sharing it’s /24-subnet with another SNO.

Mind the fact that growth isn’t linear. So the first 10TB will be filled up sooner, than see second 10TB. Which also will be sooner filled up than the third 10TB. And do on …

STORJ won’t make you rich in a wink of your eye. So it needs a long term vision and commitment to make it profitable. And to enroll in the commercial STORJ node operator program, you need to fulfil some criteria you can read about here: Announcing Storj Commercial Storage Node Operator Program

Alexey · November 30, 2023, 5:33am

It’s against ToS

So bypassing the /24 rule is an interference with the operation of Storage Service.
This has been discussed many times, and even if ToS does not specifically mention VPN, it’s fall under description of a prohibited option.

All pieces are unique, we do not use a replication. But yes, circumventing the /24 rule increases chances to get more than a one piece for the same segment, reducing durability. In case of huge setups, it become even more dangerous, especially if it’s a one big server without a proper High Availability configuration and 24/7 personnel availability and without a proper conditioning and a lot of spare details.

If this is about running of small nodes to bypass vetting - this will be also a violation. But now it will bypass the security checks that verify that these nodes are stable on this hardware, in this location, and configured by this operator.

@S0ly
Please also note that running multiple nodes on the one disk (I do not meant that you suggest to do so) is a violation too, besides the fact, that they will affect each other, since they will compete for the same resource.

JWvdV · November 30, 2023, 6:09am

We had this discussion before, and I wholeheartedly disagree again with you on this part because running a VPN client is not modifying the storage node software.

This very section is only related to adapting the Storage Node Software, defined in section 1.5 as “Storage Node Software” means the Storage Node Software which, when installed on a Device, enables such Device to participate in the Storage Network. which section 2.2 clarifies, by starting (...) The Storage Node Software consists of open source code and is made available to you pursuant to the terms of the open-source license agreement(s) located at https://github.com/storj/storj/blob/master/LICENSE (the “Open Source License(s)”). (...) which obviously only refers to the Storage Node executable, this license pertains to.

So legislatively, is quite BS again. Although, it’s obviously against the spirit of the STORJ concept. But apparently the harm of it isn’t enough so it’s not getting any priority, it’s not enforceable or it’s just laziness resulting in no adaptations of the ToS since this discussion started of in this forum.

But again, it’s clear as crystal that the ToS doesn’t cover the whole topic in article 4.1.3, and I’m actually starting to think you’re quite aware.

It’s not against the ToS.

Sure, since apparently many are using VPN for other reasons than GC-NAT or security (not exposing own IP) but for circumventing the /24-rule. So it has been discussed many times, because for obvious reasons for feels like unfair that more skilled people can quite easily circumvent this rule. Although, they might be also better in running such nodes because of the same skills.

But screaming all over that’s it against the ToS, referring sections about changing STORJ-licensed software doesn’t do this discussion any good.

It’s against the spirit of the concept of STORJ, but don’t look at the operators using VPN for any reason. But look at the writers of the ToS.

Don’t know what you’re exactly meaning by HA, but the whole essence like unskilled / non-professional dedicated people running nodes with cumulative big space and insufficient safeguards also circumventing the /24-rule is very unappealing. I completely agree. But again, those who want to do so and have enough skills will do so. And at this point of time, ToS isn’t saying a word about it, aside from the aforementioned points on whether it’s enforceable and so on.

Although I did the math in a previous post, and the probability of ending up with multiple pieces of the same file at one SNO is still very low. It needed quite extreme figures to pose a real risk. But aside from that, I still think it’s also from perspective of fairness aan unwanted scenario.

I haven’t said so, so please abstain from putting me words in my mouth. The only thing I know is that the TS has 144TB, which is unlikely just one disk. So my point is that it’s probably more profitable to not RAID them or whatever (also not RAID0/5/6), but running one node per disk which is the smallest possible amount.

Alexey · November 30, 2023, 6:52am

yes, but it fall under

You may disagree, but it’s a violation. I firmly believe that anything you do with the knowledge that it may reflect poorly on customers is a violation.
However, I would prefer to have an implementation in the protocol or pricing structure instead of ToS to make the circumvent is not attractive, or even better - did not affect customers.

High Availability, edited my post to do not confuse people.

I’m sorry, i didn’t mean that you said that, just had read similar advice from people, who run multiple small nodes on the one disk to bypass the vetting, before they will be placed on the actual storage, so added this info for the future readers, and likely I misunderstood you. Please don’t take this personally.

Yes, I’m agree with you, especially for RAID0 - with one disk failure the whole node is gone.

JWvdV · November 30, 2023, 7:19am

I firmly believe that anything a SNO does and reflects poorly back on customers, should be a violation within reasonable extents (it cannot be asked to do the storage for free, although it would reflect back quite positively on customers I think). But it has to be backed sufficiently by the ToS, which actually isn’t the case because it isn’t up to date in many protectives. And again, the section you’re citing is from juridical perspective only applicable to modifying software which is exemplified with the enumeration that follows of which you apparently seem to feel quite free to cite something disconnected from the context and declare applicable to current situation (which can’t be).

So again, if there are no applicable rules you can scream whatever you want: but not rules, no infringement.

No again.

I actually think, they do so in order to get the first 9 months as low ingress as possible. In order, when they get full payment they increase the storage size to be as profitable as possible. Because as less earnings as possible is being withheld in that case. So they have some “spare” nodes, for the case one node fails bit the disk is still usable or when they want to expand.

That’s the real reason I think the withheld earnings should be related to the amount of data on a node instead of current time-bound lowering of withheld amounts over time.

So actually, this is an opposite situation from the way I recommended in this case.

Pentium100 · November 30, 2023, 7:39am

I know this is not enforced, but still:
Q: Can I run multiple nodes on the same disk/array?
A: No, it’s a violation of the ToS.

Q: How can I shrink a node to reclaim space for my own uses?
A: You don’t, what you need to do is to run multiple smaller nodes instead of one big one and just GE or delete one of the small nodes when you need space.

The official recommendations say that you should not buy hardware for the node, instead use whatever you already have online 24/7. It would mostly mean RAID, because I would not want to use non-RAID drives for my own files (I did that a long time ago, it was annoying as hell to manage the space).

Alexey · November 30, 2023, 7:47am

Could you please point to the post, where this suggested by Storjlings and they explicitly meant to use a one disk for all of them?
Running multiple nodes on the one disk was always a violation, even in v2.

daki82 · November 30, 2023, 11:11am

Pentium100 · November 30, 2023, 11:29am

There is no other way to free up some space used by the node(s). As the current recommendation is to only use the hardware that would be online 24/7 anyway, this means people starting with a nearly empty file server (buying larger drives than they need right now, to have space in the future) and running node on the empty space. But as the server fills up with files of the user, the node needs to shrink to make room for those files. Right now there is no way to do it other than running multiple nodes and deleting them one at a time when you need the space.

daki82 · November 30, 2023, 11:31am

You can set the availabe space to less than the used space, so it shrinks over time, indicating overused space.

BrightSilence · November 30, 2023, 12:02pm

I’ve quoted that clause before. But in my (non-expert) legal interpretation, the language is ambiguous and could be interpreted as a list of things you’re not allowed to do by modifying the software. Technically speaking when the language of an agreement is ambiguous, courts tend to interpret it in favor of the party who didn’t draft the agreement. This should probably have been 2 separate clauses. Add that to the list of things to change when the new ToS finally comes out.

That said, it’s mostly irrelevant. Ambiguity can be a defense in court, but the ToS gives Storj Labs plenty of backing to ban nodes who break the ToS according to their interpretation of the language and at the scale we’re talking about in this topic, a court case over that would be prohibitively expensive. Furthermore, you can bet that this topic, including Storj representatives pointing out the intended meaning would be in evidence in that case. So good luck arguing you weren’t aware of what it really meant.

You could exit individual satellites on a single node as well. Though it’s not a very granular approach. Far from ideal, but possible.

Deletes really dry up after a few weeks and it takes forever to reduce the size even 1% beyond that. This is not a workable approach unfortunately.

S0ly · November 30, 2023, 2:49pm

Wow I seen that my post made some fire ^^*

thanks for everyone response

like I said no im not gonna use a VPN, and if I use one its not to bypass anything just for my own protection so it would likely be on all Node if I have multiple, but I will mostly not have multiple node

and yes I have multiple disk 8 used and 2 for parity

and I will not have 2 nodes on 1 disk because… I will not have 2 node ? xD I have no advantage of having 2 nodes anyway

I just have some advantages in the storage and some big machine of 144tb and my goal is to fill them, like I said I will not break ToS I have no interest in fighting to fill my node xD if storj dont want all my storage its fine I will fill it whit something else