Multiple storage nodes

donpdonp · September 19, 2019, 1:48am

As the proud owner of a Storj auth token and having some experience now of running a storagenode, what’s the story on running multiple storagenodes? Is this possible? Is it simply a matter of launching storagenode on a second box using the same identity key?

If there are multiple storagenodes using the same identity key, how does the reputation/score work? It seems like each node would generate an ID and the score would be tracked per node ID, but it’d be good to have some confirmation. Thanks!

deathlessdd · September 19, 2019, 3:19am

You can only use one storagenode per Identity. So you would need another invite to do another node.

ryan · September 19, 2019, 10:54am

Get another email address and then apply for another identity. That’s what I’ve done.

BrightSilence · September 19, 2019, 2:26pm

You can run multiple nodes on different machines or even different HDDs on the same machine.

However, you should keep in mind that data is spread out across nodes on the same IP range, so 2 nodes will get as much traffic as one node would. Therefore it is really only useful if you want to run nodes on multiple separate physical disks. Otherwise you’re just creating more competition for resources, and node reputation could even suffer.
As for reputation, that is still handled on a per-node basis. The same goes for vetting the node, so the second node will have to be vetted again first. This vetting process will take longer because traffic is now split across 2 nodes.

Because of this, my advice is always to start with one node and only spin up the second node when the first one starts to fill up. That way you get the most out of your first node without hampering it by splitting up the data. The second node comes in handy when you’re running out of storage space on the first, allowing further expansion of the storage capacity.

donpdonp · September 19, 2019, 3:35pm

Okay, I assume that by using a new storagenode --config-dir and the same identity key, that node picks a new unique ID for itself.

Can you say more about what “nodes in the same IP range” means? If I run two storagenodes behind the same NAT, will they be seen as separate by the satellites? If I run two storagenodes on different IPs, do the satellites consider them related (perhaps grouped by the common identity key?), and would they be treated differently because of it?

BrightSilence · September 19, 2019, 3:50pm

You can’t use the same token or identity for multiple nodes. You need to go back on the wait list with a different email address.

Nodes are seen as separate entities by the satellites, but if they are on the same /24 IP range the traffic will be spread among them. So if one node would have gotten 100GB, two nodes would each get 50GB.

Edit: a /24 range is the set of IPs that share the same first 3 numbers, e.g. 123.123.123.x.
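
To make the /24 rule concrete, here is a minimal Python sketch. It assumes the traffic for a /24 range is split evenly among the nodes in it, which is a simplification of “the traffic will be spread among them”. The function names, node names, and IP addresses are made up for illustration and are not part of any Storj software.

```python
import ipaddress
from collections import defaultdict


def subnet_24(ip: str) -> ipaddress.IPv4Network:
    """Return the /24 network that contains the given IPv4 address."""
    return ipaddress.IPv4Network(f"{ip}/24", strict=False)


def split_traffic(node_ips: dict[str, str], traffic_per_subnet_gb: float) -> dict[str, float]:
    """Assumption: every /24 range receives the same traffic, shared evenly by its nodes."""
    groups = defaultdict(list)
    for name, ip in node_ips.items():
        groups[subnet_24(ip)].append(name)

    shares = {}
    for members in groups.values():
        for name in members:
            shares[name] = traffic_per_subnet_gb / len(members)
    return shares


# Hypothetical nodes: node1 and node2 share 123.123.123.x, node3 does not.
nodes = {
    "node1": "123.123.123.10",
    "node2": "123.123.123.200",
    "node3": "123.123.124.5",
}
shares = split_traffic(nodes, 100.0)
print(shares)  # {'node1': 50.0, 'node2': 50.0, 'node3': 100.0}
```

The two nodes on 123.123.123.x each end up with half of what the lone node on 123.123.124.x receives, which matches the 100GB versus 2 x 50GB example above.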

Nastea · September 19, 2019, 9:47pm

It is very possible…

I have four and working on bringing up a fifth. However, mine are each at different physical locations. All have different IP addresses and different identities.

cdhowie · September 19, 2019, 10:13pm

Instead of running multiple nodes using multiple disks at a single location, consider running one node using multiple disks (by using e.g. RAID1/6/10). RAID10 in particular with 4+ disks will give you an I/O performance advantage as the data is striped over each mirror.

Running multiple nodes on the same network should only be done if you don’t understand how to set up RAID.

donald.m.motsinger · September 19, 2019, 10:44pm

But with RAID10 you’ll lose half your storage space.

cdhowie · September 19, 2019, 10:51pm

HDD storage is insanely cheap. Consider what happens without redundancy… let’s say you have a 2TB disk. When that disk inevitably fails, your node is kicked out of the network. You buy a new 2TB disk and start over. Now you have to wait about a month for a satellite to vet your node, and you’re not getting paid as much storage and (more importantly) egress revenue until your disk starts to fill back up, which could easily take 6+ months.

If you’re willing to start over at zero each time, then sure. In the long run, I think that net revenue will be higher using RAID10 than using multiple nodes with raw disks; your revenue is not going to drop down every time a disk fails.

If you have no intent to replace failed disks (they are truly just spare disks you aren’t using) then the multi-node setup could be useful.

BrightSilence · September 20, 2019, 12:24am

Redundancy makes little sense. Instead of wasting half your disks, you could run nodes on all of them and use all available space. You have the potential to make twice as much money, and if one disk fails, the worst that happens is that you fall back to your single-disk potential. With annual disk failure rates at 2% for modern HDDs, there really is no reason to obsess over redundancy when there is redundancy built into the network already.
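
The arithmetic behind this argument can be sketched roughly as follows. This is a deliberately simplified model, not a revenue forecast: it only compares usable capacity, takes the 2% annual failure rate quoted above as given, and ignores vetting time, the months it takes a replacement node to refill, and the chance of a RAID10 array losing both disks of a mirror. The function names are invented for this example.

```python
def raid10_usable_tb(disks: int, disk_tb: float) -> float:
    """RAID10 mirrors every disk, so half of the raw capacity is usable."""
    return disks * disk_tb / 2


def separate_nodes_expected_tb(disks: int, disk_tb: float, afr: float) -> float:
    """Expected capacity still online after a year if each disk runs its own
    node and fails independently with annual failure rate `afr`."""
    return disks * disk_tb * (1 - afr)


raid10 = raid10_usable_tb(disks=2, disk_tb=2.0)
separate = separate_nodes_expected_tb(disks=2, disk_tb=2.0, afr=0.02)
print(f"RAID10 usable: {raid10:.2f} TB")            # RAID10 usable: 2.00 TB
print(f"Separate nodes, expected: {separate:.2f} TB")  # Separate nodes, expected: 3.92 TB
```

Under these assumptions the two-node setup keeps roughly twice the capacity on average. The counterargument earlier in the thread is about what this model leaves out, namely the cost of starting a failed node over from zero.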

cdhowie · September 20, 2019, 1:13am

Ok, I’ll grant you that explanation makes sense.

Now, how do we allocate monthly traffic capacity to each node? If we’re running 12 nodes, just divide by 12? What happens when one of the 12 nodes has used all of its monthly traffic allocation, but the others haven’t? How can we pool traffic among nodes? This is important considering that egress traffic is by far the most lucrative metric.

With one node, it should be simple to approach capacity each month with enough storage. With multiple nodes, a “hot” node for egress may go underutilized because the traffic capacity partitioning is strict. The only way to deal with this currently is either (1) overcommit traffic (lie to the network) or (2) have a daemon that monitors the remaining capacity and periodically reallocates it among all of the nodes (stopping, removing, and recreating each docker container).
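
As a rough illustration of option (2), the core of such a daemon could be a function like the one below. It is a sketch only: it assumes the operator already knows how much traffic each node has used this month, it splits the remaining pool capacity evenly, and it does not talk to the storagenode or docker at all. Applying the new limits (stopping, removing, and recreating each container) is left out, and every name here is hypothetical.

```python
def reallocate(pool_limit_tb: float, used_tb: dict[str, float]) -> dict[str, float]:
    """Return a new monthly traffic limit per node.

    Each node keeps what it has already used and receives an equal share of
    whatever is left in the pool, so a "hot" node that exhausted its original
    allocation gets headroom taken from nodes that have used little.
    """
    remaining = max(pool_limit_tb - sum(used_tb.values()), 0.0)
    share = remaining / len(used_tb)
    return {name: used + share for name, used in used_tb.items()}


# Hypothetical pool: 12 TB per month across three nodes, originally 4 TB each.
# Node "a" has used its whole allocation; "b" and "c" have barely been touched.
used = {"a": 4.0, "b": 1.0, "c": 1.0}
limits = reallocate(12.0, used)
print(limits)  # {'a': 6.0, 'b': 3.0, 'c': 3.0}
```

The new limits still add up to the 12 TB pool, so nothing is overcommitted to the network. Other policies (for example weighting by recent egress) would fit the same shape.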

We need a way to specify a monthly traffic limit for a whole pool of nodes.

JohnSmith · September 20, 2019, 8:30am

I want to use all my disks as you describe. I have an older data center class (X8DTT) server. I have an extra token I have been sitting on until my current node (HDD1) fills up. HDD1 is currently at 1.4/2 TB capacity. I’m not an Ubuntu 18.x guy, but that’s what it’s running.

1. I don’t want to F up my current node, which has better than 99% uptime. What are the correct commands to start the node on HDD2?
2. At what fill % should I move forward with starting a new node on HDD2?
3. Will there ever be a GUI to easily start new nodes on each HDD? If so, when?
4. I understand from your replies that there is no ingress advantage to running multiple nodes on a single server (or even multiple servers) using the same IP. HOWEVER, it seems apparent to me there would be an EGRESS advantage. More stored equates to more egress, presuming no ISP bandwidth or data caps. Is this thinking wrong? If so, why?

Thanks in advance to you or anyone else replying.

BrightSilence · September 20, 2019, 9:06am

Let me give it a shot.

1. You would be creating a new identity and signing it with a new token. Just make sure you use another name like storagenode2 to create and sign the identity. You’ll also be creating a second container, which then also needs a different name, so storagenode2 could be used there as well. You need to use a different port for this node too, so set up the port forwarding and firewall rules for another port (like 28968). In the docker run command you would then change the port in the address and the first port in the -p parameter: -p 28968:28967. The second port stays the same; that’s the port used inside the container by the storagenode.
2. You want to give the second node some time to get vetted while the first node can still do full duty. I’d say 75% is probably a good time to add another one, so you’re pretty much at that point.
3. I know there is work being done on a Windows-based install and binaries for most OSes that don’t require docker. I think there will likely be a GUI as well, but I’m not sure there are plans to have that support multiple nodes in one interface. This sounds like a good question for the upcoming town hall. https://zoom.us/webinar/register/WN_i_e4wM3JQheAuWzBw_pIVg
4. You’re right and you’re wrong. It depends on what your question is. Between 2x2TB and 1x4TB there is no advantage. But 2x2TB would indeed have an advantage in egress over 1x2TB if there is more than 2TB of data stored on the nodes combined. The shorter version: if you store more data, you generally get more egress as well. But that can be achieved by having a larger single node as well as by multiple smaller nodes. There is no advantage to doing the second just for income. It should be about equal.
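
The port and naming scheme from the first point, and the 75% rule of thumb from the second, can be written out as a small Python sketch. It only builds the strings and does the arithmetic; it does not run docker. Two things here are assumptions rather than anything stated in the thread: that the first node publishes host port 28967, and that each further node simply uses the next port up. The helper names are invented.

```python
INTERNAL_PORT = 28967  # port used inside the container; stays the same for every node


def port_mapping(node_index: int) -> str:
    """Value for docker's -p parameter for the Nth node (1-based).

    Assumption: node 1 uses host port 28967 and each extra node takes the next one.
    """
    host_port = INTERNAL_PORT + (node_index - 1)
    return f"{host_port}:{INTERNAL_PORT}"


def container_name(node_index: int) -> str:
    """storagenode for the first node, storagenode2, storagenode3, ... after that."""
    return "storagenode" if node_index == 1 else f"storagenode{node_index}"


def ready_for_next_node(used_tb: float, capacity_tb: float, threshold: float = 0.75) -> bool:
    """True once the current node is at or above the suggested fill level."""
    return used_tb / capacity_tb >= threshold


print(container_name(2), port_mapping(2))  # storagenode2 28968:28967
print(ready_for_next_node(1.4, 2.0))       # False
```

At 1.4 of 2 TB the first node is at 70%, just under the 75% mark, which fits the “pretty much at that point” estimate.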

george · December 24, 2019, 11:34pm

Hi,

I’ve missed a bit since the migration from Rocket.Chat and have just arrived here. I’ve looked for a clear answer but couldn’t find one regarding running multiple nodes at different locations but with the same e-mail/invitation. The last I knew, this was something due to arrive in a future release.

Can I run multiple nodes on the same account/e-mail address? Is this option available yet, or do I need to register for a new invite with a new email address?

Thanks.

Vadim · December 25, 2019, 12:13am

You can have a lot of nodes with 1 email address.
Just get an invitation, generate an identity, and sign the identity with the invitation code.
Start the node.
After 24h you can ask for a new invitation on the same email and make a new node.
I have 11 nodes with the same email address.
But each node has to have its own identity, signed with its own invitation code.

Alexey · December 25, 2019, 4:55am

Hello @george,
Welcome to the forum!
You may take a look:

george · December 25, 2019, 11:19am

All this is confusing:

I’ve had an invite for 3 months or so.
So an identity is not the same as a token, and I need a different token for each node, but I can get them on the same e-mail? Why do I need another invite on the same e-mail if I already have an invite or token, sorry, identity? I have no idea what to do.

Someone please take a minute and make this easy for people to understand?

I’m positive there are tons of people thick like me that would like to join but cannot because this is difficult to understand.

Vadim · December 25, 2019, 11:48am

Every identity should be unique and signed with its own invitation code. Storj needs some way to control how many storage nodes are created; that’s why it is done this way.

John.A · December 25, 2019, 12:16pm

@george

Every node needs a unique ID, just like every car needs a unique VIN.
It’s to know who is who.

When your first node has its identity and is started, the satellite knows that this node exists.
The next day you can request a new token with the same email and repeat the procedure.
This new node will have a new ID.

Hope this made some sense to you.