How to add more drives to Storagenode

Hello there,
I'm new to V3.
My question is very simple.
At the moment I have one node that uses a 3TB external drive for storage.
Now I have received 2x 2TB external drives from a friend. I want to add these drives to the node, but I can't find out how to do this.
And also:
Is it even possible to add more than one drive to one node, or must I set up a new node with the same identity but a different data directory?

You can build an LVM volume with these new 2x 2TB disks. Then move all files from your Storj drive to this new LVM volume and point your node at it. After that you can add your 3TB drive to the LVM as well. But if one drive dies, you lose all the data.
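Something like this, assuming the two 2TB drives show up as /dev/sdb and /dev/sdc (placeholder names, check yours with lsblk):

sudo pvcreate /dev/sdb /dev/sdc                    # mark both drives as LVM physical volumes
sudo vgcreate storj_vg /dev/sdb /dev/sdc           # pool them into one volume group
sudo lvcreate -l 100%FREE -n storj_lv storj_vg     # one linear volume spanning both drives
sudo mkfs.ext4 /dev/storj_vg/storj_lv              # put a filesystem on it
sudo mkdir -p /mnt/storj
sudo mount /dev/storj_vg/storj_lv /mnt/storj

Then rsync your old node data over to /mnt/storj before pointing the container at it.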

And no, you can't set up a new node with the same identity.

Thanks! And how can I set up a second node? Can I ask for a second "Single-Use" token, or how does this work?

Yes, same as the first. But you have to use another mail address. And don't use the same IP.

OK, then it's not an option, because I don't get two or more IPs from my ISP.
But how is it possible that users say they have 11 nodes :dizzy_face:

They have servers in more locations than you.

You can use the same IP with a different external port. The network will just treat them as a single node for the purposes of file uploads; you won't get two pieces of the same file, as that would decrease the geographic redundancy of the data.

If I want to do this, then it would be like this:
docker run -d --restart unless-stopped -p 28967:28968 \
-p 14002:14003 \
-e WALLET="0xXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX" \
-e EMAIL="user@example.com" \
-e ADDRESS="domain.ddns.net:28968" \
-e BANDWIDTH="xxTB" \
-e STORAGE="xxTB" \
--mount type=bind,source="/path/to/second/identity",destination=/app/identity \
--mount type=bind,source="/path/to/the/second/drive",destination=/app/config \
--name storagenode2 storjlabs/storagenode:beta

And then forward port 28968.

Is that right, or how should I do this?

Looks fine to me. Do both nodes still show as online?

I would definitely NOT recommend anyone do this.

It will work for expanding your available drive space. However, the reliability of the storage space will suffer greatly.

  1. There is no easy way to offload the data to other storage later unless you have a single drive large enough to hold everything.
  2. If any of the drives fail, all of your data is lost. You can't recover the filesystem, since it is spread over several drives.
  3. When multiple drives with different speeds, caching, and inode sizes are combined, the slowest drive determines your maximum speed, and you may lose data between drives.
  4. If a single file is written partially to one drive and partially to a second one in your LVM volume, there will be long seek times.

So, for reliability and performance, using LVM as the “poor-man’s” RAID is a very bad idea.

I would highly recommend using a RAID architecture instead. A hardware implementation is better, but software RAID should work fine. I use mdadm for most systems I build.
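For example, a three-drive software RAID 5 with mdadm looks something like this (device names are placeholders, adjust for your system):

sudo mdadm --create /dev/md0 --level=5 --raid-devices=3 /dev/sdb /dev/sdc /dev/sdd   # build the array
sudo mkfs.ext4 /dev/md0                                                              # filesystem on the array
sudo mkdir -p /mnt/storj
sudo mount /dev/md0 /mnt/storj
sudo mdadm --detail --scan | sudo tee -a /etc/mdadm/mdadm.conf                       # persist so it assembles on boot (Debian-style path)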

This is only true when using striped volumes. Linear (non-striped) volumes will simply work at different speeds depending on which PV backs the extents currently being accessed.
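To make the difference concrete: the -i (stripes) flag is what makes a volume striped; leave it out and you get a linear volume. A sketch, assuming a volume group named storj_vg (pick one or the other):

sudo lvcreate -l 100%FREE -n storj_linear storj_vg         # linear: fills one PV, then spills to the next
sudo lvcreate -l 100%FREE -i 2 -n storj_striped storj_vg   # striped across 2 PVs: every I/O touches both drives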

Reliability is the bigger concern.

I strongly dispute this point, especially for consumer hardware (where most storage nodes will be run): most RAID controllers on consumer motherboards are "fakeraid" chips that provide neither the battery-backed write protection of real hardware RAID nor the portability and flexibility of software RAID. Many of them quite literally combine the worst characteristics of the other RAID solutions.

This point is a bit vague and depends on what you mean by “storage spaces.” LVM volumes can easily be moved between physical devices even while in use; this is one of the selling points of LVM in the first place.
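For example, retiring a drive can be done online with pvmove (hypothetical device names):

sudo vgextend storj_vg /dev/sdd   # add the replacement drive to the volume group
sudo pvmove /dev/sdb              # migrate all extents off the old drive while the volume stays mounted
sudo vgreduce storj_vg /dev/sdb   # drop the old drive from the group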

Those both need to be flipped: external port first, then the internal port in the container.
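So for the second node those two flags should read like this (host port on the left, the container's default listening ports on the right; ADDRESS keeps pointing at the external 28968):

-p 28968:28967 \
-p 14003:14002 \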

As for LVM, RAID, or any other solution to combine disks: you should avoid all of them. You either create a situation where you lose your entire node when one HDD fails, or you're wasting space on redundancy instead of making money on that space.

Say you have 3 HDDs. You could set up a RAID 5 and be protected against a single drive failure, but you lose a third of the space. Instead, you could run 3 nodes, one on each disk. While you're not protected against drive failure, the failure of one drive just takes down one node, and in the meantime you were making money on the third of the space you would have thrown away on parity from day one.
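To put rough numbers on it, assuming 3x 2TB drives:

RAID 5: (3 - 1) x 2TB = 4TB earning money, 2TB spent on parity
3 nodes: 3 x 2TB = 6TB earning money
After one drive fails: the RAID 5 node still has 4TB; the 3-node setup also still has 4TB earning, it just lost one node's data.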

At the moment I'm waiting for the second "Single-Use" token. If I receive it, I'll post my solution.

If you think about running a web site, you’ll see that this is not necessarily the case at all.

If I run a high-traffic website, I might store about 40 GB worth of data on my server, but I might need 1 TB of bandwidth to serve that 40 GB of stored data to an Earth-sized network.

I'm talking about the IP packet data traversing your Internet connection. With multiple nodes running, the incoming data is spread across more IP packets. Since the network pipe has a maximum bandwidth, more packets means more bandwidth use. At a certain point, the available bandwidth is going to be consumed by the datagram structures rather than the payload data.

This is another issue that would need significant testing to verify, and multiple geographically sparse IPs in order to actually test it well…

You're not hosting a single high-traffic website on your node. You're hosting a pretty damn random sample of everything customers store on the network, which will eventually settle at a fairly consistent percentage of data downloaded per GB stored. Your example simply doesn't apply when you look at the total averages. You're overcomplicating things. Think of it this way: if you're hosting data for 10 customers, you get half as much bandwidth usage as if you host data for 20 customers.

As for the IP packet argument: explain to me where the additional packets come in. It's not like individual transfers are split across two nodes. Each node handles its own transfers with exactly the same number of packets a single node would use to handle those same transfers.

But let's say I'm wrong about the packet overhead. Do you really think it will compensate for a 33% loss in income? Even if there is some overhead, we're talking negligible amounts. I'm not against testing anything, but I'm very confident in just reasoning this one out and dismissing RAID/LVM.

Very likely. I’ve been known to do just that…

Perhaps it's the purist in me that makes me claim that RAID is the way to go. But from experience, I've lost tremendous amounts of data to lost drives in ad-hoc LVM architectures… So whichever way people decide to store data, I do not recommend an ad-hoc LVM array.

On IP frame overhead…

If you page through the calculations, you can imagine a situation in which the overhead significantly hinders payload throughput. Would it ever reach 33%? Probably not… but it could easily reach 15%.

In a worst-case scenario, one could imagine the bandwidth lost to IP overhead, combined with the downtime of single-drive nodes and the extra electricity cost of running multiple nodes, adding up to, at best, breaking even with a single node running 3 drives in RAID 5, while that RAID node gives up 33% of the possible maximum storage.

Please explain how any of the values discussed on that page would change when switching from a single node to multiple nodes. And I mean give actual examples, not just vague terminology. While I'm not an expert on the TCP stack, I'm almost certain none of that would change at all, since what is described on that page is a FIXED overhead that depends only on MTU size. And please include how you arrived at the 15% number as well, because I'm not following your reasoning.
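For reference, the fixed overhead is simple arithmetic. Assuming a standard 1500-byte Ethernet MTU and no TCP options:

20 bytes (IP header) + 20 bytes (TCP header) = 40 bytes per packet
40 / 1500 ≈ 2.7% of the link, and that ratio is the same whether one node or ten nodes fill the pipe.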

Real world testing would be required to figure this out. There are a lot of variables to consider.

My 15% extra overhead estimate comes from a worst-case scenario in which three nodes receive the same offer of data and all three lose the race to store it because of the added latency of pushing the data through a single IP pipe. In that case, no data is stored by any of the running nodes, and the effective penalty to the available bandwidth at that moment is three extra frames occupying that slice of time.

Right now, using your new earnings calculator (which is awesome, BTW), my running node's earnings are almost entirely composed of egress payments. The storage component only pays out once the drives start to fill and stay filled for long periods of time. However, that's also when drives start to wear out.

This situation can never happen, since the node selection process is built precisely to prevent it. Storing multiple pieces of one file behind a single IP would centralize data on a single point of failure, which puts it at more risk; hence the satellites prevent it.

It also has absolutely nothing to do with TCP overhead, and you still haven't shown how you calculated 15%. Besides, the exact overhead of TCP packets doesn't require testing; it's an exact calculation.

Nodes could still get different pieces from different uploads at the same time, but guess what, a single node deals with that in parallel as well and can be overwhelmed by it too. This is why the max concurrency setting exists in the first place.
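For reference, that's the max concurrency option in the node's config.yaml; the option name below is from memory, so double-check it against your own config file:

# reject new transfers once this many are already in flight (0 = unlimited)
storage2.max-concurrent-requests: 10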

Finally, drives don't wear out from holding data; they wear out when you read from or write to them. That is directly correlated with bandwidth usage, not storage usage.

It seems you've just heard a few things and combined them in a way that seems right to you but simply doesn't apply. This just confuses the conversation. If you're making claims, back them up, especially if you're putting numbers on them.
One last note: annual failure rates for modern HDDs are now around 2%. You still want to protect against that if a failing disk means losing your own data. But the Storj network is built to deal with it, and a roughly 2% chance per year of losing one disk's worth of income doesn't weigh up against giving up a guaranteed third of your capacity from day one. I'll stop with this conversation now; I don't think it helps anyone else at this point. But please keep in mind that many SNOs will read this, and they're looking for substantiated advice. It's a disservice to them to post based purely on assumptions. It will only waste their time and lead them in the wrong direction.