Best practices for multiple hard disks

Note that the network stores data redundantly. A lost node doesn’t mean the data is lost; it just means you don’t hold it and therefore don’t profit from it anymore.

To put it another way, you will almost certainly make more revenue running a node on each disk than by building a RAID1 array. With RAID1 your node could survive disk failures indefinitely, but you are nearly halving your revenue potential, since the mirror disk holds a copy instead of sellable capacity. (If you have many disks, RAID5 could make sense simply because it saves you the time of setting up a new node each time a disk fails. In effect, you are “spending” the potential revenue of the redundant disk as an insurance policy for your time.)

The ability to remove only part of a node’s data was in the design doc, but it does not exist yet. The “total graceful exit” feature is in active development right now, so there will soon be a way to shut down an entire node without losing escrow, but it may be some time before you can ask storagenode to remove some amount of data without any penalty. Right now it’s an all-or-nothing deal.

This is another point in favor of running a node per disk. If you need the disk back, just shut down the node and accept the escrow loss (right now) or gracefully exit the node (once that feature is ready).

The amount of free space on the node isn’t really relevant; however, running two nodes in the same /24 subnet will cause Storj to split ingress data equally among them. This means each node will probably take longer to be vetted. Once the nodes are vetted, though, the “N nodes with M storage each” scenario and the “1 node with N*M storage” scenario will see pretty much the same traffic patterns.

To put it another way, if you are running 3 nodes, each node gets 1/3 of the traffic that a single node would; across all 3, you get the same total traffic a single node would see. (And running a single node with redundancy like RAID1 would have reduced revenue potential due to the decrease in total capacity.)

So, aside from the additional time to be vetted, running multiple nodes will not affect your revenue.

This is why it’s recommended to bring up one node at a time, wait until it is nearly full (90% or so), and then start the second node. A full node doesn’t receive ingress traffic anymore, so this minimizes the time the nodes spend fighting over ingress data during the vetting period: once the first node fills up, the second node is no longer losing ingress traffic to it, which helps it get vetted faster.
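If you want to automate watching for that 90% mark, something like this works as a rough check. It’s a sketch, not gospel: it assumes the node’s web dashboard is published on the default local port 14002 and that the `/api/sno` endpoint reports `diskSpace.used` and `diskSpace.available` (allocated) in bytes, as recent storagenode versions do.

```sh
# Rough check of how full a node is, via its local dashboard API.
# Assumes the dashboard is reachable at localhost:14002 and that
# /api/sno exposes diskSpace.used and diskSpace.available in bytes.
curl -s http://localhost:14002/api/sno |
  jq -r '.diskSpace | (.used / .available * 100 | floor | tostring) + "% of allocated space used"'
```

Once that crosses ~90%, start the second node.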

Yes. Docker can run multiple containers from the storagenode image, each with its own data directory and with the container’s internal port mapped to a different host port. You’d effectively be running multiple services on different ports. (I currently do this; it’s not just theoretical.)
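Roughly what that looks like, as a sketch based on the standard flags from the Storj setup docs: the wallet, email, address, sizes, and mount paths below are placeholders, and each node also needs its own unique generated identity directory.

```sh
# Node 1: disk mounted at /mnt/disk1, default port 28967
docker run -d --restart unless-stopped --name storagenode1 \
  -p 28967:28967 -p 127.0.0.1:14002:14002 \
  -e WALLET="0xYOUR_WALLET" -e EMAIL="you@example.com" \
  -e ADDRESS="your.ddns.example.com:28967" -e STORAGE="7TB" \
  --mount type=bind,source=/mnt/disk1/identity,destination=/app/identity \
  --mount type=bind,source=/mnt/disk1/storagenode,destination=/app/config \
  storjlabs/storagenode:latest

# Node 2: disk mounted at /mnt/disk2, host port 28968 mapped to the
# container's internal 28967, plus a separate local dashboard port
docker run -d --restart unless-stopped --name storagenode2 \
  -p 28968:28967 -p 127.0.0.1:14003:14002 \
  -e WALLET="0xYOUR_WALLET" -e EMAIL="you@example.com" \
  -e ADDRESS="your.ddns.example.com:28968" -e STORAGE="7TB" \
  --mount type=bind,source=/mnt/disk2/identity,destination=/app/identity \
  --mount type=bind,source=/mnt/disk2/storagenode,destination=/app/config \
  storjlabs/storagenode:latest
```

The only things that differ between the two are the container name, the host-side ports, the ADDRESS, and the per-disk mounts; everything else matches a single-node setup.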
