Multiple Nodes per HDD to lower impact of node disqualification?

Hello all,
this morning one of my nodes was disqualified. 8TB, gone! Sad! It had been running smoothly since Aug 2019 and earned $408, nice :grinning:

The hard drive itself seems to be fine (8TB Seagate Ironwolf). The file system was ext4. Machine is a Synology 1819+. There are other hard disks working in the machine with 5 other nodes.

Now 8TB are gone. A big impact on my payout.

The topic of multiple nodes behind one IP address is very clear to me. There is no benefit in hosting more nodes just to get more data ingress. Check.

But:
Before I set up a new node I would like to know if it makes more sense to run 4x 2TB nodes on this 8TB disk instead of one 8TB node.
Of course, if the hard disk itself fails, all 4x2TB=8TB will be gone anyway. But if this was a file system error, bit rot, or some other volume-level issue, then only a small part (2TB) of the data would have been lost, which has much less impact.

What do you think? 1x8TB or 4x2TB ?

Disadvantages I already see: longer vetting, and possibly some lost disk space because of the per-node overhead.

Next question: ext4 or BTRFS (Synology has fixed the BTRFS issues)?

I see your point. Just one disadvantage for your suggested approach that comes to my mind would be that when you restart your system or the nodes get updated, the filewalker process will run 4 times simultaneously on one drive. This will increase your IOPS during that time a lot. I am not sure if that could hurt your scores…

I don’t know if it is applicable:
https://storj.io/storj-operator-terms/

  • 4.1.4. You will provide and maintain the Storage Node so that, at all times, it will meet the following minimum requirements (“Minimum Storage Node Requirements”):
    • 4.1.4.1. Have a minimum of one (1) hard drive and one (1) processor core dedicated to each Storage Node;

It’s not. It’s a relic that still hasn’t been removed. At least the processor part. The one-node-per-hard-drive part hasn’t been discussed often, but it makes as little sense as the CPU requirement. It all comes down to your hardware and how it performs. Having 2 nodes on an HDD doesn’t necessarily make the nodes perform badly.

Personally I use 2 nodes per HDD, not because of the disqualification impact but because I have multiple HDDs, and if one HDD dies I can copy one of those nodes to the new HDD and don’t have to start with an unvetted node.

To add a bit of color around the guidance that there is one node per hard drive.

The most common way pieces are lost in our system is nodes going offline. By far and away, this is the most common problem, and disk or hardware failure is much less common relative to this.

When a disk or hardware does fail, it’s safe to assume that all of the data on that disk is at risk. Maybe not lost, but certainly now suspect.

The way our auditing system works is it does spot checks. If a spot check fails because the node simply isn’t online, that’s the common case and handled differently than if the spot check fails because the node is online but has a drive error or returns incorrect data. We expect incorrect data to be exceedingly rare (bit flips) and drive errors to also be pretty rare relative to nodes going offline. That’s why audits can so quickly disqualify a node - if a node starts to fail audits with drive errors or bad data but the node is online, we assume the hard drive is going bad and as an overall system we begin the process of repairing the data to other drives.

If you put more than one drive on a node, our system doesn’t have a way of understanding that half the data might still be good. The node as a whole is the only unit that can be assumed to be at risk.

Increasing your node’s reliability doesn’t make a lot of sense in the expected value sense because the system as a whole already has redundancy built in - it takes more resources to run RAID and it doesn’t actually make the overall system any more reliable. Better to run more nodes than to run fewer but more reliable nodes.

Having more nodes per drive isn’t as bad as having more drives per node, but it does give you additional overhead you could have avoided, and it could reduce the overall throughput of the system (you get less dedicated IOPS per node and more disk seeking/thrashing). I see that it reduces some disqualification risk if you’re able to save half of the data, and that’s fine. Really, the guidance here is to direct people towards a good rule of thumb (one hard drive per node) so that SNOs know what the designers of Storj are targeting and assuming.

Hopefully as we continue to optimize the software, the storage node software won’t be as CPU heavy and we can reduce the CPU and RAM requirements per node some.

Personally I use 2 nodes per HDD, not because of the disqualification impact but because I have multiple HDDs, and if one HDD dies I can copy one of those nodes to the new HDD and don’t have to start with an unvetted node.

I also have 2 spare nodes on 2 separate drives, for the exact same reason. They run at minimum capacity (600GB), so the copy process is quick in case of a reset. I figured you need at most 2 spare nodes on 2 separate drives, as it can also happen that the drive with the spare node on it dies. But you do not need more than 2 spare nodes.

As I already mentioned in my initial post, I expected the overhead to be a disadvantage. Thanks for confirming.

But: my 8TB node produced $30/month. That is a monthly loss now. Although I have other vetted nodes that can take over the disk space, it will take a very long time until these 8TB are filled up with data again. The current ingress speed is poor. It will take at least a year. That makes a roughly (!) estimated loss of $360.

With 2 nodes of $15 each, and only one of them lost, my income would only have been halved. Only $180 would have been lost. Doing so, I probably would lose 10% of income to wasted overhead and another 10% to IOPS, which would have cost another $72. Still better than starting from scratch now.
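
The estimate above can be written out as a short Python sketch. The figures ($30/month, roughly 12 months to refill, 10% overhead plus 10% IOPS penalty) come from the post; the function name, and the assumption that both penalties apply to the full yearly income, are illustrative guesses rather than anything measured.

```python
def yearly_loss(monthly_income, nodes, nodes_lost=1, penalty=0.0, months=12):
    """Rough loss over `months` when `nodes_lost` of `nodes` equal-sized nodes are disqualified.

    `penalty` is the assumed fraction of total income lost to per-node
    overhead and IOPS contention caused by splitting the disk.
    """
    yearly_income = monthly_income * months
    dq_loss = yearly_income * nodes_lost / nodes   # income of the disqualified node(s)
    split_cost = yearly_income * penalty           # assumed cost of running several nodes
    return dq_loss + split_cost

one_node = yearly_loss(30, nodes=1)                 # 1x 8TB node, disqualified
two_nodes = yearly_loss(30, nodes=2, penalty=0.20)  # 2x 4TB nodes, one disqualified
print(one_node, two_nodes)
```

Under these assumptions the split setup loses $180 + $72 = $252 against $360 for the single node, so the advantage shrinks quickly if the real overhead turns out to be higher than 20%.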

From my experience, having a large number of small nodes on a single hard drive has only one drawback, but so many benefits that I wouldn’t choose otherwise. The drawback is that you’d better script your way through the administration of nodes, because you don’t want to manually repeat the steps on every node. The benefits though…

  • On restart a node walks over all stored data. This is faster on small nodes.
  • Migration of small nodes between drives has smaller downtime.
  • And if migration is within a single machine, you can dedicate small LVM volumes for each node, then migrate a node with zero downtime using pvmove.
  • If you get a new drive for Storj, you can just migrate one of the already vetted nodes from an old drive and have the drive fill up faster.
  • If you suddenly need space, you can quickly gracefully exit a single node recovering 500GB of disk space, as opposed to sacrificing a large node (because graceful exit of a larger node also takes more time). And even if you don’t have time to graceful exit a small node, sacrificing it is less of an impact. Hopefully Storj implements partial graceful exit soon, so that this use case is no longer valid.
  • In case there is another operator within your IP block, more nodes get you more traffic, because the traffic is divided equally between nodes, not between operators (unless Storj has fixed this problem since I last checked the code).
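
The last point can be illustrated with a small sketch. It assumes, as the post says with some hedging, that the traffic for an IP block is split equally per node rather than per operator. The function and operator names are hypothetical, and this is not taken from the Storj code.

```python
from fractions import Fraction

def traffic_share(node_counts, operator):
    """Fraction of an IP block's traffic that goes to `operator`,
    assuming the traffic is split equally between all nodes in the block."""
    total = sum(node_counts.values())
    return Fraction(node_counts[operator], total)

# Two operators in the same IP block: "me" runs 4 small nodes,
# "neighbour" runs 1 large node.
block = {"me": 4, "neighbour": 1}
print(traffic_share(block, "me"))         # 4/5
print(traffic_share(block, "neighbour"))  # 1/5
```

If you are the only operator in the block, the share is 1 no matter how many nodes you run, which matches the point in the opening post that extra nodes behind one IP bring no extra ingress.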

Also,

smaller nodes do less I/O as well, so it should balance out.

In my humble opinion, especially if your drive is not a professional piece of hardware (or worse, an SMR drive), that is a massive issue.
The time it takes to browse all files at node startup seems kinda exponential the more nodes do it in parallel.

I’m pretty confident scanning 4 nodes on the same disk would take a lot more time than scanning one single node (for the same total occupied space).

And it could hurt your score or even disqualify the node if the disk were to stall to the point of taking more than five minutes to respond to audits.

Maybe it’s worth giving it a try, but only under heavy monitoring I’d say :slight_smile:

By mad coincidence, I did this test yesterday: 9 nodes on 1 HDD, 5TB of data, Pi 4 4GB. The file walk takes 36 hours.

Seems you do not like your HDD (or even hate it) for some reason. :upside_down_face: Is it noisy, or what is the reason?

That’s why I simply avoid starting nodes at the same time. For upgrades I have a script that restarts them one by one and waits enough time for the scanning to finish. Not a problem.
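
A minimal sketch of such a one-by-one restart, in Python. The poster's actual script is not shown, so everything here is an assumption: the container names are hypothetical, a fixed pause stands in for "enough time for the scanning to finish", and `docker restart` only restarts the existing container (a real upgrade would follow the stop/remove/run steps from the documentation instead).

```python
import subprocess
import time

def restart_one_by_one(containers, wait_seconds, run=subprocess.run, sleep=time.sleep):
    """Restart each container in turn, pausing between restarts so the
    startup scans do not overlap on the same disk."""
    restarted = []
    for i, name in enumerate(containers):
        run(["docker", "restart", name], check=True)
        restarted.append(name)
        if i < len(containers) - 1:   # no need to wait after the last node
            sleep(wait_seconds)
    return restarted

# Dry run: record the commands instead of executing them.
commands = []
restart_one_by_one(
    ["storagenode1", "storagenode2"],   # hypothetical container names
    wait_seconds=3600,                  # assumed pause, tune to your scan time
    run=lambda cmd, check: commands.append(cmd),
    sleep=lambda seconds: None,
)
print(commands)
```

Leaving `run` and `sleep` at their defaults executes the real commands. Passing them in as parameters is just a way to try the ordering logic without touching Docker.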

Can’t believe nobody asked what happened to the node… What happened to the node? Why were you disqualified?

I needed one of the nodes’ disks for another project and ended up getting rid of all the odd little nodes. I wish Storj had filled the 12TB drive, but it just hasn’t happened.

That’s the best approach for sure (if you have several nodes per disk I mean)!
It’d be cool if watchtower could do that :slight_smile:

Indeed sir, it would.

That’s an option? I’ve got 3 nodes on an RPi (3 separate drives) and any time an update comes out I stop all nodes, remove the nodes, update the docker container, then restart all nodes by defining them in the run command. This is per the manual steps in the Software Updates documentation, although 3x over for the stop/remove/restart because 3 nodes. I know that the drives are busy when I restart but it doesn’t seem to be too intense… on the largest two drives I only have 1.54TB shared to Storj and neither of those are even over a third used at this time.

I guess my main question is, you can update the Docker container while a node is still using an old version? I guess the node wouldn’t use the new version until you stop/remove/restart? Once no more nodes are running on the old container version what will purge that container from disk?

Nothing. Use “docker images” to see the mess

Thanks! Today I learned about the docker image prune command and reclaimed a little space!

So I’m assuming I can also safely run docker pull storjlabs/storagenode:latest while I’ve got my nodes running to pull down a newer image if one exists? And it will just sit there downloaded until I stop/remove/restart my nodes telling them to use the latest?

What do you mean? If you’re not using watchtower to auto-update your nodes, then you can update them in any order, and whenever you want as long as you don’t let your current running version become obsolete.

If you have 1 disk per node though, there is no inconvenience in updating all 3 nodes at the same time I guess: the system (i.e. CPU) should spend most of its time waiting for disks’ IO. Unless your RPi is too weak maybe I dunno, depends on the RPi model you’re running.


Correct.

Yup.
