Crazy idea sanity check: from an unraid parity array to 6 simple stand-alone disks! Please check the pros/cons with me

Hey everyone,
I have an unraid array with 1 parity disk + 5x 8TB disks.
Currently I am running 2 nodes, but as soon as they are almost full, the plan was to expand onto all 5 disks, so 5 nodes in total eventually.

Today, I woke up with a very crazy idea, please follow my dream:

First, context: across the 5 disks I only have 20TB of data. This data is movies (accessible via my Plex) and photos. For the sake of this explanation, let’s assume it is all from my camera, so I control and input all the data. So as you can see, I have lots of free space.

My plan:

  • Kill the array entirely and not use it any longer.
  • I will create 6 cache pools, each with just a single disk inside.
  • Each disk will run one Storj node with 3TB allocated, so I will have 6 disks running at all times with 6 nodes (that is the target).

How about my data and redundancy?

Since I control my data and the way it is stored, I will create 3 main folders and split my data across these 3 folders. The goal is for each of the 3 folders to have a ~5TB data allowance.
Then I will have an rsync script that simply backs up the data from disk1 to disk2 (and likewise for the other pairs).
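As a rough sketch of what that script could look like (a minimal sketch only; the mount points and folder names below are placeholders, assuming unraid-style /mnt/diskN paths and that rsync is installed):

```python
# Hypothetical pair-wise backup sketch: mirror each "MyDataFolderN" from its
# primary disk to its backup disk using rsync. All paths are placeholders.
import subprocess

PAIRS = [
    ("/mnt/disk1/MyDataFolder1/", "/mnt/disk2/MyDataFolder1_backup/"),
    ("/mnt/disk3/MyDataFolder2/", "/mnt/disk4/MyDataFolder2_backup/"),
    ("/mnt/disk5/MyDataFolder3/", "/mnt/disk6/MyDataFolder3_backup/"),
]

for src, dst in PAIRS:
    # -a preserves permissions/timestamps; --delete keeps the backup an exact mirror.
    subprocess.run(["rsync", "-a", "--delete", src, dst], check=True)
```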

Results/Goal breakdown:

Disk1:
Node1 with 3TB
MyDataFolder1 with up to 5TB (the exact amount does not matter; 5TB is the limit)

Disk2:
Node2 with 3TB
MyDataFolder1 BACKUP

Disk3:
Node3 with 3TB
MyDataFolder2 with up to 5TB (the exact amount does not matter; 5TB is the limit)

Disk4:
Node4 with 3TB
MyDataFolder2 BACKUP

Disk5:
Node5 with 3TB
MyDataFolder3 with up to 5TB (the exact amount does not matter; 5TB is the limit)

Disk6:
Node6 with 3TB
MyDataFolder3 BACKUP

Conclusion: I will have 6 nodes running 3TB each, and I will have 3 folders to organize my data, still giving me 15TB with one-disk redundancy!

How crazy is that?
The main reasons here are:

  1. Multiple nodes provide redundancy: in case one disk dies, I will still have the other nodes. I believe this is the Storj way of doing things.
  2. In the unraid array, the parity disk is a dead disk that is just spinning all the time and bottlenecking everything.
  3. My data is not growing; I have had the same 10-12TB for a long time now. In case I need more space, I can always add one more disk or kill one of the nodes.

Any thoughts on this?
Sorry for the long text, but I had to share with someone.

Use mature tools for that. LVM allows you to have volumes with different redundancy schemes on a single array, and in many cases even convert them on-line from one to another.
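A minimal sketch of that idea, assuming a volume group named vg0 already spans the disks (the names and sizes below are placeholders, not a definitive layout): the personal data lives on a mirrored logical volume, the nodes on plain ones, and lvconvert can change the redundancy later.

```python
# Hypothetical LVM sketch: one volume group ("vg0"), a RAID1-mirrored volume for
# personal data and non-redundant volumes for the nodes. Names/sizes are placeholders.
import subprocess

def run(*cmd):
    subprocess.run(cmd, check=True)

# Personal data gets an extra copy (RAID1) inside the volume group.
run("lvcreate", "--type", "raid1", "-m", "1", "-L", "5T", "-n", "mydata", "vg0")

# Each node gets a plain (non-redundant) linear volume on the same VG.
run("lvcreate", "-L", "3T", "-n", "node1", "vg0")

# Redundancy can later be added or removed on-line, e.g.:
# run("lvconvert", "--type", "raid1", "-m", "1", "vg0/node1")
```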


For that I will have to leave the unraid ecosystem, right?
I was trying to avoid going to a bare server (the GUI helps a lot).

Sorry, I don’t know unraid that well.

So I used to do something like you: I had the unraid array with multiple disks, running one node on each disk (I had also previously used the share functionality to spread the files over all disks; don’t do this, lol). I have instead done this:

~3 drives in the array: 1 for parity and 2 disks for data. The remaining drives are all single-disk pools, and each disk runs one node.

The benefits are: if your data is relatively cold, 3 of your disks can have their heads parked and you can save on energy. You can build manual scripts to create copies on the other disks. You still get the benefit of parity for your precious data.

Your plan is also fine; you don’t need to put these drives in their own pools to do this, just use a 6-disk parity-less array. The downside is that all your disks will be spinning, and more nodes means more RAM and CPU usage.


I think it would be better to just use one disk in unassigned devices.


How many nodes can I have on a single disk before performance degrades?

I know it is not supposed to be this way, but for the first months, 8TB is too much for one node.
I want to start a few nodes to get the vetting months counted as fast as possible.

In the past I had 2 nodes running per 16TB disk with no problems until the disks were full. During the vetting period there is much less traffic, so I guess you could start your 5 nodes on just one disk.

But you know, it is against TOS. :wink:

Generally it’s better to have 2 nodes instead of 1 node with a backup: you get double the useful capacity. The Storj network already handles redundancy, so all you are doing is halving your available capacity. And if you lost a drive without redundancy, you would still have a full drive’s worth of storage left.

It depends on network conditions and your node’s configuration (cache/hashstore). Currently, as the network is rather calm, I wouldn’t expect many issues from running two nodes on one HDD.

Having said that, on HDDs I wouldn’t recommend more than 1 node per drive; IOPS on HDDs are a major bottleneck. Instead, if you want to start running multiple nodes for the sake of vetting, your best option is to throw them on an SSD.

Don’t start vetting on more than one node; it will multiply the vetting time. Unless they changed that, the traffic also splits between unvetted nodes. A year ago, it used to take about 2-3 weeks per node.


For Storj, definitely one node per disk; multiple disks means multiple nodes.

For managing personal data this could get tricky.

My “home data” is basically spread on three different drives, which also have some storj data on them. Since it’s just three drives it’s not hard to manage manually.

Managing more data spread across multiple disks would require something fancier: LVM, mergerfs, unraid, etc.

But you would only want to do any disk spanning or parity for your real data, not for Storj.

But it also hides many options from the user. With a plain vanilla Linux server you could split all disks into 2 partitions, use the first partition on each for a Storj node, and build a ZFS RAID from the second partitions.
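A minimal sketch of that layout, assuming the six disks are already partitioned and that the device names below (/dev/sda2 ... /dev/sdf2) are just placeholders:

```python
# Hypothetical sketch: each disk has two partitions, e.g. /dev/sdX1 for a Storj
# node and /dev/sdX2 for a shared ZFS pool. Device names are placeholders.
import subprocess

second_partitions = [f"/dev/sd{letter}2" for letter in "abcdef"]

# Build a single-parity raidz pool ("tank") for the personal data.
subprocess.run(["zpool", "create", "tank", "raidz"] + second_partitions, check=True)

# The first partitions (/dev/sda1 ... /dev/sdf1) would each be formatted and
# mounted individually, one Storj node per partition.
```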

Not crazy. I’ve always avoided large arrays. They’re too difficult to get out of, riskier (parity is not backup), less efficient (lots of media I don’t need to back up) and slower (for parallel I/O like unzipping large archives or Storj).

In recent years I have JBOD/RAID0’d pairs of drives, so management is simpler and sequential speeds are better. Currently I have 3 pairs of drives + 2 individual drives (where Storj is). Only one pair is a backup destination. I don’t split until I need to.
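On Linux, one way such a two-drive RAID0 pair could be built is with mdadm (a hedged sketch only; the poster doesn’t say which tool they use, and /dev/sdb, /dev/sdc and /dev/md0 below are placeholders):

```python
# Hypothetical sketch: stripe two whole drives into one RAID0 device with mdadm.
# Remember: RAID0 has no redundancy, so only use it for data that is backed up elsewhere.
import subprocess

subprocess.run(
    ["mdadm", "--create", "/dev/md0", "--level=0", "--raid-devices=2",
     "/dev/sdb", "/dev/sdc"],
    check=True,
)
```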

However, with the slow growth of Storj on a single IP, I wouldn’t bother creating 5 nodes. I have 2 nodes now, and I think I’ll keep this forever – I’ll just upgrade to larger drives in the next 10 years and be fine. Much simpler, and I can dedicate the SSD cache to those only.


SSD cache is not needed anymore; I don’t see any benefit in using it. I run 2 nodes on 1 NAS with 8GB RAM, and all of each node’s files are on its own drive. No problem at all. It gains as much data as an Ubuntu server with 32GB RAM and the DBs on an M.2 drive. Logs are at a minimum, though.


Yes, perhaps. I still have memories from the test data period when I tried to migrate 8TB along with constant I/O and garbage collection; it took weeks. It might never happen again, but it’s cheap and already in place, using a portion of my OS drive.

And what will happen when this disk dies?

I hope you are aware that RAID0/JBOD means zero reliability? Using it for your personal data doesn’t sound good, unless you also have a backup copy on Storj.

You can replace it like any other failed disk in a RAID array, without data loss. The node (partition 1) is lost, of course.

RAID0 is not zero reliability versus a single drive, just more likely to fail. But a good backup solution should handle a failure at any time, so it doesn’t matter much. I still follow the 3-2-1 backup rule, yes!


I know that’s referred to as “zero redundancy”, but I wanted to emphasize the outcome: with one disk failure the whole dataset is lost. Usually such arrays are used for unimportant data, in exchange for increased speed and/or capacity.