2 or more HDDs for a single node

vladro · December 4, 2022, 10:36am

It would be great if we were allowed to attach not just one physical (or logical) drive to a node, but 2 or more.
This could be used to:

increase read/write speed and IOPS capacity (by dividing the blobs between several HDDs)
increase redundancy without using a RAID controller or software RAID
use smaller HDDs to increase total node capacity (I personally have a lot of 500 GB, 1 TB and 2 TB drives that are unused now, because they can’t be attached to a node without combining them into a single volume)

All of that would definitely reduce the cost of ownership for node operators and let them grow a node more granularly without having to host a second (third, fourth…) node.
Another option would be to start using the second (third, fourth…) HDD once the first one is full; that would be much easier to code.
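The “use the next HDD once the first is full” option boils down to first-fit placement. A minimal Python sketch, assuming a hypothetical `StorageDir` type and `store_piece` helper (the storage node has no such setting today), could look like this:

```python
from dataclasses import dataclass


@dataclass
class StorageDir:
    """One directory (or whole drive) given to the node. Hypothetical type."""
    path: str
    capacity: int  # space allocated to the node on this drive
    used: int = 0  # space already taken by stored pieces

    def free(self) -> int:
        return self.capacity - self.used


def choose_directory(dirs: list[StorageDir], piece_size: int) -> StorageDir:
    """First fit: keep writing to the first drive until it is full, then spill to the next."""
    for d in dirs:
        if d.free() >= piece_size:
            return d
    raise OSError("all configured directories are full")


def store_piece(dirs: list[StorageDir], piece_size: int) -> str:
    """Account for a new piece and return the directory it should be written to."""
    d = choose_directory(dirs, piece_size)
    d.used += piece_size
    return d.path


if __name__ == "__main__":
    drives = [StorageDir("d:\\", 10), StorageDir("e:\\", 20)]
    print([store_piece(drives, 4) for _ in range(4)])
```

Pieces land on `d:\` until it has no room left, then spill over to `e:\`. As later replies point out, a failure of either drive would still lose part of the node's data.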

Pentium100 · December 4, 2022, 8:11pm

Just use software RAID. There is no point for Storj to basically copy the functionality of md-raid and add it to their software.

There is nothing the node could do with multiple drives that cannot be done with just md-raid, zfs or whatever.

arrogantrabbit · December 4, 2022, 8:12pm

Essentially you are suggesting reimplementing mergerfs or unionfs in Storj.

I don’t think it’s the storage node’s job to manage a filesystem, especially since you can already accomplish this with the above solutions…

vladro · December 4, 2022, 9:35pm

Instead of managing it at the filesystem level or the block-device level, I suggest offering a multi-directory storage option, just that.
In my unprofessional opinion it’s not that hard to let the user point the node at several directories (each of which can be a whole disk), like:

500gb d:\
2000gb e:\
200gb f:\1\
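To spread reads and writes over all the listed directories instead of filling them one by one, a node could pick the least-filled directory for each new piece. A minimal sketch, assuming a hypothetical `Dir` type and `pick_least_filled` helper, with sizes in whole GB only to mirror the example list:

```python
from dataclasses import dataclass


@dataclass
class Dir:
    """A directory from the operator's list. Hypothetical type; sizes in GB."""
    path: str
    capacity_gb: int
    used_gb: int = 0


def pick_least_filled(dirs: list[Dir], size_gb: int) -> Dir:
    """Return the directory with the lowest fill ratio that can still hold the piece."""
    fits = [d for d in dirs if d.capacity_gb - d.used_gb >= size_gb]
    if not fits:
        raise OSError("no directory has enough free space")
    # Choosing the least-filled drive spreads writes roughly in proportion to
    # capacity, so every drive takes part in the I/O.
    return min(fits, key=lambda d: d.used_gb / d.capacity_gb)


# The three directories from the example above.
dirs = [Dir("d:\\", 500), Dir("e:\\", 2000), Dir("f:\\1\\", 200)]
for _ in range(270):  # write 270 GB in 1 GB units, for readability only
    pick_least_filled(dirs, 1).used_gb += 1

for d in dirs:
    print(d.path, d.used_gb)
```

After 270 GB the drives hold roughly 50, 200 and 20 GB, i.e. writes are split in proportion to capacity.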

And don’t forget: software RAID under Windows is unreliable (when we are talking about any RAID without redundancy). More importantly, a simple code addition would make Storj friendlier for Windows users and for users with zero admin knowledge.
For example, right now if you host a node without RAID (software or hardware) and want to grow it, you have to install a bigger HDD, clone and sync all the data, briefly bring the node down for the final synchronization, and then detach the smaller drive. Being able to add a second (third, fourth) drive would allow growing more granularly.

Pentium100 · December 4, 2022, 9:46pm

Just set up a second or third node on those separate directories.

Your proposal is essentially RAID0 (but at the file level). If one of the drives fails, your node will get disqualified for losing too much data.
Adding redundancy (making it a RAID 1 or 5 equivalent) would complicate the code without providing anything that is not already possible with external tools (software RAID etc.).
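The RAID0 argument can be put in numbers. Assuming drives fail independently with the same probability p over some period, a node whose pieces are spread over n drives without redundancy survives only if all n drives do, i.e. with probability (1 − p)^n. A small Python sketch (the 2% figure is purely illustrative, not a measured failure rate):

```python
def node_survival(p_drive_fail: float, n_drives: int) -> float:
    """Probability that none of n drives fails in the period.

    Assumes independent failures with the same probability p per drive.
    With pieces spread across all drives and no redundancy (file-level RAID0),
    the node only survives if every drive does.
    """
    return (1.0 - p_drive_fail) ** n_drives


if __name__ == "__main__":
    p = 0.02  # illustrative failure probability per drive, not real data
    for n in (1, 2, 3, 4):
        print(n, round(node_survival(p, n), 4))
```

With one node per drive, the same single failure only takes down the node on that drive.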

Those users should stick with a one drive = one node setup; it is simpler and less prone to problems. If you want a complicated setup, it is possible, but it requires knowledge, and moving the configuration into the node itself does not change that.

Or install a new drive and set up a new node on it.

The current options:

Set up a node for each drive (this is recommended by Storj).
Set up RAID (with or without redundancy) and run a single big node on that.

What can your proposed feature do that is not covered by these two options?

vladro · December 4, 2022, 9:54pm

A second node means additional memory, CPU, device and power usage. A second drive means just adding a drive.
Node operators who host nodes on single devices like a Raspberry Pi, or on servers, don’t need this feature; I fully agree with that.

Pentium100 · December 4, 2022, 10:03pm

CPU and power usage would be pretty similar for two nodes (on separate drives) vs one node (split on two drives). The node also does not use a lot of memory - it could be important for a RPi, but not a Windows PC.

It takes time to develop and test a feature. I’m sure Storj has better things to do (improving the customer experience etc) than making a RAID0 implementation especially since they do not recommend you run a node on top of RAID0 in the first place.

There is some kind of file-level RAID for Windows that can probably do what you want. I remember using it in read-only mode to make files on 6 hard drives appear as if they were in a single directory. I do not remember what the software is called, though.

vladro · December 4, 2022, 10:07pm

Dynamic disks on Windows, and Storage Spaces on the server OS.
It was just an idea that doesn’t need to be implemented in the near future but could be coded sometime &)
I personally host over 30 nodes and need this feature from time to time.

Pentium100 · December 4, 2022, 10:16pm

Not that. I used something that created an aggregate drive. Basically, if I had F:\a.txt and G:\b.txt, both files would appear on X: drive, but I could also find them in their own drives. In my case F: and G: were network drives.

Unionfs does this for Linux.

Still, running a node on that would make it less reliable, since it would essentially be RAID0.

Knowledge · December 9, 2022, 8:12pm

I think that while there are workarounds such as RAID, we see a lot of SNOs taking the path of least resistance and just setting up multiple nodes on separate hardware to use multiple drives. Your average user probably doesn’t understand what RAID is or how to implement it properly.

And of course, this isn’t ideal unless they are using VPNs or have multiple IP blocks to divide their nodes across.

Perhaps down the road it would make sense to adjust the software to support multiple drives for file storage. The database could simply record where each fragment is located and manage it from there. I’m not saying this is a trivial addition, but it would reduce complexity and make it easier to adopt multiple drives.
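One way the database could know where each fragment is located is a table mapping each piece ID to the directory it was written to. A minimal sketch using Python’s built-in `sqlite3`; the table and function names are hypothetical and not part of the real storage node schema:

```python
import sqlite3
from typing import Optional


def open_index(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a hypothetical piece-location table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS piece_locations ("
        " piece_id TEXT PRIMARY KEY,"
        " directory TEXT NOT NULL)"
    )
    return conn


def record_piece(conn: sqlite3.Connection, piece_id: str, directory: str) -> None:
    """Remember which directory a piece was written to."""
    with conn:  # commits the transaction on success
        conn.execute(
            "INSERT OR REPLACE INTO piece_locations (piece_id, directory) VALUES (?, ?)",
            (piece_id, directory),
        )


def locate_piece(conn: sqlite3.Connection, piece_id: str) -> Optional[str]:
    """Return the directory holding the piece, or None if it is unknown."""
    row = conn.execute(
        "SELECT directory FROM piece_locations WHERE piece_id = ?", (piece_id,)
    ).fetchone()
    return row[0] if row else None
```

The trade-off is an extra lookup per read, and the index itself becomes critical data: if it is lost, the node can no longer find pieces without scanning every drive.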

Alexey · December 11, 2022, 3:51am

And end up with RAID0? No, thanks. Such a combination of a storage service and an OS-level function is not viable: with one disk failure, the whole node is dead.

Pentium100 · December 11, 2022, 5:32am

Isn’t that the recommended thing to do? One drive = one node. While there have been discussions about using RAID to have one big node (and some of us do it), AFAIK the official recommendation has always been one drive = one node.

One node using multiple drives is basically RAID0 (with two drives, one dead drive → the node fails about 50% of audits) unless Storj were willing to implement something like “file-level” RAID5/6, which would be similarly complex to set up. And then the additional feature requests would come (“I want to split my current node into a 3-drive RAID5”, “I want to combine my 3-drive RAID5 node into one big drive”).
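The “50% of audits” figure follows from spreading pieces evenly: if audits pick stored pieces at random, the share of failed audits is about the share of pieces on the dead drive, roughly 1/n for n drives. A quick Python check, assuming a hypothetical hash-based placement (`drive_for` is illustrative, not how the node places pieces):

```python
import hashlib


def drive_for(piece_id: str, n_drives: int) -> int:
    """Hypothetical placement: hash the piece ID to pick a drive."""
    digest = hashlib.sha256(piece_id.encode()).digest()
    return int.from_bytes(digest[:8], "big") % n_drives


def lost_fraction(n_pieces: int, n_drives: int, dead_drive: int) -> float:
    """Fraction of pieces (and so of random audits) that hit the dead drive."""
    lost = sum(
        1 for i in range(n_pieces) if drive_for(f"piece-{i}", n_drives) == dead_drive
    )
    return lost / n_pieces


if __name__ == "__main__":
    for n in (2, 3, 4):
        print(n, round(lost_fraction(10_000, n, dead_drive=0), 3))
```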

Pentium100 · December 11, 2022, 9:17am

Sorry for the double post, but I just thought of something: maybe just make multinode setups easier? I do not know how easy or difficult it is now (I would use another VM anyway).

A node operator wants to add another drive. Instead of doing a RAID0 or whatever equivalent, the node software just spins up another node on that drive. The operator will have to forward another port though, unless there was something like a “reverse proxy” for nodes.

vladro · December 11, 2022, 10:27am

It means taking care of another OS and some additional resources. And installing a new node means a new vetting period.

vladro · April 30, 2024, 9:25pm

Regarding this article: Low io piece storage by liori · Pull Request #13 · storj/design-docs · GitHub
Maybe using 2 or more HDDs for a single node could be a solution to the IOPS issue.

kocoten1992 · April 30, 2024, 10:16pm

But then everyone would have to run at least 2 hdds for a node? That can’t be a solution…

vladro · April 30, 2024, 10:35pm

Nope, the idea was to give the option of splitting a single node across 2+ HDDs (like it can be done with hardlinks), and it could be done by letting the node operator configure a directory for each satellite’s blobs independently.
High IOPS and the need for extra space are only an issue for bigger nodes (20 TB+).
Smaller nodes normally won’t face this kind of problem.
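The per-satellite variant suggested here could be a simple mapping from satellite to base directory. A minimal sketch; the satellite names, mount points and path layout are placeholders and not the layout the storage node actually uses:

```python
from pathlib import PurePosixPath

# Hypothetical operator config: which disk holds each satellite's blobs.
# Satellite names and mount points are placeholders, not real IDs.
SATELLITE_DIRS = {
    "satellite-a": PurePosixPath("/mnt/disk1/blobs"),
    "satellite-b": PurePosixPath("/mnt/disk2/blobs"),
}
DEFAULT_DIR = PurePosixPath("/mnt/disk1/blobs")


def blob_path(satellite: str, piece_id: str) -> PurePosixPath:
    """Build the on-disk path for a piece, routing by satellite.

    A two-character prefix subdirectory keeps any single directory from
    holding every file; this layout is illustrative only.
    """
    base = SATELLITE_DIRS.get(satellite, DEFAULT_DIR)
    return base / satellite / piece_id[:2] / piece_id[2:]
```

Since each satellite’s pieces would live on a single disk, a dead disk would only affect the satellites mapped to it rather than a slice of every satellite’s data.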

kocoten1992 · May 1, 2024, 1:33am

Running 2 nodes independently should share the load because they’re behind the same IP; am I missing something?

P.S. I think I missed the point of running a single node (instead of 2); can you elaborate on why?

arrogantrabbit · May 1, 2024, 1:54am

Generally, it’s the other way around. People have excess storage on their arrays that they chose to share with storj.

Their production system is built to suit their needs, and it’s a redundant array by necessity.

Then they decide to host a node.

The solution to lowering IOPS, besides load sharing between drives, is other techniques like avoiding sync, disabling atime updates, and features like the ZFS ARC and special devices.

HDDs on a Raspberry Pi soon won’t be feasible. It was a very short-lived proof-of-concept thing.

vladro · May 1, 2024, 6:48am

Running 2 nodes means 2x the resources, and the node operator has to either create the extra node(s) on the same IP at the same time as the first node or wait through a vetting period of 10+ months.