Plans for bigger single nodes beyond 24TB?

Windows users should be able to use Storage Spaces instead of LVM for the same benefit.

I don’t know what solutions exist for MacOS. Perhaps just a hardware disk array? But I understand that the percentage of SNOs who want to run on Mac desktop systems and also have a need to scale their storage seamlessly is expected to be pretty low.

Yeah, most Mac users (and most users in general) would probably end up sharing a single USB external drive. You can pick up an 8TB drive for $140 now so that’s a cheap way to go that will allow you to share a lot of data.

My home plex server runs on a 2012 Mac Mini with a thunderbolt 2 array. It’s still running Storj V2 but adding more storage to it isn’t something I really want to take on.

Resilience is going to play a major part in SNO longevity.

Having a Raspberry Pi with an 8TB drive is great while it works, but if you've filled that drive and it fails, I doubt you could get 6TB of data back within the 99.3% uptime window, which means you'd have to start over with your investment.

Best practices for resilience would be running hardware RAID, ZFS, or Storage Spaces (Storage Spaces Direct).

Hardware RAID has reached its max potential. While this is the cheapest option, you also risk losing the entire array to file corruption. However, running, let's say, a RAID 6 (dual parity), you can tolerate up to 2 drive failures and hot-swap the failed drives. You lose raw data space, but gain resilience. When a drive fails and you swap in a new one, the rebuild load on the array can take time, and there is a risk of other drives failing from the stress of rebuilding onto the new drive.

ZFS is fantastic for resilience; the major drawback is that you need system memory for the integrity checks (to keep files from becoming corrupted). The typical recommendation is 1GB of RAM per 1TB of raw storage, so under that rule of thumb a 24TB node would call for roughly 24GB of RAM. Not only do you lose raw space to run a RAID-Z2 (equivalent to RAID 6), but you also have the added cost of memory.

Storage Spaces (Storage Spaces Direct)
This is Windows' counterpart to ZFS; however, instead of system memory you need a cache tier. This can be RAM, persistent memory (Optane), or SSD/NVMe, while running several different resiliency options like RAID 5/6/10 and many more on the Direct side.

The beautiful part of the cache is that the brunt of the IOPS hits the cache, not the HDDs, which makes the HDDs last much longer under lighter loads.
This makes it the most expensive option, but also the most resilient.

-This is just my opinion

Then there is SD-WAN / dual power supplies / generator transfer switches / generator / UPS / VRRP routers / dual switches.

This all helps with the uptime.

Cost / Income / Risk level

Quick addition to this part:

To take entire files off the network you don't need 51% of the network's capacity on your machine. It's enough to hold enough pieces to take a file below its minimum threshold (at which point it is mostly gone and unable to be repaired). In our defaults right now, 80 pieces of a file/segment should be stored in the network, with a minimum of 29 pieces. You can just have bad luck if a bad actor holds enough of these.
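To put a number on that threshold, here is a minimal back-of-the-envelope sketch in Go, using only the default figures quoted above (80 pieces stored, 29 required); real deployments also have a repair threshold that triggers well before this point:

```go
package main

import "fmt"

func main() {
	// Defaults quoted above: 80 pieces stored per file/segment,
	// 29 pieces required to reconstruct it.
	const total, minimum = 80, 29

	// A segment is unrecoverable once fewer than `minimum` pieces
	// survive, so an attacker would need to hold (and then drop)
	// at least total - minimum + 1 pieces of that particular segment.
	fmt.Println("pieces needed to make one segment unrecoverable:", total-minimum+1) // 52
}
```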

You just need enough pieces of the same file in one location for it to become a SPOF. This will get resolved by uploading to geographically diverse addresses; I think it has been mentioned somewhere.

The 51% I meant was the TOTAL used data on the entire Storj network
(if that was what you were pointing at).

If there is 100PB of used data on the Storj network and you hold 51% of that (51PB of data), you are holding a large share of the pieces that provide the redundancy. Using the piece counts above, 15 of a segment's 29 required pieces COULD sit at one location.

To accomplish this (using the same scale as above), you would need about 2,125 nodes (51PB / 24TB), each holding 24TB of data, and multiple /24 blocks of public IPs to give each node its own address so they keep receiving shards as if they were separate nodes.
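For concreteness, here is the arithmetic behind those figures as a quick Go sketch; the 100PB network size, 51% share, and 24TB-per-node numbers are just the assumptions from the example above:

```go
package main

import "fmt"

func main() {
	// Assumptions from the example above: 100PB of used data on the
	// network, an attacker targeting 51% of it, 24TB stored per node.
	const (
		networkTB   = 100.0 * 1000 // 100 PB expressed in TB
		attackShare = 0.51
		nodeTB      = 24.0
	)

	attackTB := networkTB * attackShare // TB the attacker must hold
	nodes := attackTB / nodeTB          // 24TB nodes required

	fmt.Printf("data to control: %.0f TB\n", attackTB) // 51000 TB
	fmt.Printf("24TB nodes needed: %.1f\n", nodes)     // 2125.0
	// A single /24 has only 256 addresses (roughly 254 usable), so giving
	// every node its own address would take on the order of nine /24 blocks.
}
```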

There are ways to blacklist a block of IPs, so you could get even more creative and get /30s from several different blocks, giving every node a completely unique IP block.

This is just my theory. I wonder what would happen if you took 51% of the network offline.
Would it still operate? Look at mass outages, like the recent Cloudflare one. Granted, it wasn't Cloudflare's fault; it was Verizon not applying security to BGP route announcements, which ended up taking down Cloudflare's services.

One thought on fixing this is to allow no maximum on how large a node can be, but add a safeguard so that no more than (x) shards/pieces of the same file can be assigned to that node.

That is already implemented :slight_smile:

We already have IP limiting in place, so your attack is more or less impossible. The typical 51% attack is not really applicable to our network.

Cool, have your IP limiting in place so that multiple node IDs cannot be run behind the same IP, but I believe that not allowing people to run clusters of small nodes that belong to the same node ID, and pushing them into useless vertical scaling on a single host, is going to be far more devastating for the longevity and storage capacity of the network.

I am going to be using 4xSATA ARM devices with 1Gbps ports and 20TB of mirrored storage for my setup. I am not going to start spending ridiculous amounts of money on 10 year old hardware like some people in this thread. In case of full node failure my data and system RTOs are going to be in the range of 10 minutes for replacing a node from cold boot. And I would like to have a fleet of those puppies.

The power of being a storage node operator in 2019 is not in using 10-year-old datacenter hardware with dual enterprise-grade power supplies and RAID. It's the ease with which configuration management can be applied to your hosts, the OS and application-stack imaging capabilities with which you can bring a system to a desired state in less than 10 minutes, and the automation you can apply to your systems, which can be joined in clusters using the plethora of available technologies.

Cut commodity hardware out of the picture, push people into useless vertically-scaled hardware that would under almost any circumstances eat up most of their profits, and you have yourself a recipe for an extremely stagnant growth path.

This is the best way to get disqualified in a short time. Each server with the same identity will have only part of the data. When the satellite comes to audit your server, it cannot find the needed piece because it's on another server with the same identity, so the audit will be counted as failed. The same goes for each of your servers with the same identity: more servers, faster disqualification.

However, if you used shared storage between all your servers, it could work. But it's just not needed for the Storj network - the network is the best load balancer itself.
For the Storage Node Operator it's not profitable either - even with different identities your nodes will be treated as only one node, so they do not receive more traffic than a single node would. You will also have the downsides of a multiple-node setup - you can read more here: Add multiple drive, restart node with different parameters, and reduce Storage with out penalty - #14 by Alexey

This is an interesting approach, but your mirroring must be faster than audits, because if the mirror on the current node is not consistent with the others, you will fail the audit request.

I can suggest trying to set up a Kubernetes cluster or Docker Swarm with GlusterFS storage on your fleet of puppies. It could be a working solution.

Honestly, this is the 2nd time you're giving me a lesson on why I shouldn't be running the same node ID on multiple hosts. Did I ever say I was doing that?

I am just going to take a step back from all of this, my apologies.

Clearly, after seeing all of these discussions, you're trying to force a particular design on your storage node operators, one meant to maximize geographic node distribution at the expense of SNOs' operational costs.

Clearly, you're OK with 74TB-sized SPoFs in your network. It baffles me how a non-corporate entity is supposed to stay alive in the long run, given that you're putting weight on single nodes over clustered systems.

This is not only for you. This is a public discussion, so I want to make it clear for everyone.
Someone with a little experience could try to repeat your setup without knowing all the edge cases, and would fail.
You are right, you can use a cluster setup if you prevent any audit failures. But it could be costly for the SNO.
We should figure out the best setup for the SNO.
One node per SNO with a simple setup looks like the cheaper solution because of the low cost to run it.
If you could build a multi-host solution with low costs - that would be great!

IMO a cluster, even one made out of Pis, would be better than a single device. The reason is reliability: 5 hours of allowed downtime pretty much requires having spares on site and immediate action in case of failure.

However, this is targeted at home setups and not datacenters (where you could reasonably expect someone to be nearby at all times, especially in the big ones - and even then, if a client asked me to set up a system with 99.3% availability, it would definitely have everything redundant). So it is reasonable to expect that something may fail while the SNO is away (at work, on vacation) and cannot get to the hardware in time. It would be great if the node could survive failures while unattended.

A cluster (and the storage can be a cluster too) would help achieve that. The alternative is an HA setup, but that requires hardware that supports virtualization and does not protect from the OS messing up (kernel panic etc.).

If multiple node instances could use shared storage and a shared database, it would be nice for those who want the extra insurance.

While I’m not going to argue that those setups will work, it’s highly unlikely a normal home user is going to run anything close to that. And it’s also not necessary.

Right now my setup is simple. I use Uptime Robot for monitoring and alerts on my phone, and I have VPN + SSH to connect to the node if anything is wrong. I've had an issue with my hardware freezing and becoming unreachable once, so I'm thinking of maybe plugging it into a smart plug so I can force a reboot remotely as well. (I know this is not ideal, but I had to do a hard reset anyway.)

With that setup, I can always get it back online within the 99.3%, without any need for clustering or HA, neither of which, in my opinion, adds up cost-wise as a reasonable solution.

I sometimes am unable to do anything even if I get an alert. For example, while driving to another city I cannot just stop and try to bring the node back online. I also would rather not have to wake up in the middle of the night for this.

So, my goal is to make the node survive any failure. I get the alert and can take my time fixing the problem, provided that nothing else fails during that time. A cluster would also help with performance, especially if the CPUs are not the fastest.

I can imagine a system like this:

  1. A smart plug
  2. Some watcher software/site
  3. The watcher pings the RPi3; if it doesn't answer in time, it power-cycles it via the smart plug.
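As a rough illustration, here is a minimal sketch of such a watcher in Go. The node address and the smart-plug URL are hypothetical placeholders (real smart plugs expose different local APIs), and the retry thresholds are arbitrary:

```go
package main

import (
	"log"
	"net"
	"net/http"
	"time"
)

const (
	nodeAddr      = "192.168.1.50:28967"              // hypothetical node LAN address, default storagenode port
	plugToggleURL = "http://192.168.1.60/power-cycle" // hypothetical smart-plug HTTP endpoint
	checkEvery    = time.Minute
	dialTimeout   = 10 * time.Second
)

// reachable reports whether a TCP connection to the node succeeds.
func reachable(addr string) bool {
	conn, err := net.DialTimeout("tcp", addr, dialTimeout)
	if err != nil {
		return false
	}
	conn.Close()
	return true
}

func main() {
	failures := 0
	for {
		if reachable(nodeAddr) {
			failures = 0
		} else {
			failures++
			log.Printf("node unreachable (%d consecutive failures)", failures)
			// Only power-cycle after a few consecutive failures, to avoid
			// rebooting the node over a transient network blip.
			if failures >= 3 {
				if resp, err := http.Get(plugToggleURL); err != nil {
					log.Printf("smart plug request failed: %v", err)
				} else {
					resp.Body.Close()
				}
				failures = 0
				// Give the node time to boot before checking again.
				time.Sleep(5 * time.Minute)
			}
		}
		time.Sleep(checkEvery)
	}
}
```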

It’s not ideal, as @BrightSilence said, but it’s something.
I have a home server, which can do this job, since it’s online anyway.

A point which might be getting lost here is that a “node” can be a cluster. “Node” does not necessarily mean “single device”. It does mean reachability by way of a single IP address at a time.

One large setup might include a single host as a proxy, which terminates the SSL connection and passes on the query to any of 32 different storage servers, depending on what satelliteID+pieceID is being sent or retrieved or queried. Or it might prove worthwhile to have two dispatching servers instead of one, and use an HA mechanism to failover between them in case of faults. (I don’t mean to imply that the storagenode software from Storj supports configurations like this right now— just that it could be done, and will hopefully be supported directly at some point.)
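To illustrate the dispatching idea only (this is not something the current storagenode software does; it's just a sketch of how a proxy could deterministically map pieces to backends):

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// backendFor picks one of n storage servers for a given piece by hashing
// satelliteID + pieceID. The mapping is deterministic, so a retrieval is
// dispatched to the same backend that stored the piece in the first place.
func backendFor(satelliteID, pieceID string, n uint32) uint32 {
	h := fnv.New32a()
	h.Write([]byte(satelliteID))
	h.Write([]byte(pieceID))
	return h.Sum32() % n
}

func main() {
	const backends = 32
	fmt.Println(backendFor("sat-1", "piece-abc", backends))
	fmt.Println(backendFor("sat-1", "piece-xyz", backends))
}
```

A plain hash-modulo mapping like this reshuffles most pieces whenever the number of backends changes, so a real setup would more likely want consistent hashing or an explicit piece-location index.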

(My apologies if that was already clear- I just wanted to clarify the point for future readers.)