Redundancy vs no redundancy

It’s not quite correct. Say you have 3x8TB drives in multiple locations and want to maximize profit: over 5 years you’ll benefit from redundancy (raid5/z1) only if drives are so bad that you lose 5% of them every year and never replace them. And if you do replace them, then the break-even failure rate ends up being close to 10% per year. I think even those 3TB Seagates were better.

Oh, God not this discussion again! :smile:


I kinda get your (long :upside_down_face:) point @SGC.

But in the case of an ISP box, I can’t really add any redundancy to it :slight_smile:

Rebooting it regularly might be something to try out. Not sure how or if there is an option for that… Or maybe just with a timer socket :thinking:

After setting up several hosts for storj i have some bash scripts and ready-to-copy-paste (with small changes) commands. So setting up a new host with any amount of nodes (hdd’s) will take less than a day. Now i’ve asked my friend to rewrite all of those into one shiny app :smiley:
As for now i have to manually reformat and fstab the drives and generate identities. Everything else is done by an autogenerated docker-compose.yml
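The compose-generation part can be sketched in a few lines. This is only an illustration of the idea, not the poster's actual scripts: the mount layout and the external-port numbering scheme are assumptions (the `storjlabs/storagenode` image name is the real one).

```python
# Sketch: autogenerate docker-compose.yml services, one per hdd/node.
# Mount paths and port numbering here are illustrative assumptions.
def compose_for(mounts):
    lines = ["version: '3'", "services:"]
    for i, mount in enumerate(mounts, start=1):
        lines += [
            f"  storagenode{i}:",
            "    image: storjlabs/storagenode:latest",
            "    restart: unless-stopped",
            "    ports:",
            f"      - '{28966 + i}:28967'",  # unique external port per node
            "    volumes:",
            f"      - {mount}/identity:/app/identity",
            f"      - {mount}/data:/app/config",
        ]
    return "\n".join(lines)

print(compose_for(["/mnt/hdd1", "/mnt/hdd2"]))
```

Each new hdd then just needs its mount path appended to the list and the file regenerated.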


well that’s where we look at it very differently…
i would regard enough bitflips to cause a program failure or other inconsistency in the node operation as a failure.

on the other part of it, you might have gotten a bad batch of hdd’s and 80% die in 2 years, but okay, let’s keep it to more realistic numbers. it still takes 15 months or more to replace a lost node with our current ingress… sure you can go with smaller hdd’s, but then you also just increase the electrical bill.

so let’s use the one you mentioned: 5 years of run time for a fleet, with a failure rate of 10% to make it rather high, but for ease of math.

so if we say 20 nodes and 10% failure, that’s two nodes failed and a 10% loss. if we translate that into run / storj time,
then that’s 5 years x 20 hdd’s of storj time, so 100 years of storj time. out of this, the 2nd year we subtract to cover the cost of the disks; that means 20 years subtracted, leaving 80.
and the 1st year we basically disregard and call power costs, internet and whatever else… cable costs and what not.

so that’s 40 years we can subtract from the total without profit, leaving us with 60 years… then we have the two disk failures, which is 10 years of mostly profits. let’s just say they never really got profitable, as they would have had to run at least 2 years to pay for themselves and the hourly wages, and by then we are over half their life, so basically zero…

then we are down to 50 years out of the 100.
then you’d need to subtract your wages and the housing, but let’s disregard that.
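The disk-year bookkeeping above can be checked in a few lines, using the same assumptions (20 drives, 5 years, year 1 written off to misc, year 2 to hardware, and the 2 failed drives treated as never profitable):

```python
# Sanity check of the disk-year accounting: 20 drives over 5 years.
fleet_years = 20 * 5   # 100 storj-years total
misc = 20              # year 1 of every drive: power, internet, cables
hardware = 20          # year 2 of every drive: pays off the disks
failures = 2 * 5       # two failed drives, written off entirely
profit_years = fleet_years - misc - hardware - failures
print(profit_years)    # 50
```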

so if we use mirrors, the math will most likely look even worse.
we have twice the disk costs and twice the power costs, and ofc no failures of storagenodes, because that’s too unlikely to happen in this example. i think it’s less than 1 in 100,000 disk-years for a mirror to fail

so let’s just subtract the first 3 years as costs… year 3 for the extra disk purchases, year 2 for power costs
and year 1 for misc. that leaves us with 40 of the latter years for profit, vs 50

and now i know what you’ll say: but look, i still win using single drives, especially because mirrors halve the capacity… well, if you fill up the drives… ofc
which these days is likely to take years for, say, 12-20tb hdd’s. in fact they might never fill at this pace…

which would stretch the earnings further towards or past the 5 year warranty point.
so really the whole idea with the mirrors is that i know my nodes will go beyond that point… the whole point is that no node will die; it will be redundant and take no damage at all, thus it will run predictably basically forever, meaning you can set it and forget it… while without redundancy you will have to spend time on it… and how much do you really gain long term…

trouble is what you gain… some advantage in capacity over time… but you will just end up paying for that in work…
how many hours can you afford to spend on each hdd over its 5 year lifespan…
due to bit errors, if we assume a standard hdd, let’s say $4 per stored TB in earnings.
an 8tb hdd makes $32 a month when it’s full… i forget what numbers we used previously, so let’s just say for 8tb the usual 3 years of profit. so 36 months of $32, something like $1152 of full earnings over 5 years of run time.

so for that $1152 you need to keep it alive, 5 years being 1825 days… do you look at it every day to check it’s alive… because then you only have less than a dollar’s worth of time to spend per check…
every other day… then you’ve got $1.5 worth of time for it… getting better…

so let’s do it the other way around then…
$1152 as hourly wages… at a wage of $50 an hour that gives me about 23 hours of time for the node over a 5 year period… not a lot of room for error… in fact the disk itself costs much less than that: 8tb x $25 is $200 for the hdd… $400 for a mirror
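Redoing the time-budget arithmetic with the figures above ($4/TB, 36 profitable months, a $50/h wage; all of these are the post's own assumptions):

```python
# How many hours of admin can one 8 TB node afford over 5 years,
# if all of its earnings were counted as wages?
earnings = 36 * 8 * 4    # 36 months * 8 TB * $4/TB = $1152
wage = 50                # assumed $/hour
hours = earnings / wage
print(round(hours, 1))   # 23.0 hours over 5 years, i.e. under 0.1 h/week
disk_cost = 8 * 25       # $200 for the hdd, so $400 for a mirror
print(disk_cost)         # 200
```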

really, in the end, the issue will be the problems one has to deal with… so i want it to run error free. you cannot convince me that you can run single hdd’s error free and problem free for 5 years.

and if you do think that, then i’m sorry to have to tell you… you are delusional.
hdd’s are basically micro magnetic LP’s… they are terrible, lying, noisy workers and should never be trusted…

sure most people don’t have an issue… they think… you know why your computer is weird sometimes? because it makes mistakes… because consumer grade computers aren’t made to run without making mistakes…

but time is too valuable, so one uses redundancy…
else one has to get up and fix stuff at night… instead of just checking the notification and rolling over and continue sleeping…

ofc it happens that our schedules are affected… but it will be 1/100th of how often it happens to people that don’t use redundancy.


been thinking of making something like that… haven’t really gotten around to it yet… not sure if i’m going to either… i mean, how many nodes am i really ever going to run…
the one advantage would be when one expands years later and it’s all automatic, which makes it nice and easy… ofc by then something will have changed and the scripts most likely won’t work… and then it’s either fixing those first or doing it manually again…

i get the point… just not sure it really is going to be as beneficial as i like… but maybe… sure are some stuff i wouldn’t mind automated.

if you don’t have anything useful, then just repower it once a week manually and see if that doesn’t fix the problem…
ofc a switch is more reliable, but a notification on a phone will prove whether the fix works

The problem is, there’s no data or activity on nodes. I set up a 16TB node and update it at least weekly. I’ve been stalled at 1.6TB of data for a month or two. It just sort of bobs around there, with a constant 30-50GB of trash matching the intake.

Then factor in no pay in a few months. Just kind of depressing.

Why do you think managing nodes with non-redundant drives will take more time? It’s the opposite in fact: if a non-redundant node dies you can just say “oh whatever” and forget about it, meanwhile with a redundant node you’d better visit the location and replace the drive, and in the case of parity raid you must do that, or you get a really high performance penalty.

I won’t tell you that a hard drive will run for 5 years, but I can tell you that out of 10 drives, 8 will make it. Look at the statistics published by Backblaze and you’ll see the average annualized failure rate for their drives never went above 2.5%, and for Q4 2020 it averaged 1.58%.
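For reference, an AFR translates into a 5-year survival probability like this (a sketch assuming a constant, independent yearly failure rate, which real drives don't strictly follow — failure rates rise with age):

```python
# 5-year survival probability for a given annualized failure rate (AFR).
def survival(afr, years=5):
    return (1 - afr) ** years

print(round(survival(0.0158), 3))  # 0.923 at Backblaze's Q4 2020 average
print(round(survival(0.025), 3))   # 0.881 at their worst yearly average
print(round(survival(0.045), 3))   # ~0.79, i.e. roughly "8 out of 10 make it"
```

So "8 out of 10 surviving" actually corresponds to an AFR around 4.5%, well above Backblaze's published numbers.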


a drive failure is not the only thing that can adversely affect how much work you need to put in to maintain a node; lots of other factors come into play, drive failure is just one part of it… redundancy aims to make everything redundant, and thus with the utilization of checksums the system can maintain near perfect integrity.

a failed drive is much rarer than a corrupted drive. on top of that, your backblaze example is flawed, because backblaze vets their drive models in smaller batches before large scale implementation,
and thus their numbers will be biased towards the best possible outcome.

also, by what measure would you consider a drive failed… drives more often than not start putting out bad data rather than failing outright.

if it takes 3 years for a node just to get well into its good earning phase, and 4+ to fill up a 20 tb hdd, then drives only surviving 5 years offsets the possible earnings, because it takes so long to build up a big node…

so if we say we have a 10% annual failure rate on our drives, which i don’t think is totally unreasonable — sure, it can be better, but i prefer to be positively surprised.

then out of our previous example of 20 drives, two drives/storagenodes would fail each year, and it takes 3 years to get up to capacity again… if not 4… but let’s say 3, as we expect ingress to go up over the years.

so 3 years are basically wasted everytime one starts a new node and with 10% failure rate, than that would mean 6 nodes years required every year… so 3 years to get back up to speed during each year another two drives or 6 drive years would accumulate so thats a total of 18,… maybe this metric doesn’t work for this…

scratch that… :smiley:

we have 20 drives / storagenodes running in their prime,
10% failure rate a year… so
year 1: a total of 2 storagenodes go down and are replaced
year 2: another 2 fail
year 3: another 2 fail
year 4: another 2 fail - the 2 that failed during year 1 are now back to making profits

so basically, at a 10% failure rate you would always have 6 storagenodes out of 20 that were “recently” recreated and aren’t earning at their capacity, and that’s assuming it will only take 3 years to fill them.
for drives like 20 or 24 tb, which would be the most energy efficient, it would take 4-5 years to fill them,
making the number go up to 8-10 storagenodes continually working towards getting filled.
it’s not because i care about each individual drive, it’s because it takes a long time to regain a profitable node, and when a node is full it is a peak earner, whereas new nodes are basically almost unprofitable for a good deal of years.
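That steady state can be expressed directly: with an annual failure rate f and a refill time of r years, roughly f·r of the fleet is always below capacity (a simplification that ignores overlapping failures and the partial earnings of refilling nodes):

```python
# Expected number of nodes in "refill" state at any given time,
# under a constant failure rate and a fixed refill time.
def refilling(n_nodes, annual_failure_rate, refill_years):
    return n_nodes * annual_failure_rate * refill_years

print(refilling(20, 0.10, 3))  # 6.0 nodes refilling with a 3-year fill time
print(refilling(20, 0.10, 5))  # 10.0 for slow-filling 20-24 TB drives
```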

i know we until recently had tons of stress test data to rely upon, but it’s doubtful that will continue long term.
you may think your solution is efficient, but i’m not convinced… if i thought it was worth doing, i would be doing that instead…

sure, i do plan to make some competing nodes, maybe run new nodes on single older drives.
but aside from that, i want my storagenodes to survive, because of the profitability that comes with that.

redundancy also solves issues like bad controllers, bad cables, bad servers, bad internet, bad ram…
and i need my beauty sleep gosh damnit…
my 6 month old node looks like this currently

so let’s say it’s been getting full ingress for 4 months; it’s 2 weeks away from being 6 months old,
thus we can subtract vetting. it’s up to 950gb.
so about 3TB in a year at full ingress, meaning a 24 tb hdd takes 8 years to fill up at the current pace.
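A quick check of that fill-time estimate, using the ~950 GB over ~4 months observed above (the linear extrapolation is of course an assumption):

```python
# Extrapolate the observed ingress to a yearly rate and a 24 TB fill time.
stored_gb, months = 950, 4
gb_per_year = stored_gb / months * 12   # 2850 GB/year, i.e. ~2.9 TB/year
print(round(24_000 / gb_per_year, 1))   # ~8.4 years to fill a 24 TB hdd
```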

sure, if ingress was a lot faster, single disks might be the way to go… but you would end up putting in a lot more hours at odd and inconvenient times, and worrying about monitoring a lot more.

also, annual failure rates are not consistent; they depend on server room temp / temp flux, humidity, disk workloads, vibrations and shockwaves… maybe a bit of pure luck… sometimes disks just run into an invisible wall of time where they will not survive due to wear patterns.

i have drives that will work for storage and data bursts… will read all day without fail… and when i put a storagenode on them… one of the drives just craps out totally.
i’ve been playing around with that a bit because it’s kinda interesting… that it works fine… unless under continual workloads.

now if that wasn’t in a mirror setup, it would have had to be trashed, because i would never have been able to figure out that it actually works, and to test it without the risk of losing data.

i use mirrors, so really it gives me more options. if i wanted to take a node with me, say if i have a colocation and the internet is down… and i don’t know if it will get back up… then i can take 1 mirror with me and leave the other… if the internet is fixed at the colocation, the storagenode can go back online without its mirror,
and if it doesn’t, i can spin it up from a secondary location… so long as i take care not to run the same node from two locations at one time ofc lol

besides that, it doubles my iops, reduces the workload on the hdd’s, allows for the fastest recovery speeds in case of disk failure, and provides the greatest level of redundancy… i know some will argue that raidz2 or even raidz1 is better… and sure, in some aspects, but they also have many drawbacks… and really, even with a mirror you can have errors on both hdd’s, and the errors would have to correlate on both drives for data to be damaged…

also, i have mostly been running 2nd hand hdd’s… which doesn’t exactly improve the reliability. not that new drives are that reliable… like a new combustion engine, it’s best to wear them in first, before being mean to them

In your example we have 70% of nodes working at 100% capacity. Meanwhile if you have raidz1 with 3 drives you have at most 66.6% capacity, because you have to give up 1/3 of the drives for parity. Or 50% capacity with mirrors. And that’s before we add nodes that were recreated and are below 100%, and before we make a correction for nodes that failed before they got to 100% (which means we lose less than one full node worth of income). And this is with an unrealistically high failure rate, assuming that nodes could not be salvaged from drives that still work.
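To put rough numbers on that comparison for the same 20 drives (treating refilling nodes as empty, as the example above does; real refilling nodes still earn something, so this favors redundancy if anything):

```python
# Usable capacity, in drive-equivalents, for 20 drives under each layout.
drives = 20
no_redundancy = drives * 70 // 100  # 70% of nodes at full capacity -> 14
raidz1_3wide = drives * 2 / 3       # 1 of every 3 drives goes to parity
mirrors = drives // 2               # half the raw capacity -> 10
print(no_redundancy, round(raidz1_3wide, 1), mirrors)  # 14 13.3 10
```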

Once again, what I’m trying to say is that, statistically speaking, a SNO who runs without drive redundancy will earn more than a SNO who prefers a redundant setup. If the SNO has enough nodes, then the uncertainty is eliminated and the non-redundant setup will earn more.


if you could fill the storagenodes fast enough, then yeah, sure…
the problem as i see it is that the storagenodes are much more important than the drives, as the drives represent a small part of the total value over time. + the added erratic work of having no redundancy.

one could just run a 5 drive raidz1, then it would win in that example. and you might see the failure rate as high, but with the used drives i’ve been getting, 10% might be a bit low…

also, the 2% drive failure rate doesn’t include cable issues and such. on top of that, the research one needs to put in to be sure one actually gets reliable drives isn’t a small feat either.

if you did check out backblaze’s many yearly reports, you would also see that some years they have had reliability issues, which made them introduce their drive model vetting methodology, so that they would have a better idea about the reliability of the drives before buying hundreds or thousands of them.

they also tend to spread their drive choices over a large range of models, to be sure issues with one model don’t affect their overall setup…

this is again something that is very difficult for SNO’s, since the setups are much smaller; thus they will need to research their drive purchases beforehand and attempt to get the exact same models, and drive reliability numbers across models are not always easy to obtain.

i do plan to run test nodes of many different kinds of setups, but i’m fairly confident that redundancy will win, largely due to the savings on work time involved with maintaining nodes over time.

something like having ecc memory might make a larger impact in the long term than most would expect… it’s not for fun that most enterprise gear uses ecc.

Oh fun. I had a hard drive failure recently. I was unable to read a bunch of pieces from my drive. Every time the storage node tried to access the bad sector, the drive would go into 100% io mode and stop responding.

I was able to move the entire node to another drive, except that one bad sector. The node is still operating. So losing a drive doesn’t mean you lose the storage node or the data on it.


Yeah, but it’s likely that you do. You got lucky that the drive developed only a few bad sectors, and in a way that got very obvious. If there were no indications of a problem before suddenly the drive just did not spin up (or produced a lot of IO errors), then you would most likely be sitting with a brand new node, waiting a year or more for that node to fill up again.

And that’s what concerns me, and why I use redundancy (6-drive raidz2 vdevs). If a node fails, it’s not just that I lose the held amount, but that the new node will be empty and will take over a year to get back to the amount of data (and egress) my current node has right now.
Over the past month, ingress to my node was about 1.2Mbps, so it would take 77 days to fill up 1TB (assuming no deletions) and two years to fill up 10TB.
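For reference, the 77-day figure follows directly from the rate (a sketch assuming a sustained average and decimal TB):

```python
# Days to accumulate 1 TB at a sustained ingress rate given in Mbps.
def days_per_tb(mbps):
    bytes_per_day = mbps / 8 * 1e6 * 86_400  # Mbps -> bytes/day
    return 1e12 / bytes_per_day

print(round(days_per_tb(1.2)))  # ~77 days at 1.2 Mbps
```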


Been there, did that :wink:

But I agree with @Pentium100, we kinda got lucky because we noticed the problem before it killed the node. Many don’t monitor nodes closely enough to catch that in time. And having to start from scratch really isn’t encouraging…

Yeah but, there are deletions unfortunately so it’s been even slower than that lately…

Well, if you need years to fill a small 8tb node, then the whole idea is flawed to begin with. So yes, I assume that there’s always some more or less constant amount of ingress. If it takes 2 years to fill a node and you need 3 years to ROI on a drive, then what’s the point of running it? 4 years you spend paying off hardware, then 1 more year you get some profit, and then you’re out of warranty and good luck?

Not really. You can often notice the first signs of drive issues long before actual data loss happens. And even if you go by reallocated sector count, it’s still very likely that the node survives that.

Over past 3 months free space on my nodes was consumed at 11.7GiB/day on average, deletions included. That’s 87 days to 1TiB. On the other hand we had periods when nodes were getting more data every hour and it lasted weeks.

It really depends on the failure mode. A head crash can come with no warning and take out the drive completely. An electronics fault could happen. I have also seen a drive fail because its motor seized up. IIRC that particular drive worked just fine until I turned that computer off and let it cool down. After that, it was difficult to spin the disk even with pliers.

If I had set up a new node a year ago, it would have filled up faster, but there’s no guarantee that Storj would just start uploading lots of test data again.

At the current tiny ingress, even backups become viable

I’ve always believed backups were viable, as I use them for my node with no redundant disks. I could not justify all the extra writes and reads across lots of disks for storj, and the swarm is redundant. It’d be sad if I lose my node, but others will take her place.

My backups are tiny; they do hourly snapshots and are only a few GB, using changed block tracking on ReFS, so I’m not worried about data corruption, as it’s detected on read and then the backup corrects it on verify. If my disk with storj on it dies, then my hypervisor uses onnow to bring it back from a snapshot; it takes about 2 minutes while the rollback occurs to have the node online again, although it will impact my audit score, so I might need to make snapshots more frequent as the node grows :slight_smile:

Also, as my node is only 70ish days old, I noticed lots of dev load data being processed, which skews all my figures - I was running dedupe on the storj disk, which was up at nearly 70% ratio, but with the dev data now limited it is dropping fast - will be interested to see if it gets to 0% or if there are savings to be had - might have to adjust the storage allocation as I’ve not got enough room.

As for what disks to use: again, I would not spend money on an expensive disk redundancy setup for new nodes; it takes zZZzzzz to fill a disk up. Also, raid does not prevent loss of data from multiple disk failures; you end up having redundant hot disks for even more redundant disks, and where does it stop?

Maybe once a node is 2 years old, with lots of data and making sensible $$, then it might pay to go raid, but I would personally always add more nodes on spare non-redundant disks if I had them and use backup - I guess the good thing is that we can choose; there’s no one right answer, they all have goods and bads, and storj lets us choose which disk layout we want.

Well, if I had been following storj recommendations I would have lost my node last night. One of the SAS RAID 1 mirror drives died last night. Admittedly it was a used drive - but it had worked very hard for me over the last year on multiple projects. Instead of losing my node, all I needed to do was pop in a new replacement. :wink: Admittedly I/O latency went up a lot during the rebuild process, but that is a hell of a lot better than the alternative.

Yeah, as long as you have more drives available than storj data, redundancy makes sense, as it comes “for free”. Only if you have to choose between redundancy and more data might more data make more sense. But the less ingress we get, the more precious redundancy becomes.