Transfer escrow on disk failure?

Hi,

An idea popped into my head after some reading, but I don’t know if it’s feasible.

I often see “running one node per disk” recommended, and I like the simplicity of that. I’ve also seen the trade-off discussed: losing the escrow if a disk fails versus having local data redundancy.

This sparked my idea of transferable escrow. Would it be possible to basically “apply” for a transfer of escrow from one node to another, maybe when applying for a new auth token to replace the failed node, instead of being “punished” for something that’s most likely out of the operator’s control, like a disk failure? Redundancy is already built into the service (if I understand correctly).

Maybe this isn’t in line with the purpose of the escrow, just thinking out loud. :slight_smile:

Regards,
alfananuq

When you lose data, your escrow goes toward rebuilding the lost data; that redundancy can be rebuilt.


I was actually going to mention that as a guess, good to know, thanks!

This doesn’t make logical sense, because the escrow is spent on repairing the data on the network. The held-back earnings are meant to ensure network data security, not as a reward for longevity on the network.

Yeah, I figured the purpose of the escrow was something like that.

The reason I run with redundancy isn’t really that I’m worried about the held amount / escrow…

Disks will fail or have errors, and eventually nodes will have errors in bad spots, which can cause all kinds of issues… Maybe in a good number of years, when every disaster has been tried and complained about 1000 times, it won’t be a problem… but until then… the main advantage I see is that I don’t have to contend with random issues caused by data errors, like malformed databases. And of course not losing the held amount is a bonus…

But really… is it, really? I mean, if you do the math, the escrow is like 2-3 months of earnings,
and half is paid back at month 15, so that leaves 1-1½ months’ worth of earnings…
So if a node can survive just two years, you are down to that being like 4-6% of the total…

So for a disk expected to run for, let’s say, 5 years, as in the case of enterprise gear with no expected failure, that would put it at less than 2-3%.
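To make that back-of-the-envelope math concrete, here is a minimal sketch. The 2.5-month held amount, the 50% return at month 15, and the two lifetimes are just the rough figures from this post, not official numbers:

```python
# Rough sketch of the arithmetic above: what share of a node's lifetime
# earnings the held amount ("escrow") actually represents.
# All inputs are ballpark figures from this thread, not official values.

held_months = 2.5        # held amount ~= 2-3 months of earnings
returned_fraction = 0.5  # half is paid back at month 15

# Held earnings still at risk after the month-15 return:
at_risk_months = held_months * (1 - returned_fraction)  # ~1.25 months

for lifetime_months in (24, 60):  # 2-year node vs. 5-year enterprise disk
    share = at_risk_months / lifetime_months
    print(f"{lifetime_months} months: ~{share:.1%} of total earnings at risk")

# 24 months: ~5.2%  (the "4-6%" above)
# 60 months: ~2.1%  (the "2-3%" above)
```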

But of course in most cases there will be a bit error here and there for whatever reason… enterprise gear is usually pretty good at catching that, though…
Also, when it takes 9 months just to get to the point where a storagenode pays out 100%, running it without redundancy seems like a big risk… The escrow / held amount is basically just an investment so you can be allowed to profit, like the time spent getting the ingress and the vetting and all that…

A solution would be to keep an eye on the disk’s health and copy the data to another hard drive once it shows the first signs of failure. You would maybe lose a couple of months of the disk’s lifetime, but you get to keep the held amount.
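If you want to automate that, something like the sketch below could work. It assumes smartmontools (smartctl) is installed and enough privileges to query the device; /dev/sda is a placeholder for whatever disk holds the node’s data, and, as the next post points out, SMART is far from a reliable predictor:

```python
# Minimal sketch: poll SMART health so you can migrate the node before
# the disk dies. Assumes smartmontools is installed; the exact output
# line can vary by device type.
import subprocess

def disk_is_healthy(device: str) -> bool:
    """Return True if the overall SMART self-assessment is PASSED."""
    result = subprocess.run(
        ["smartctl", "-H", device],
        capture_output=True, text=True,
    )
    return "PASSED" in result.stdout

# "/dev/sda" is a placeholder for the drive holding the node's data.
if not disk_is_healthy("/dev/sda"):
    # First sign of trouble: copy the storage directory (and the
    # identity files!) to a healthy drive before it gets worse.
    print("WARNING: SMART health check failed, start migrating the node")
```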


It’s actually quite rare that disks with bad SMART just die… they often keep running fine for a long time… :smiley: Predicting disk failures is quite difficult… the whole SMART system is there to improve the odds of actually getting your data out…

It’s kinda like your mechanic telling you that your car is going to fail… doesn’t mean it will, anytime soon anyway… and he is far from always right… and he will be much better at that than SMART is at predicting failures… I’ve seen more drives keep working for digital eons than I have seen them die shortly after a predicted failure…

High workloads do seem to make the odds of a drive dying much higher… but it’s a science, and I don’t have enough data points to really form any reasonable opinions…
But that’s my opinion on it…

Also, if one has an 8-drive RAID6 or a 4-drive RAIDZ1 / RAID5 (with checksums), then one only loses 25% to redundancy…
Of course the odds of having a disk fail go up by a factor of 4, but then one can replace it in most cases… The question then becomes whether replacing a failed drive costs more than 25% of its life…
Maybe not… but still, that’s like 1 year if we imagine a 4-year lifetime… and most good drives should last longer than that… though there are the increased odds of failure on new drives…
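The 25% is just parity drives over total drives; here is a quick sketch of that arithmetic (the layouts and the failure-odds factor are the ones from this post, not recommendations):

```python
# Sketch of the redundancy arithmetic above.

layouts = {
    "8-drive RAID6":  (8, 2),  # (total drives, parity drives)
    "4-drive RAIDZ1": (4, 1),
}

for name, (total, parity) in layouts.items():
    lost = parity / total
    print(f"{name}: {lost:.0%} of raw capacity lost to redundancy")
# Both layouts come out at 25%.

# With n drives, the chance that *some* drive fails scales roughly with n
# (the "factor of 4" for the 4-drive array), but a single failure is
# survivable: replace the drive instead of losing the node and its escrow.
```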

So I would say it’s very close, in regard to what’s lost and saved, between replacing drives pre-failure and running with redundancy and replacing disks post-failure.

For storagenodes, I think running just one drive is asking for pain long term… Of course it’s a great way to start… but eventually there will be errors, and random errors at that… which are just annoying to deal with…


That’s actually a very good point. My small ODROID HC2 is running fine for now, but I’ll eventually get around to building a PC that has redundancy.