What about some kind of “reset” mechanism? The node wipes all the data from that satellite and the satellite pretends the node is brand new (vetting process etc).
EDIT:
Instead of having to create a new node after disqualification from one satellite, the operator could, after fixing whatever problem caused the disqualification, run some command (similar to how GE is started) to inform the satellite that he would like to start over. The satellite would then treat the node as if it were new (vetting phase and so on).
The difference for the operator would be that he would not have to run two nodes: the old one (for the satellites that did not disqualify it) and a new one for the single satellite that did disqualify the old one.
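A sketch of what that could look like, modeled on how graceful exit is invoked today. The `reset-satellite` subcommand is purely hypothetical; it does not exist in the current storagenode CLI, and the paths shown are the usual Docker-setup defaults:

```shell
# hypothetical command, not part of the current storagenode CLI;
# invocation style borrowed from the graceful-exit workflow
docker exec -it storagenode /app/storagenode reset-satellite \
  --config-dir /app/config \
  --identity-dir /app/identity
# the node would then prompt for the satellite to reset, delete its
# local pieces for that satellite, and re-enter the vetting phase
```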
i think that’s a great and very sensible idea… one could also easily imagine the network having 20+ satellites… and getting DQ from one or more sats would eventually mean having to GE the entire node if one wants optimal performance out of a single node…
having the option to reset for each satellite makes perfect sense… i also think it’s a flawed idea to just keep generating new identities… after all, node reputation should be worth something…
might be a bit of a niche case… but having the option makes sense…
It would be reset to zero after disqualification. But if you wanted your reputation history to carry over, then the satellite would know that your node was a bad actor in the past and could decline to offer pieces to it. At least that is what I would do on my satellite if I were a satellite operator.
running zfs with ecc ram, periodic scrubbing of the pool, and 3x raidz1 vdevs of 3 hdds each
so unless i fail to replace broken drives, data loss is highly unlikely… and since all the drives are only 6tb max, a rebuild finishes in a jiffy…
i also run with a SLOG and sync=always on the storagenode dataset to improve my read latency and limit fragmentation from db loads / make random writes more sequential.
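For reference, a pool with that shape could be created roughly like this. Device names and the `tank`/`storagenode` dataset names are placeholders for the setup described above, not the poster's actual configuration:

```shell
# three raidz1 vdevs of three disks each; each vdev tolerates one disk failure
zpool create tank \
  raidz1 /dev/sda /dev/sdb /dev/sdc \
  raidz1 /dev/sdd /dev/sde /dev/sdf \
  raidz1 /dev/sdg /dev/sdh /dev/sdi

# dedicated SLOG device to absorb synchronous writes
zpool add tank log /dev/nvme0n1

# force every write through the ZIL/SLOG, which batches random
# writes into more sequential flushes onto the data vdevs
zfs set sync=always tank/storagenode
```

With `sync=always`, even async writes are committed to the SLOG first, which is what trades a fast log device for less fragmentation on the spinning disks.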
so yeah… losing a byte is highly unlikely… the more likely scenario would be the entire system dying at once, because i don’t have proper surge protection installed on its power strip…
have crashed my server experimenting with gpu passthrough to a vm and paravirtualized graphics… maybe 30-40 times over the last 4 days… maybe a week… and yet it didn’t drop a byte anywhere… going to run a scrub of the pool now to confirm that tho
storagenode logs look good tho… never looked better i think…
i really should get a proper extension cord… power strip or whatever people call that in english, with all the fancy filters, fuses, and safety features, to ensure i at least have a half decent chance of surviving something coming through the mains…
i thought about a ups for emergency shutdowns… but the power here is so ridiculously stable that it’s basically a wasted expense… may do some solar… try to move the server to run mainly off solar power… and then i could quite easily turn such a system into a ups or pair it with ups-like hardware…
so yeah… it’s down to the economics of it… a ups in my case doesn’t make much sense… solar on the other hand could help bring down operational costs… ofc that’s also an expense, but battery tech has been getting so much better recently…
last time the power was out was like 7-8 years ago, maybe 10… and now that most power lines are dug into the ground, it’s only become even more stable… i also plan to shield the server room and switch to a fiber connection at some point… didn’t think that all the way through when i hooked up twisted pair to the server room…
so yeah, eventually a UPS-type setup will be the correct path… but not what i need right now, not where i live… however solar… would be a good idea because we have very high electricity prices here…
The issue is that if a node is disqualified, it was disqualified for a reason.
Just wiping it and resetting the stats might not solve the underlying issue with that node, and a few weeks/months later you’re back to square one, with the node being DQ’ed and wiped again.
It might help if the DQ / suspension notification came with some sort of indication of why the node was suspended or DQ’ed (not sure if it does, since it hasn’t happened to me… yet…)
Then the user can be sent to an automated portal, where they confirm they have fixed / looked at the issues that caused the node to be kicked before the wiped node is let back on the network.
This is exactly the same as when you generate a new identity and request an authorization token to sign it. That is similar to the suggested confirmation.
But then we’re back to the current scheme.
Whatever the reason is, a new node can be disqualified just as well for it. It does not matter if the identity is new or old for that.
However, right now, if one satellite DQs a node for any reason, I have to start a new node specifically for that satellite (because there is no point in wiping the whole old node if it isn’t DQ on the other satellites).
OK, let’s say the hard drive is failing and that was the reason for the DQ. Either I know it or I don’t. If I know it, I move the data to a new drive and reset that satellite; if I don’t know it, I would start a new node on the same drive and have the same problem.
No need for a portal, the node could send something to the satellite to indicate “let’s just start again from scratch”. It should not be automatic, the operator should run some command to do it (after fixing whatever problem).
I think this is the key to making this work. This gives the operator a chance to restart, without forcing it before the issue is fixed.
If there is a manual step involved, it really is effectively the same as starting a new node, which an operator could also do without fixing anything. But that’s what the renewed vetting phase would be for. I really like this idea.