What about some kind of “reset” mechanism? The node wipes all the data from that satellite and the satellite pretends the node is brand new (vetting process etc).
EDIT:
Instead of having to create a new node after disqualification from one satellite, the operator could, after fixing whatever problem caused the disqualification, run some command (similar to how GE is started) to inform the satellite that he would like to start over. The satellite would then treat the node as if it were new (vetting phase and so on).
The difference for the operator would be that he would not have to run two nodes: the old one (for the satellites that did not disqualify it) and a new one for the single satellite that did disqualify the old one.
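A sketch of what that could look like, modeled on how graceful exit is invoked today. The `reset-satellite` subcommand is purely hypothetical; it does not exist in the current storagenode CLI, and the paths shown are the usual Docker-setup defaults:

```shell
# hypothetical command, not part of the current storagenode CLI;
# invocation style borrowed from the graceful-exit workflow
docker exec -it storagenode /app/storagenode reset-satellite \
  --config-dir /app/config \
  --identity-dir /app/identity
# the node would then prompt for the satellite to reset, delete its
# local pieces for that satellite, and re-enter the vetting phase
```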
i think that’s a great and very sensible idea… one could also easily imagine the network having 20+ satellites… and getting DQ from one or more sats would eventually mean having to GE the entire node if one wants optimal performance out of a single node…
having the option to reset for each satellite makes perfect sense… i also think it’s a flawed idea to just keep generating new identities… after all, node reputation should be worth something…
might be a bit of a niche case… but having the option makes sense…
It would be reset to zero after disqualification. But if you wanted your reputation history to carry over, then the satellite would know that your node was a bad actor in the past and could decline to offer pieces to it. At least that is what I would do on my satellite if I were a satellite operator.
running zfs with ecc ram, periodic scrubbing of the pool, and 3x raidz1 vdevs of 3 hdds each
so unless i fail to replace broken drives, data loss is highly unlikely… and since all the drives are only 6tb max, a rebuild finishes in a jiffy…
i also run with a SLOG and sync=always on the storagenode dataset to improve my read latency and limit fragmentation from db loads / make random writes more sequential.
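For reference, a pool with that shape could be created roughly like this. Device names and the `tank`/`storagenode` dataset names are placeholders for the setup described above, not the poster's actual configuration:

```shell
# three raidz1 vdevs of three disks each; each vdev tolerates one disk failure
zpool create tank \
  raidz1 /dev/sda /dev/sdb /dev/sdc \
  raidz1 /dev/sdd /dev/sde /dev/sdf \
  raidz1 /dev/sdg /dev/sdh /dev/sdi

# dedicated SLOG device to absorb synchronous writes
zpool add tank log /dev/nvme0n1

# force every write through the ZIL/SLOG, which batches random
# writes into more sequential flushes onto the data vdevs
zfs set sync=always tank/storagenode
```

With `sync=always`, even async writes are committed to the SLOG first, which is what trades a fast log device for less fragmentation on the spinning disks.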
so yeah… losing a byte is highly unlikely… the more likely scenario would be the entire system dying at once, because i don’t have proper surge protection installed on its power strip…
have crashed my server experimenting with gpu passthrough to a vm and paravirtualized graphics… maybe 30-40 times over the last 4 days… maybe a week… and yet it didn’t drop a byte anywhere… going to run a scrub of the pool now to confirm that tho
storagenode logs look good tho… never looked better i think…
i really should get a proper extension cord… power strip or whatever people call that in english, with all the fancy filters, fuses, and safety features, to ensure i at least have a half decent chance of surviving something coming through the mains…
i thought about a ups for emergency shutdowns… but the power here is so ridiculously stable that it’s basically a wasted expense… may do some solar… try to move the server to run mainly off solar power… and then i could quite easily turn such a system into a ups or pair it with ups-like hardware…
so yeah… it’s down to the economics of it… a ups in my case doesn’t make much sense… solar on the other hand could help bring down operational costs… ofc that’s also an expense, but battery tech has been getting so much better recently…
last time the power was out was like 7-8 years ago, maybe 10… and now that most power lines are dug into the ground, it’s only become even more stable… i also plan to shield the server room and switch to a fiber connection at some point… didn’t think that all the way through when i hooked up twisted pair to the server room…
so yeah, eventually a UPS-type setup will be the correct path… but not what i need right now, not where i live… however solar… would be a good idea because we have very high electricity prices here…
The issue is that if a node is disqualified, it was disqualified for a reason.
Just wiping it and resetting the stats might not solve the underlying issue with that node, and a few weeks/months later you’re back to square one, with the node being DQ’ed and wiped again.
It might help if the DQ / suspension notification came with some sort of indication of why the node was suspended or DQ’ed (not sure if it does, since it hasn’t happened to me… yet…)
Then the user can be sent to an automated portal, where they confirm they have fixed / looked at the issues that caused the node to be kicked before the wiped node is let back on the network.
This is exactly the same as when you generate a new identity and request an authorization token to sign it. That is similar to the suggested confirmation.
But then we’re back to the current scheme.
Whatever the reason is, a new node can be disqualified just as well for it. It does not matter if the identity is new or old for that.
However, right now, if one satellite DQs a node for any reason, I have to start a new node specifically for that satellite (because there is no point in wiping the whole old node if it isn’t DQ on the other satellites).
OK, let’s say the hard drive is failing and that was the reason for the DQ. Either I know it or I don’t. If I know it, I move the data to a new drive and reset that satellite; if I don’t know it, I would start a new node on the same drive and have the same problem.
No need for a portal, the node could send something to the satellite to indicate “let’s just start again from scratch”. It should not be automatic, the operator should run some command to do it (after fixing whatever problem).
I think this is the key to making this work. This gives the operator a chance to restart, without forcing it before the issue is fixed.
If there is a manual step involved, it really is effectively the same as starting a new node, which an operator could also do without fixing anything. But that’s what the renewed vetting phase would be for. I really like this idea.