I’m not sure what the timeline is after a node goes offline, or how the repair works; maybe someone can enlighten me.
One of my nodes holding a lucky piece (or pieces) had about 5 hours of downtime, maybe more; I was afraid my lucky piece was gone, but nope. It survived.
So how long does it take for an unreachable piece to be repaired on another node, and what happens when you get back online? Is there a time limit within which your piece can be put back in production? Or, once it’s repaired on another node, does it get deleted on your node with the next bloom filter?
Second, I’m not sure how the system is set up, but I would like to propose the following:
- after 3 hours of downtime, you receive the email from the satellite.
- after 6 hours of downtime, the requested/audited pieces get repaired on other nodes.
- if you get back online before the db snapshot for the bloom filter is generated, the unreachable pieces on your node become reachable again and, if they win races against the copies repaired on other nodes, your pieces remain in production and the others get deleted.
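
To make the proposal concrete, here is a rough Python sketch of the rules in the list above. It only models what I’m proposing, not how the satellite actually behaves today. The names (`proposed_outcome`, `Outcome`, the two constants) are made up for illustration, and two details are my own assumptions because the list doesn’t spell them out: a piece that comes back but loses the races is the one that gets deleted, and a node that returns only after the snapshot loses the piece to the bloom filter.

```python
from dataclasses import dataclass

# Thresholds from the proposal above; these are NOT real satellite settings.
EMAIL_AFTER_HOURS = 3
REPAIR_AFTER_HOURS = 6


@dataclass
class Outcome:
    email_sent: bool          # operator has been notified by the satellite
    repaired_elsewhere: bool  # piece was rebuilt on another node
    piece_kept: bool          # piece on the returning node stays in production


def proposed_outcome(downtime_hours: float,
                     back_before_snapshot: bool,
                     wins_races: bool) -> Outcome:
    """Apply the proposed rules to a single piece on a node that went offline."""
    email_sent = downtime_hours >= EMAIL_AFTER_HOURS
    repaired = downtime_hours >= REPAIR_AFTER_HOURS
    if not repaired:
        # Node returned before the repair threshold: nothing changes.
        kept = True
    else:
        # Second chance (assumption): the piece survives only if the node is
        # back before the bloom filter snapshot AND it wins the races.
        kept = back_before_snapshot and wins_races
    return Outcome(email_sent, repaired, kept)


if __name__ == "__main__":
    print(proposed_outcome(5, back_before_snapshot=True, wins_races=True))
    print(proposed_outcome(7, back_before_snapshot=False, wins_races=True))
```

With these rules my 5 hours of downtime would have triggered the email but no repair, while a 7-hour outage would cost the piece only if the node came back too late or lost the races.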
Why those 2 intervals? Because when the node goes down, in most cases you don’t know until you get the emails (we’re not counting on third-party software like Uptimerobot here). When you get the emails, maybe the node isn’t close by, or you’re busy at work, or you have to start up a backup power supply, or you need to move the system to another location (my case), etc. So you need a few hours.
And in most cases, any not-so-major power outage or scheduled power line work is usually resolved within 6 hours. So deleting your hard-earned pieces after only 3-4 hours of downtime is wrong in my view. And it shouldn’t start immediately after you get the email.
Why give that second chance after 6 hours? Because maybe you really are closer to the client, so it’s to their advantage too, not to mention that you may have some lucky pieces you don’t want to lose.
What are your thoughts?