Offline node email and Repair timeline

I’m not sure what the timeline is after a node goes offline, or how repair works; maybe someone can enlighten me.
One of my nodes, which holds a lucky piece (or pieces), had about 5 hours of downtime, maybe more. I was afraid my lucky piece was gone, but nope. It survived.
So how long does it take for an unreachable piece to be repaired on another node, and what happens when you come back online? Is there a time window within which your piece can be put back into production? Or, once it’s repaired on another node, does it get deleted from your node with the next bloom filter?

Second, I’m not sure how the system is currently set up, but I would like to propose the following:

  • after 3 hours of downtime, you receive the email from the satellite.
  • after 6 hours of downtime, the requested/audited pieces get repaired on other nodes.
  • if you get back online before the db snapshot for the bloom filter is generated, the unreachable pieces on your node become reachable again and, if they win races against the copies repaired on other nodes, your pieces remain in production and the others get deleted.
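The first two bullets of this proposal amount to a simple time-based policy. A minimal sketch of it, assuming the 3-hour and 6-hour values proposed above (these are the poster's suggested numbers, not the satellite's real settings, and `proposed_actions` is a hypothetical helper); the "second chance" bullet is left out because it depends on bloom filter timing:

```python
from datetime import timedelta

# Hypothetical thresholds from the proposal above; NOT actual satellite settings.
EMAIL_AFTER = timedelta(hours=3)
REPAIR_AFTER = timedelta(hours=6)


def proposed_actions(downtime: timedelta) -> list[str]:
    """Return the actions the proposed policy would trigger for a given downtime."""
    actions = []
    if downtime >= EMAIL_AFTER:
        actions.append("send offline email")
    if downtime >= REPAIR_AFTER:
        actions.append("repair requested/audited pieces elsewhere")
    return actions


print(proposed_actions(timedelta(hours=2)))  # []
print(proposed_actions(timedelta(hours=5)))  # ['send offline email']
print(proposed_actions(timedelta(hours=7)))  # ['send offline email', 'repair requested/audited pieces elsewhere']
```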

Why those 2 intervals? Because when the node goes down, in most cases you don’t know until you get the emails (we’re not counting third-party software like UptimeRobot here). When you get the emails, the node may not be close by, or you may be busy at work, or you may have to start up a backup power supply or move the system to another location (my case), etc. So you need a few hours.
And in most cases, a minor power outage or scheduled power-line maintenance is resolved within 6 hours. So deleting your hard-earned pieces after only 3-4 hours of downtime seems wrong to me, and repair shouldn’t start immediately after you get the email.
Why give that second chance after 6 hours? Because maybe your node really is closer to the client, which is to the client’s advantage too, not to mention that you may have some lucky pieces you don’t want to lose.
What are your thoughts?

Why should the satellite request/audit pieces from offline nodes? Also, I can’t remember my audit score going down because of offline periods, even after 2 or 3 weeks… :thinking:

A segment is repaired only when its number of healthy pieces drops below a threshold. There’s no specific duration of downtime that triggers repair; rather, it’s the collective behavior of all the nodes hosting pieces of a given segment that determines whether that segment gets repaired. So if several other nodes holding pieces of a segment you host are already offline, your node dropping off the network is likely to trigger repair. For most segments, though, you can probably count on enough of the other nodes staying online for a non-trivial amount of time.
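The point that repair depends on a count of healthy pieces, not on how long any single node has been down, can be sketched like this. It is a simplified model, not the satellite's actual checker code: `Segment` and `needs_repair` are hypothetical names, and we assume repair kicks in once the healthy count reaches the threshold (at or below it), which is what makes the 46-of-54 arithmetic in the next paragraph work out.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    total_pieces: int       # pieces originally stored for the segment (e.g. 54)
    repair_threshold: int   # repair once the healthy count falls to this value
    unhealthy_pieces: int   # pieces on nodes currently counted as unhealthy

    @property
    def healthy_pieces(self) -> int:
        return self.total_pieces - self.unhealthy_pieces


def needs_repair(segment: Segment) -> bool:
    # Assumption: healthy <= threshold triggers repair. Only the number of
    # healthy pieces matters here; no per-node downtime timer is involved.
    return segment.healthy_pieces <= segment.repair_threshold


print(needs_repair(Segment(54, 46, 1)))  # False: only your node is unhealthy
print(needs_repair(Segment(54, 46, 8)))  # True: 8 pieces missing, 46 left
```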

Per V137.5 - high load? - #54 by littleskunk, the repair threshold would be 46 out of 54 nodes, so 7 other nodes would also have to be offline for yours to trigger repair (54 - 46 = 8 missing pieces, yours included).
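The arithmetic behind "7 other nodes" is just the gap between the total and the threshold, minus your own node. A small sketch, assuming (as above) that repair starts once the healthy count reaches the threshold; `other_offline_needed` is a hypothetical helper, and the 54/46 numbers come from the linked post:

```python
def other_offline_needed(total: int, threshold: int) -> int:
    """Number of OTHER unhealthy nodes needed before yours going offline
    triggers repair, assuming repair starts when healthy <= threshold."""
    pieces_that_can_go_missing = total - threshold  # 54 - 46 = 8
    # Subtract one for your own node; never report a negative count.
    return max(pieces_that_can_go_missing - 1, 0)


print(other_offline_needed(54, 46))  # 7
```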

On top of that:

  1. satellites will not notice you being offline immediately,
  2. per 1 of 2 nodes died - #10 by Alexey, pieces are not considered unhealthy immediately after the node is found to be offline, but only after 4 hours.
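Combining the two points above: a node's pieces only start counting as unhealthy after a detection lag plus the 4-hour grace period, and even then repair happens only if the segment's healthy count is low enough. A minimal sketch of the timing part, assuming a hypothetical `detection_delay` parameter (the real lag is not stated in the thread and will vary) and a hypothetical helper name:

```python
from datetime import datetime, timedelta

# Per the forum post cited above: pieces count as unhealthy only ~4 hours
# after the satellite has FOUND the node offline.
GRACE_AFTER_DETECTION = timedelta(hours=4)


def pieces_count_as_unhealthy(went_offline: datetime,
                              detection_delay: timedelta,
                              now: datetime) -> bool:
    """Hypothetical model; detection_delay is an assumed, unknown lag before
    the satellite notices that the node is offline."""
    detected_at = went_offline + detection_delay
    return now >= detected_at + GRACE_AFTER_DETECTION


start = datetime(2024, 1, 1, 0, 0)
lag = timedelta(hours=1)  # assumed detection lag, for illustration only
print(pieces_count_as_unhealthy(start, lag, start + timedelta(hours=4, minutes=30)))  # False
print(pieces_count_as_unhealthy(start, lag, start + timedelta(hours=5)))  # True
```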

As an example, I had more than 2 weeks of downtime last year, and when I got back online I had lost maybe 20% of my pieces, probably less. That was with different RS (Reed-Solomon) numbers, though.

This makes sense. I am less worried now. Thanks for the explanation. I believe Alexey has explained it a few times too, but it hadn’t clicked for me.