Well, a month was just a suggestion, but I can elaborate a little on my thinking for that. Younger nodes without much data don’t get that many audits, and if it also takes a few days to diagnose and fix the problem, there may not be enough audits / repairs to recover the score above the threshold, especially given that it drops a lot faster than it recovers. On the other hand, while the node is suspended the data is already protected, so there isn’t really a need to rush permanent DQ. So I figured a longer period would spare SNOs from losing their node in those situations and spare the support staff from having to deal with the additional tickets about disqualified nodes that will inevitably come. There’s a balance to be found here.
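To put rough numbers on that, here is a toy sketch of the asymmetry I mean. To be clear, this is not the satellite's actual scoring formula: the `DROP`, `GAIN` and `THRESHOLD` values, the update rule and the audits-per-day figures are all made up for illustration. It only shows how a score that falls quickly and climbs slowly interacts with a low audit rate.

```python
import math

# All values below are hypothetical, chosen only to illustrate the asymmetry.
DROP = 0.10       # fraction of the score lost per failed audit
GAIN = 0.02       # fraction of the remaining gap to 1.0 regained per success
THRESHOLD = 0.60  # score below which the node counts as suspended


def apply_audit(score: float, success: bool) -> float:
    """Toy update rule: fast multiplicative drop, slow recovery towards 1.0."""
    if success:
        return score + GAIN * (1.0 - score)
    return score * (1.0 - DROP)


def score_after_failures(score: float, failures: int) -> float:
    """Score after a run of consecutive failed audits."""
    for _ in range(failures):
        score = apply_audit(score, success=False)
    return score


def successes_to_recover(score: float) -> int:
    """Number of consecutive successful audits needed to reach THRESHOLD."""
    count = 0
    while score < THRESHOLD:
        score = apply_audit(score, success=True)
        count += 1
    return count


def days_to_recover(successes_needed: int, audits_per_day: float) -> int:
    """Whole days needed to collect that many audits at a given audit rate."""
    return math.ceil(successes_needed / audits_per_day)


# A node fails 10 audits before the operator notices and fixes the problem.
FAILURES = 10
low_score = score_after_failures(1.0, FAILURES)
needed = successes_to_recover(low_score)

# Hypothetical audit rates: a young node with little data vs. an older one.
young_days = days_to_recover(needed, audits_per_day=2)
old_days = days_to_recover(needed, audits_per_day=50)

print(f"score after {FAILURES} failures: {low_score:.3f}")
print(f"successful audits needed to recover: {needed}")
print(f"young node: {young_days} days, older node: {old_days} days")
```

With these made-up parameters, getting back over the threshold takes more successful audits than the number of failures that caused the drop, and the young node needs many more days than the older one simply because it sees fewer audits. The real numbers will differ, but that is the shape of the problem a short suspension period runs into.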
I would say this is again a good reason to not use too short a period. It also gives you time to fix things software-wise if necessary, without permanent impact and without the need to manually deal with loads of support tickets.
This is also where data could prove valuable. How often do nodes spend more than 30 days in suspension? And of those that do, do they ever come back from it now? I ask because a long suspension seems costly if you’re still paying them.
And of course there is that elephant in the room, which doesn’t need to be repeated. But this could (and probably eventually will) include nodes that really SHOULD be disqualified, even though they return only unknown errors, just for different reasons.
All of those options sound reasonable to me. But from user reports here on the forum, I can tell you that loss of ingress and data loss to repair are already acting as a good incentive for SNOs to want to get out of that state. Additionally, the fear of permanently losing a node that has built up reputation and data over time works as a very strong motivator as well.
If I could pick, I would prefer avoiding egress by excluding these nodes from node selection rather than letting egress happen but not paying for it. That seems a bit more fair and also eases the load on nodes that may have gotten into trouble because of too high a load to begin with. I think it’s also quite fair to not pay for storage for a node that has not been reliable. This would perhaps also help offset the additional repair cost this triggers on your end. It wouldn’t make much sense to pay for data storage when the node has shown itself to be unreliable and is actively causing repair costs on the network.