I had a disk issue on my node that caused the entire disk to go into read-only mode. I ran a manual fsck and rebooted the machine. It has now been online for over a day. The suspension score is still at 100% and no bandwidth is currently being used. How long does it take before the suspension is removed and the node becomes a contributing member of the network again?
If the suspension score is 100% on the dashboard, that means it is not affected.
Perhaps your node was suspended for being offline?
What’s your current reputation on the dashboard? Is the node online and updated?
Since the disk was read-only, I am pretty sure the node would have been offline, as no logs were writable. The node is online now and running the latest version.
From what I gather from the forum, your node needs to be online for 30 days before it starts receiving any data. If you are suspended again in the 30 day window, it takes another 30 days.
As long as you don’t see a “disqualified” message (don’t know how it looks), it is just a matter of waiting.
Update: The node was up for more than 30 days with no downtime, but it was disqualified anyway. I ended up scrapping the whole instance and reinstalling. I should have done this when I got the suspension notice; the new instance would have had an extra month.
Suspension happens in two different cases:
- When the suspension score goes below 60%
- When the online score goes below 60%
These two metrics are completely different.
The suspension score is affected when your node is online and answers audit requests, but returns an unknown error instead of the piece. So the suspension score is affected when you have problems with the disk, with the data, or, in rare cases, with other hardware. It falls when an audit fails with an unknown error (known errors like a missing, unavailable, or corrupted piece affect the audit score directly). Recovery is possible if the node starts to pass audits without errors.
The second metric is affected when your node is offline or cannot answer audit requests for any reason. It recovers when your node stays online for the next 30 days.
The first one is dangerous, because the next stage is a falling audit score and disqualification. Unfortunately, disk problems cannot be fixed by being online. They cannot be solved at all if the data is corrupted. There is only a small chance that a node can survive data loss or corruption.
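To make the distinction between the metrics easier to follow, here is a rough Python sketch of the rules described above. It is only an illustration, not how the satellite actually computes anything: the names `AuditOutcome`, `affected_score` and `suspension_reasons` are made up for this example, and the only number taken from this thread is the 60% threshold.

```python
from enum import Enum

# Threshold from the explanation above: suspension when a score goes below 60%.
SUSPENSION_THRESHOLD = 0.60


class AuditOutcome(Enum):
    """Hypothetical labels for how an audit request can end."""
    SUCCESS = "success"
    KNOWN_ERROR = "known_error"      # missing, unavailable or corrupted piece
    UNKNOWN_ERROR = "unknown_error"  # node answered, but with an unknown error
    OFFLINE = "offline"              # node did not answer at all


def affected_score(outcome):
    """Return the name of the score an outcome pushes down, or None."""
    mapping = {
        AuditOutcome.SUCCESS: None,
        AuditOutcome.KNOWN_ERROR: "audit",
        AuditOutcome.UNKNOWN_ERROR: "suspension",
        AuditOutcome.OFFLINE: "online",
    }
    return mapping[outcome]


def suspension_reasons(suspension_score, online_score):
    """List which of the two suspension cases apply (scores are 0.0 to 1.0)."""
    reasons = []
    if suspension_score < SUSPENSION_THRESHOLD:
        reasons.append("suspension score below 60%")
    if online_score < SUSPENSION_THRESHOLD:
        reasons.append("online score below 60%")
    return reasons
```

For example, `suspension_reasons(1.0, 0.45)` returns only the online-score reason, which is the situation where the dashboard shows a 100% suspension score while the node is still suspended for having been offline.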
Thanks Alexey. I don’t believe there was data corruption as the fsck didn’t report many errors (a couple of shards were corrupt if I remember correctly). The data would have been readable… but no logs were written, so I don’t know if this makes the app think the data is corrupt/not serviceable.
All in all, a lesson learnt to monitor the node more closely.
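For anyone wanting to catch the read-only situation earlier, a minimal sketch of a check that could be run from cron might look like the following. It assumes a Unix-like system, where Python exposes `os.statvfs` and `os.ST_RDONLY`; the function names are my own, and the storage path you pass in is whatever your node uses.

```python
import os


def flags_read_only(f_flag):
    """True if the mount flags from statvfs include the read-only bit."""
    return bool(f_flag & os.ST_RDONLY)


def is_mounted_read_only(path):
    """True if the filesystem containing `path` is mounted read-only (Unix only)."""
    return flags_read_only(os.statvfs(path).f_flag)
```

Calling `is_mounted_read_only` on the node's storage directory and sending yourself an alert when it returns `True` would have flagged this problem before the node spent long offline. It does not detect data corruption, only the remount to read-only.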