Yes, but then you would have logs, and they might help both sides understand what happened.
It is likely to happen again, so why not unban him again, tell him to set the log level to debug and stop deleting the logs, and then see what happens?
But it was already clear before you unbanned him that he no longer had the logs.
So the only chance of getting any logs at all would have been to tell him in advance that he was going to be unbanned, so that he could stop deleting logs, and only then unban him.
Because, you see, he did not know he had been unbanned, so he kept deleting the logs. It doesn't matter whether they contain useful information or not: if they don't, you lose nothing.
If you plan to unban a node, give notice. That's only fair. Otherwise, SNOs might already have started deleting the data for the satellite that disqualified them. Then you unban the node and it is doomed anyway.
To me it sounds like something went wrong during GC. Maybe the node was unstable (bad RAM, bad storage) and something got corrupted during GC that caused the node to delete more than it should have. In either case, unbanning and giving notice wouldn't have helped. The node lost data, so it would have been disqualified anyway. Since this happened to only one node out of 22,500 active nodes, I'm strongly inclined to believe it was something specific to this node, which is why I'm personally leaning towards a hardware error.
That being said, logging can't really be used right now because the node logs far more than it should. For example, cancelled uploads (which are normal node behavior) get logged as errors. There is simply no way to watch the logs live and make any sense of them. That means that even if OP wanted to troubleshoot this, he/she/it couldn't do it live, and would still be banned while we all go through the logs.
Whatās faster? noticing error logs live, or wait for the day to finish so that you can react?
An example: When you restart a node on a slow drive: The node logs a database locked error, but it also starts spamming the log with useless order/upload/download cancelled errors. By the time you notice the database locked, a few seconds have passed.
Iām not saying that itās bad to see the locked error. Iām saying that important errors get logged as errors. Normal behavior gets logged as info. That way I (as an SNO) can pick what I want to log without needing any extra processing (logs donāt just vanish into thin air, they need to be processed/stored/rotated). For the life of me I canāt understand why certain logging decisions were made that way (with regards to storagenode).
He said he deletes logs every day.
So basically he is storing them but throwing them away after 24 hours.
He did not notice the original errors, so there are no logs for that.
But if he had been told before the unban, he would at least have had a chance to stop the daily deletion before running his node again, so he would have kept the logs.
If the error does not show up again, fine. If it does, then he or Storj could check whether the logs contain anything about it.
I do not see a reason to do this live. If no error shows up in the logs, he cannot do anything in real time anyway. And if there is an error, he can check with Storj later what caused it, whether it can be resolved, and whether they would unban him again if he is not at fault.
So there is no reason to believe he could not turn off the log deletion.
What is so hard about asking a user: "Hey, we plan to unban your node. Can you turn off your log cleaning for a while so we might find the reason for the strange behavior?"
No arguments on my side there. I completely agree.
How about if, instead of logs being turned off in the first place, they were kept under control (by not spamming them with useless messages I can do nothing about), so that we now had something like:
Critical: Got audit for a piece that was trashed on XXXX-XX-XX.
I asked about it before, and it turns out that it only filters based on which component logged a message. It doesn't actually change the facility/priority of the logs. Again, that feature would be unnecessary if logging were actually implemented according to the relevant RFC.
I'm thinking of working on fixing the logging, but since I'm not a programmer, it may take a while.
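The RFC is not named above; assuming it is RFC 5424 (the syslog protocol), the difference between filtering by component and filtering by priority can be sketched as follows. RFC 5424 defines eight severities (0 = Emergency through 7 = Debug) and a PRI value of facility × 8 + severity. The choice of facility 16 (local0), the `keep` threshold and the two messages are hypothetical illustrations, not anything storagenode currently does.

```python
# RFC 5424 severity codes (lower number = more severe).
SEVERITY = {
    "emergency": 0, "alert": 1, "critical": 2, "error": 3,
    "warning": 4, "notice": 5, "informational": 6, "debug": 7,
}
FACILITY_LOCAL0 = 16  # hypothetical choice: facility 16 is "local use 0"


def pri(facility, severity_name):
    """PRI value as defined by RFC 5424: facility * 8 + severity."""
    return facility * 8 + SEVERITY[severity_name]


def keep(severity_name, threshold="error"):
    """Severity-based filter: keep anything at least as severe as the threshold,
    regardless of which component emitted it."""
    return SEVERITY[severity_name] <= SEVERITY[threshold]


# Hypothetical events: an audit for a trashed piece vs. a routine cancellation.
events = [
    ("critical", "got audit for a piece that was trashed"),
    ("informational", "upload canceled by client"),
]
for sev, msg in events:
    if keep(sev):
        print(f"<{pri(FACILITY_LOCAL0, sev)}>{msg}")
```

Only the critical audit line passes, printed as `<130>got audit for a piece that was trashed` (16 × 8 + 2 = 130). Because the severity travels with each message, any syslog-compatible collector could apply the same threshold without knowing which component logged it.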
Yep. So, enabling logs wouldnāt help, if the node wouldnāt manage to survive.
We wanted logs when it receives a BF and starting move pieces to the trash to get an idea, why itās deleted too much?
If it didnāt manage to survive, then itās useless unfortunatelyā¦
That would be to difficult to extract the reason, if the node has also corrupted or lost pieces during the way.