Find a way to stop the node on audit errors

@BrightSilence
That awk command is pretty useful…

I was looking into what was causing suspensions.
It looks like we might be able to ignore the other one and use this instead… then we don’t have to look for multiple errors. Of course that wouldn’t protect against the node sending wrong data, but it’s a start, I guess…

Which would make the command look something like this, right?
Remember, I’m pretty green at this; I managed to misspell grep… lol
While fumbling around to figure this out: from what I can see, my logs don’t contain this entry at all. It relates to an audit timing out when it takes more than 5 minutes and the satellite gets salty…
It was my way around having to look for multiple errors in a row; apparently this can kill a node in short order…

It might be a bit rough, so I’ll be happy to hear better suggestions before I start testing a variation of this that actually hits on my logs, so I can do a real-world test before relying on an error I hope to never see…

docker logs -f --tail 20 storagenode | awk '/('context deadline exceeded; piecestore:'/ {system ("docker stop -t 300 storagenode")}'
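In the command above, the single quotes around the search text end the shell’s quoting early, so the shell reads the `;` as a command separator, and the `(` in the regex has no matching `)`. Here is a minimal sketch of the same idea with those fixed. It treats the search text as a fixed string (awk’s `index()`), so nothing needs escaping, and it exits after the first hit so the stop command runs only once. It assumes the container is named `storagenode` and that the node writes its log to stderr, which is why `2>&1` is added. `stop_on_match` and the `STOP_CMD` override are hypothetical names for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read log lines on stdin and run a stop command
# the first time a line contains the given fixed string.
stop_on_match() {
  local pattern=$1
  # STOP_CMD is a hypothetical override for dry runs, e.g. STOP_CMD='echo would stop'.
  local cmd=${STOP_CMD:-docker stop -t 300 storagenode}
  # index() is a plain substring search, so ';' and ':' need no escaping.
  # exit ends awk after the first hit, so the stop command runs only once.
  awk -v pat="$pattern" -v cmd="$cmd" 'index($0, pat) { system(cmd); exit }'
}

# Usage (container name "storagenode" and logging to stderr are assumptions):
# docker logs -f --tail 20 storagenode 2>&1 | stop_on_match 'context deadline exceeded; piecestore:'
```

Running it once with `STOP_CMD='echo would stop'` and a more common log line is an easy way to check that the trigger fires before pointing it at the real error.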

I did a grep on my one-month log for
context deadline exceeded; piecestore:

Got no hits. There might be other errors, though, but I’m still pretty handicapped at all this xD
I did start with just "deadline", but that also hit on satellite timeouts when the network connection was down… I was hoping to find something a bit simpler, but this log line seems pretty unique. Maybe I should check whether it’s mentioned anywhere else on the forums…
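To repeat that kind of search, a fixed-string grep avoids surprises from characters that mean something in a regex. A minimal sketch: `-F` treats the text literally and `-c` prints the number of matching lines. The function name and any log path you pass are hypothetical; for a running container, the same text can be piped from `docker logs storagenode 2>&1` (assuming the node logs to stderr).

```shell
#!/usr/bin/env bash
# Hypothetical helper: count log lines that contain a literal string.
# Usage: count_hits 'context deadline exceeded; piecestore:' /path/to/node.log
#    or: docker logs storagenode 2>&1 | count_hits 'context deadline exceeded; piecestore:'
count_hits() {
  local text=$1 rc=0
  shift
  # -F: fixed string (no regex), -c: print the number of matching lines.
  # With no file arguments left, grep reads stdin.
  grep -F -c -- "$text" "$@" || rc=$?
  # grep returns 1 for "no matches" (it still prints 0); only 2 or more is a real error.
  [ "$rc" -le 1 ]
}
```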

Yeah, it doesn’t seem to appear on the forums either… so let’s trust that littleskunk knows what to look for. I just need to know if I wrote the command somewhat right… lol

Though it’s pretty harmless, so I should just test it out with a more common log hit…

Is there a way to add the part below to the docker run parameters,
so that it starts again after docker is stopped and started, and would therefore run automatically after a reboot or similar?

logs -f --tail 20 storagenode | awk '/('context deadline exceeded; piecestore:'/ {system ("docker stop -t 300 storagenode")}'

If so, it could serve as a rough patch for the Storj documentation recommendations.

Well, I tested it and it didn’t work… I guess I need to start reading up on how that awk command line actually works, instead of making a couple of pathetic copy-paste attempts at modifying it…

From what I can tell, this should work:

docker logs -f --tail 20 storagenode | awk '/(download.canceled/ {system ("docker stop -t 300 storagenode")}'

It’s basically the same thing that @BrightSilence suggested… I guess it didn’t work :smiley:

I cannot get it to trigger the docker stop -t 300 command.
Of course, for the real thing I would need to use the error.deadline.piece: or whatever it was exactly…
Right now I’m just testing whether the function works… which, at present, it doesn’t.

But I’ll keep at it.
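One way to test a pattern before wiring it to docker stop is to feed awk a sample line and check whether the regex compiles and matches. In the attempt above, the stray `(` is enough for awk to reject the regex. Note that `.` matches any single character, so `download.canceled` would also match a line containing "download canceled". A minimal sketch; `regex_matches` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: check an awk regex against a sample log line.
# Exit 0: the regex matches. Exit 1: it compiles but does not match.
# Any other non-zero status (usually 2): awk rejected the regex,
# e.g. because of an unbalanced parenthesis.
regex_matches() {
  local re=$1 line=$2
  # BEGIN-only program: no input is read, only the sample line is tested.
  awk -v re="$re" -v line="$line" 'BEGIN { exit !(line ~ re) }'
}

# Examples:
# regex_matches 'download.canceled' 'INFO piecestore download canceled'   # exit 0
# regex_matches '(download.canceled' 'INFO piecestore download canceled'  # awk error
```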

Your suggestion doesn’t contain a valid regex, so it won’t match anything. Also, the error in your example is different from the one you get when the mount doesn’t exist; in that case you would get something like "file doesn’t exist". There are multiple ways to fail an audit, and I don’t think it’s a good idea to match only a specific one. For what it’s worth, my nodes haven’t had any audit failures in many months. If you’re going to try this, I would recommend stopping on all audit errors, as they are likely all pretty serious and an indication that something is wrong.
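A minimal sketch of stopping on any failed audit rather than one specific error. It assumes audit transfers are logged with the action `GET_AUDIT` and that failed ones also contain the word `failed` (as in "download failed"); check a few lines from your own logs before relying on that. `stop_on_audit_failure` and `STOP_CMD` are hypothetical names, and the container name `storagenode` is assumed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: stop the node on the first failed audit seen on stdin.
# ASSUMPTION: audit lines contain "GET_AUDIT" and failed ones also contain "failed".
stop_on_audit_failure() {
  # STOP_CMD is a hypothetical override for dry runs.
  local cmd=${STOP_CMD:-docker stop -t 300 storagenode}
  awk -v cmd="$cmd" '
    /GET_AUDIT/ && /failed/ {
      print "audit failure detected: " $0 > "/dev/stderr"
      system(cmd)
      exit
    }'
}

# Usage (logging to stderr assumed, hence 2>&1):
# docker logs -f --tail 20 storagenode 2>&1 | stop_on_audit_failure
```

Matching on the action plus "failed" instead of on one error text means a missing mount, a timeout, or a corrupted piece would all trigger it, which is the point of stopping on all audit errors.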

Alternatively, you could monitor the dashboard API and kill the node if your audit score drops below 0.9. That should be early enough to keep it from going below 0.6.

I like that idea; it would also be a lot less resource-dependent. Of course, it would require a script running from cron or similar, which means it’s not easy to implement in the Storj docker run documentation.

But it would work fine for me, since I already have a cron script running on my node every minute,
and it would then also keep running after a node or server restart.
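For reference, a crontab entry for that kind of every-minute check might look like this. The script path is hypothetical and stands for whatever check script ends up being written; the log redirect is optional. Because cron starts with the system and fires every minute, the check resumes on its own after a reboot without touching the docker run parameters.

```
# m h dom mon dow  command
# Hypothetical script path; runs the audit-score check every minute.
* * * * * /usr/local/bin/check-audit-score.sh >> /var/log/check-audit-score.log 2>&1
```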

What would a bash command that tracks the dashboard API audit score look like,
if I may be so bold… assuming you are a bit familiar with that?

I’m gonna leave this one to you, but you can use curl to get the data from the API endpoint and jq to parse the JSON. Enjoy!
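A minimal sketch of that curl + jq approach, with loudly labeled assumptions: the dashboard is reachable at `localhost:14002`, and `/api/sno/satellites` returns an `audits` array with an `auditScore` per satellite. Field names have changed between node versions, so inspect the real output with curl first and adjust the jq paths. The script name, `THRESHOLD` and `STOP_CMD` are hypothetical; 0.9 is the threshold suggested above. jq does the decimal comparison, since bash arithmetic is integer-only.

```shell
#!/usr/bin/env bash
# Hypothetical check-audit-score.sh: stop the node when any satellite's audit
# score drops below a threshold.
# ASSUMPTIONS: API on localhost:14002, JSON shaped like
#   {"audits": [{"auditScore": 0.99, ...}, ...]}
# Inspect yours with: curl -s http://localhost:14002/api/sno/satellites | jq .
API_URL=${API_URL:-http://localhost:14002/api/sno/satellites}
THRESHOLD=${THRESHOLD:-0.9}
STOP_CMD=${STOP_CMD:-docker stop -t 300 storagenode}

# Print the lowest numeric audit score in the JSON on stdin (nothing if none).
lowest_audit_score() {
  jq -r '[.audits[]?.auditScore | numbers] | min // empty'
}

# Exit 0 only if some numeric audit score in the JSON on stdin is below $1.
# jq -e sets the exit status from the result: true -> 0, false/null -> 1.
any_audit_below() {
  jq -e --argjson t "$1" '[.audits[]?.auditScore | numbers] | any(. < $t)' >/dev/null
}

check_audit_score() {
  local json
  # -f: fail on HTTP errors, -s: silent, --max-time: don't hang a cron run.
  if ! json=$(curl -fs --max-time 10 "$API_URL"); then
    echo "dashboard not reachable at $API_URL, nothing done" >&2
    return 0
  fi
  command -v jq >/dev/null 2>&1 || { echo "jq is not installed" >&2; return 1; }
  if printf '%s' "$json" | any_audit_below "$THRESHOLD"; then
    echo "audit score $(printf '%s' "$json" | lowest_audit_score) is below $THRESHOLD, stopping node" >&2
    # Unquoted on purpose so the command string splits into words.
    $STOP_CMD
  fi
}

check_audit_score
```

If the JSON path is wrong, `any_audit_below` simply finds no scores and the node is left running, which seems the safer failure mode for a safeguard like this.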

:smiley: Yeah, the problem is I need a dictionary just to decode what you just wrote… hehe
I did manage to find the API post on the forum, but it seems like I might need to install jq, whatever that is… xD
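jq is a command-line JSON processor; on Debian and Ubuntu it is packaged as `jq`. A small sketch for checking whether the tools a script needs are installed before writing it; `missing_tools` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each named command that is not installed.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# Example: missing_tools curl jq
# On Debian/Ubuntu, a missing jq can be installed with: sudo apt install jq
```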

All this for a temporary fix, a safeguard I doubt I will ever need…
but I’m sure I’ll learn a lot along the way lol
Maybe I’ll see where this vote goes before throwing more time at a custom solution.
It just seemed so close.