Success rate script - Now updated for new delete terminology since v1.29.3

Hi @BrightSilence , hi all,

I want to handle this issue properly in the script, or rather in a copy of your script.

Currently audits are selected like this:

audit_failed_warn=$(echo "$LOG1D" 2>&1 | grep -E 'GET_AUDIT|GET_REPAIR' | grep failed | grep -v -e exist -c)
audit_failed_crit=$(echo "$LOG1D" 2>&1 | grep -E 'GET_AUDIT|GET_REPAIR' | grep failed | grep exist -c)

The issue is that the new “audit failure” contains both “exist” AND “failed”, but it is more of a “warning” than a “critical” error. Example:

2022-04-03T10:14:55.159Z        ERROR   piecestore      download failed {"Piece ID": "32B7EYRW5CFKH3MHXQSPIX7XSE2IJK7H7TF3OJCVTY6DXBNNCJ7A", "Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "Action": "GET_REPAIR", "error": "used serial already exists in store", "errorVerbose": "used serial already exists in store

Do you have an idea how to properly mix the grep commands to properly categorise it? Something like:

WARN = grep "failed" | grep [not "exist"] OR ["exist" AND "used serial"]
CRIT = grep "failed" | grep ["exist"] AND NOT ["used serial"]

First, well, no easy way to do OR unless you use tools that might not exist on some environments (like bash that is modern enough to understand the >() syntax), but it’s easy to do two runs over the file:

grep failed | grep -v exist
grep failed | grep exist | grep 'used serial'

Alternatively you can employ awk or perl, so… ¯\_(ツ)_/¯

Second:

grep failed | grep exist | grep -v 'used serial'
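For the awk alternative mentioned above, a single pass over the log could produce both counts at once. This is only a rough sketch: `classify_failed` is a hypothetical helper name, and the sample lines are made up and shortened for illustration (they are not real log output).

```shell
#!/bin/sh
# Hypothetical helper: classify failed audit/repair lines in one awk pass.
# Reads log lines on stdin and prints "<warn> <crit>".
classify_failed() {
  awk '
    /GET_AUDIT|GET_REPAIR/ && /failed/ {
      # CRIT = "exist" AND NOT "used serial"; everything else is WARN
      if ($0 ~ /exist/ && $0 !~ /used serial/) crit++
      else warn++
    }
    END { printf "%d %d\n", warn, crit }
  '
}

# Made-up, shortened sample lines (assumption, for illustration only)
sample='ERROR piecestore download failed {"Action": "GET_AUDIT", "error": "file does not exist"}
ERROR piecestore download failed {"Action": "GET_REPAIR", "error": "used serial already exists in store"}
ERROR piecestore download failed {"Action": "GET_AUDIT", "error": "some other error"}
INFO piecestore downloaded {"Action": "GET_AUDIT"}'

printf '%s\n' "$sample" | classify_failed
```

With the sample above, the "used serial" line and the "some other error" line land in the warning bucket, and only the "file does not exist" line counts as critical.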

Thx @Toyoo, within the script I’ll sum up the counts of the warnings to one single variable. I wonder why I didn’t think of that myself.
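Summing the two warning runs into one variable might look roughly like this. `LOG1D` and the `audit_failed_*` names come from the snippet earlier in the thread; `warn_other` and `warn_serial` are hypothetical names, and the sample log content is made up for illustration.

```shell
#!/bin/sh
# Made-up, shortened sample content standing in for the real $LOG1D (assumption)
LOG1D='ERROR piecestore download failed {"Action": "GET_AUDIT", "error": "file does not exist"}
ERROR piecestore download failed {"Action": "GET_REPAIR", "error": "used serial already exists in store"}
ERROR piecestore download failed {"Action": "GET_AUDIT", "error": "some other error"}
INFO piecestore downloaded {"Action": "GET_AUDIT"}'

# Run 1: failed, but without "exist"
warn_other=$(printf '%s\n' "$LOG1D" | grep -E 'GET_AUDIT|GET_REPAIR' | grep failed | grep -v -c exist)
# Run 2: failed with "exist", but only the "used serial" case
warn_serial=$(printf '%s\n' "$LOG1D" | grep -E 'GET_AUDIT|GET_REPAIR' | grep failed | grep exist | grep -c 'used serial')

# Both runs together form the warning count
audit_failed_warn=$((warn_other + warn_serial))
# Critical: failed with "exist", excluding the "used serial" case
audit_failed_crit=$(printf '%s\n' "$LOG1D" | grep -E 'GET_AUDIT|GET_REPAIR' | grep failed | grep exist | grep -v -c 'used serial')

echo "warn=$audit_failed_warn crit=$audit_failed_crit"
```

Each count runs the full pipeline from `$LOG1D`, the same way the original two lines do, so an empty intermediate result cannot be miscounted as one line.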

i don’t think we have really seen the serial issue earlier, which is most likely the main reason it wasn’t accounted for in the script.

StorjLabs also haven’t had the greatest track record for log syntax :smiley: so…
but i duno… just stating my opinions based on some incomplete knowledge of the subject lol

Grouping repair and audits together also wasn’t accounted for. I don’t see good reason to do that either. They are not the same process and I’m not even sure if containment is implemented for the repair failures. The original script still doesn’t have a problem because it doesn’t look for “exist” in repair traffic. I guess the easiest way to work around it would be to just grep “file does not exist” instead of just exist. But atm I see no reason to change the original script.
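For a derived monitoring script, the “file does not exist” workaround with audits and repairs kept apart could be sketched like this. It is not taken from the original success rate script; `repair_failed` is a hypothetical variable name and the sample log content is made up for illustration.

```shell
#!/bin/sh
# Made-up, shortened sample content standing in for the real $LOG1D (assumption)
LOG1D='ERROR piecestore download failed {"Action": "GET_AUDIT", "error": "file does not exist"}
ERROR piecestore download failed {"Action": "GET_REPAIR", "error": "used serial already exists in store"}
ERROR piecestore download failed {"Action": "GET_AUDIT", "error": "some other error"}
INFO piecestore downloaded {"Action": "GET_AUDIT"}'

# Audits only: match the full phrase so "already exists" no longer counts as critical
audit_failed_crit=$(printf '%s\n' "$LOG1D" | grep GET_AUDIT | grep failed | grep -c 'file does not exist')
audit_failed_warn=$(printf '%s\n' "$LOG1D" | grep GET_AUDIT | grep failed | grep -v -c 'file does not exist')

# Repairs are counted on their own, without a critical/recoverable split
repair_failed=$(printf '%s\n' "$LOG1D" | grep GET_REPAIR | grep -c failed)

echo "audit crit=$audit_failed_crit warn=$audit_failed_warn repair failed=$repair_failed"
```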

@BrightSilence I use excerpts from your script for my monitoring script. However, these specific audit alerts are incorrectly classified there as critical rather than, and here I agree with you, only as a warning or notice. Therefore I would like to handle this in my script, but this does not require any change in the success rate script. I absolutely agree. Thanks for your support and feedback. @SGC @Toyoo

The distinction between critical and recoverable is made to differentiate between audits that count as an immediate audit failure and those that will be retried. As far as I’m aware that distinction only makes sense for audits, hence why that is the only category where the distinction is made.

I don’t necessarily agree that it is just a warning though. The transfer does fail, so failed seems like the correct label.

Agree - from a previous notice by Alexey I’ve grouped it and never thought about it anymore. Thanks to it, I was notified of the failures and could raise the defects, haha. Anyway, I’ll fix it and separate audit from repair handling. Thanks for clarification! :v:t2:

It’s still important to keep an eye on repair failures as they do impact your scores. Though it’s a little less clear to me what the exact processes are there and whether containment applies.

Agree, again. I’ll add an extra alert for repair failures to cover that as well. :metal:t2:

me goes to checks his audit scores… lol

nah they are all fine… got some suspension scores that are a bit off, but don’t think its a real issue.
do have one node that uses like a few hundred MB/s on its root drive which isn’t swap or anything like that…

not sure what that is about, been monitoring it for a few weeks… might try to reinstall the node to see if that fixes the issue… but not really related. lol

been looking for those serial errors a bit in my logs, but thus far haven’t seen them…
but i am not logging all my nodes currently… so who knows… maybe they are the cause of my few drops in suspension scores.

but certainly cannot see any major issues and audits are 100% across the board.

same here. it’s happening once a day, PLUS sporadically the other 3 error messages related to PUT_REPAIR. but anyway, thanks to a lot of repair traffic, no real impact on the scores so far.

I tried the version for Windows, but… (GitHub - AlexeyALeonov/success_rate: Success rate for storagenode (Storj V3))

PS C:\Program Files\Storj\Storage Node> .\successrate.ps1
At C:\Program Files\Storj\Storage Node\successrate.ps1:8 char:89
+ ... a-color-mode="auto" data-light-theme="light" data-dark-theme="dark" >
+                                                                         ~
Missing file specification after redirection operator.
At C:\Program Files\Storj\Storage Node\successrate.ps1:171 char:126
+ ... etails px-3 px-md-4 px-lg-5 flex-wrap flex-md-nowrap" role="banner" >
+                                                                         ~
Missing file specification after redirection operator.
At C:\Program Files\Storj\Storage Node\successrate.ps1:189 char:158
+ ... ent="true" class="Header-link js-details-target btn-link"> <svg aria ...
+                                                                ~
The '<' operator is reserved for future use.
At C:\Program Files\Storj\Storage Node\successrate.ps1:193 char:14
+          ~
The '<' operator is reserved for future use.
At C:\Program Files\Storj\Storage Node\successrate.ps1:436 char:31
+     data-analytics-event="{&quot;category&quot;:&quot;Header&quot ...
+                           ~
Unexpected token '{' in expression or statement.
At C:\Program Files\Storj\Storage Node\successrate.ps1:436 char:46
+     data-analytics-event="{&quot;category&quot;:&quot;Header&quot ...
+                                          ~
The ampersand (&) character is not allowed. The & operator is reserved for future use; wrap an ampersand in double quotation marks ("&") to pass it as part of a string.
At C:\Program Files\Storj\Storage Node\successrate.ps1:436 char:53
+     data-analytics-event="{&quot;category&quot;:&quot;Header&quot ...
+                                                 ~
The ampersand (&) character is not allowed. The & operator is reserved for future use; wrap an ampersand in double quotation marks ("&") to pass it as part of a string.
At C:\Program Files\Storj\Storage Node\successrate.ps1:436 char:65
+ ... data-analytics-event="{&quot;category&quot;:&quot;Header&quot;,&q ...
+                                                                 ~
The ampersand (&) character is not allowed. The & operator is reserved for future use; wrap an ampersand in double quotation marks ("&") to pass it as part of a string.
At C:\Program Files\Storj\Storage Node\successrate.ps1:436 char:72
+ ... ta-analytics-event="{&quot;category&quot;:&quot;Header&quot;,&quot;ac ...
+                                                                  ~
Missing expression after unary operator ','.
At C:\Program Files\Storj\Storage Node\successrate.ps1:436 char:72
+ ... a-analytics-event="{&quot;category&quot;:&quot;Header&quot;,&quot;act ...
+                                                                 ~
Unexpected token '&' in expression or statement.
Not all parse errors were reported. Correct the reported errors and try again.
    + CategoryInfo          : ParserError: (:) [], ParseException
    + FullyQualifiedErrorId : MissingFileSpecification

PS C:\Program Files\Storj\Storage Node>

You need to either clone the repository or download the raw version of the script, not the HTML page that GitHub shows for it.
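A quick way to spot this mistake before running a downloaded script is to check whether the file starts like a web page. This is just a sketch for a POSIX shell (for example Git Bash or WSL on Windows): `looks_like_html` is a hypothetical helper, and the two demo files are made up.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the file looks like a saved HTML page
# rather than a plain script (checks only the first 20 lines).
looks_like_html() {
  head -n 20 "$1" | grep -q -i -E '<!DOCTYPE html|<html'
}

# Made-up demo files (assumption, for illustration only)
tmpdir=$(mktemp -d)
printf '<!DOCTYPE html>\n<html lang="en">\n' > "$tmpdir/page.ps1"
printf 'Write-Host "hello"\n' > "$tmpdir/raw.ps1"

for f in "$tmpdir/page.ps1" "$tmpdir/raw.ps1"; do
  if looks_like_html "$f"; then
    echo "$(basename "$f"): HTML page, get the raw file or clone the repository instead"
  else
    echo "$(basename "$f"): does not look like HTML"
  fi
done
```

Parse errors that quote fragments such as `data-light-theme` or `<svg` are a strong hint that the saved file is the web page and not the script itself.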

is this normal ?
i have a new log file, from today only

PS C:\Program Files\Storj\Storage Node> .\successrate3.ps1
========== AUDIT =============
Critically failed: 0
Critical Fail Rate: 0,00%
Recoverable failed: 0
Recoverable Fail Rate: 0,00%
Successful: 0
Success Rate: 0,00%
========== DOWNLOAD ==========
Failed: 0
Fail Rate: 0,00%
Canceled: 0
Cancel Rate: 0,00%
Successful: 0
Success Rate: 0,00%
========== UPLOAD ============
Rejected: 0
Acceptance Rate: 0,00%
---------- accepted ----------
Failed: 0
Fail Rate: 0,00%
Canceled: 0
Cancel Rate: 0,00%
Successful: 0
Success Rate: 0,00%
========== REPAIR DOWNLOAD ===
Failed: 0
Fail Rate: 0,00%
Canceled: 0
Cancel Rate: 0,00%
Successful: 0
Success Rate: 0,00%
========== REPAIR UPLOAD =====
Failed: 0
Fail Rate: 0,00%
Canceled: 0
Cancel Rate: 0,00%
Successful: 0
Success Rate: 0,00%
PS C:\Program Files\Storj\Storage Node>

No, it’s not working. If you have redirected logs to a file, you need to add the path to the file after the name of the script. If you use a container name other than “storagenode” you need to edit the script and change that on line 6.

log is 1 day old

Your download is worse than my rpi4, and it’s a new log, so that doesn’t look good.

What does that mean? Is there any issue with my node?
FYI, my connection is over wifi (100/10 FTTH line), and today I had a few PC restarts for updates.

Well first off, wifi really isn’t a good idea for a node. And second, 10mbit up is quite low (especially for ftth). Location could also have an impact, in what country is your node hosted?

But yeah, most important, use a wired connection.
