Success rate script - Now updated for new delete terminology since v1.29.3

BrightSilence · March 10, 2020, 9:07pm

I’ve updated the success rate script to include both canceled and failed as separate values. There are now percentages for both as well. I also fixed an issue where critical audit failures weren’t counted as such due to a change in terminology.

The updated script can be found here (for Linux; Windows version below).

The new output has a lot more info and looks like this

Please note this screenshot was made on a log that mostly covers the version prior to the update. This is why there are high numbers of failed transfers. After the update to 0.34.6 or later, failed transfers only occur if there was some sort of error, not if a transfer was interrupted because enough pieces were transferred.
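For readers who want to see how the success / canceled / failed split can be derived, here is a minimal sketch rather than the actual script. It assumes the log messages are "uploaded", "upload canceled" and "upload failed" on lines that also carry "PUT"; the function names and the sample lines are made up for illustration.

```shell
#!/bin/sh
# Sketch only: split upload results into success, canceled and failed shares.
# Assumption: messages "uploaded", "upload canceled" and "upload failed",
# each on a log line that also contains "PUT".

upload_rates() {
    # Reads log lines on stdin; prints "<outcome> <count> <percent>" per outcome.
    awk '
        /"PUT"/ && /uploaded/        { ok++ }
        /"PUT"/ && /upload canceled/ { canceled++ }
        /"PUT"/ && /upload failed/   { failed++ }
        END {
            total = ok + canceled + failed
            if (total == 0) { print "no uploads found"; exit }
            printf "success %d %.2f%%\n",  ok,       100 * ok / total
            printf "canceled %d %.2f%%\n", canceled, 100 * canceled / total
            printf "failed %d %.2f%%\n",   failed,   100 * failed / total
        }'
}

sample_log() {
    # Made-up, tab-separated lines; only the message field differs.
    fmt='2020-03-10T21:00:00.000Z\tINFO\tpiecestore\t%s\t{"Piece ID": "AAAA", "Satellite ID": "sat-1", "Action": "PUT"}\n'
    printf "$fmt" 'uploaded' 'uploaded' 'upload canceled' 'upload failed'
}

sample_log | upload_rates
```

On a real node the input would come from the log instead of `sample_log`, for example `docker logs storagenode 2>&1 | upload_rates`.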

The Windows version was also updated by @Alexey and can be found here.

You can also find an alternative HTML output version of the script above here:

https://github.com/AtomicInternet2/Powershell_Scripts/blob/1afbad030afc000cfc6c7a96ade963e3f519a0bd/Storj/Success_Rate_To_HTML.ps1

############################################################################################################################################
# This file was converted from the repository https://github.com/AlexeyALeonov/success_rate to output HTML instead of text.
# I only take credit for the HTML output, Alexey created the logic behind all of this
# Thanks to Alexey for creating the Powershell success rate script
############################################################################################################################################

$log = Get-Content "C:\Program Files\Storj\Storage Node\storagenode.log"

$auditsSuccess = ($log | Select-String GET_AUDIT | Select-String downloaded).Count

$auditsFailed = ($log | Select-String GET_AUDIT | Select-String failed | Select-String exist -NotMatch).Count

$auditsFailedCritical = ($log | Select-String GET_AUDIT | Select-String failed | Select-String exist).Count

if (($auditsSuccess + $auditsFailed + $auditsFailedCritical) -ge 1) {
    $audits_failed_ratio = $auditsFailed / ($auditsSuccess + $auditsFailed + $auditsFailedCritical) * 100
    $audits_critical_ratio = $auditsFailedCritical / ($auditsSuccess + $auditsFailed + $auditsFailedCritical) * 100
    $audits_success_ratio = $auditsSuccess / ($auditsSuccess + $auditsFailed + $auditsFailedCritical) * 100
} else {
    $audits_failed_ratio = 0.00

This file has been truncated.

WARNING: the HTML output version above wipes the logs after running for a daily snapshot for performance.
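The audit classification in the PowerShell excerpt above (success, failed, and critical when the failure mentions "exist") can be sketched in shell roughly as follows. This is an illustration, not part of either script: the function names and sample lines are made up, and unlike `Select-String`, which ignores case by default, the matching here is case-sensitive.

```shell
#!/bin/sh
# Sketch only: the three audit counters from the PowerShell excerpt.
# success  = GET_AUDIT + "downloaded"
# failed   = GET_AUDIT + "failed" without "exist"
# critical = GET_AUDIT + "failed" with "exist"

audit_rates() {
    awk '
        /GET_AUDIT/ && /downloaded/         { ok++ }
        /GET_AUDIT/ && /failed/ && !/exist/ { failed++ }
        /GET_AUDIT/ && /failed/ && /exist/  { critical++ }
        END {
            total = ok + failed + critical
            # The excerpt sets the ratios to 0 here; this sketch prints a note instead.
            if (total < 1) { print "no audits found"; exit }
            printf "successful %d %.2f%%\n", ok,       100 * ok / total
            printf "failed %d %.2f%%\n",     failed,   100 * failed / total
            printf "critical %d %.2f%%\n",   critical, 100 * critical / total
        }'
}

sample_audits() {
    # Made-up lines; the error texts are placeholders.
    ok='2020-03-10T21:00:00.000Z\tINFO\tpiecestore\tdownloaded\t{"Action": "GET_AUDIT"}\n'
    bad='2020-03-10T21:00:00.000Z\tERROR\tpiecestore\tdownload failed\t{"Action": "GET_AUDIT", "error": "%s"}\n'
    printf "$ok"; printf "$ok"; printf "$ok"
    printf "$bad" 'timeout' 'file does not exist'
}

sample_audits | audit_rates
```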

The terminology used in these scripts is in line with the logs and is always from a customer’s perspective. This means that uploads are uploads to the network and ingress for your node. Similarly, downloads are egress for your node.

kevink · March 10, 2020, 9:16pm

Any chance you might add the delete operation to your script? Not necessarily to know the percentage of successful deletes, but just for stats.

BrightSilence · March 10, 2020, 9:23pm

I could add something, but the logging is less detailed for deletes. What would you like to see exactly? From what I can tell only “deleted” and “delete failed” can happen. I could add percentages as well.

kevink · March 10, 2020, 9:26pm

The amount of deletes would already be sufficient. Basically just as a nice statistic next to all the other operations.
The “delete failed” should hopefully not occur often in the future.

BrightSilence · March 10, 2020, 9:43pm

Sure, might as well include it though.

Failed deletes are shown in orange, since any failed delete would be caught in garbage collection anyway, so it’s not a big deal if they don’t go through.

Updated version is on github.
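A rough sketch of what counting deletes can look like, based only on the two messages mentioned above ("deleted" and "delete failed"). It assumes the tab-separated log layout with the message in the fourth field; the function names and sample lines are made up, and this is not the code that was pushed to GitHub.

```shell
#!/bin/sh
# Sketch only: count "deleted" and "delete failed" messages.
# Assumption: tab-separated lines with the message in field 4.

delete_stats() {
    awk -F'\t' '
        $4 == "deleted"       { ok++ }
        $4 == "delete failed" { failed++ }
        END {
            total = ok + failed
            printf "deleted %d\n", ok
            printf "delete failed %d\n", failed
            if (total > 0) printf "delete success rate %.2f%%\n", 100 * ok / total
        }'
}

sample_deletes() {
    # Made-up lines: log level and message vary, the rest is a placeholder.
    fmt='2020-03-10T21:40:00.000Z\t%s\tpiecestore\t%s\t{"Piece ID": "AAAA"}\n'
    printf "$fmt" INFO 'deleted' INFO 'deleted' INFO 'deleted' ERROR 'delete failed'
}

sample_deletes | delete_stats
```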

kevink · March 10, 2020, 9:48pm

Awesome! Thanks a lot!

tankmann · March 10, 2020, 11:05pm

Great job! My node updated a few hours ago, so I updated the script and also started logging to a new file (kept the old ones). Works on both nodes. Will update the comparison thread once I have at least a day of data in. @BrightSilence - thanks as always!

Yours performs a bit better (old data before update); I think it’s my bad ADSL connection, as always.

dragonhogan · March 11, 2020, 12:19am

Love it. Thanks so much!

mi5key · March 11, 2020, 12:30am

Any way you can incorporate the specific breakdown of which satellite is cancelling uploads? Something like this raw output, translate to satellite name, with prettier output.

docker logs storagenode 2>&1 | grep '"PUT"' | grep 'upload canceled' | awk -F" " '{ print $11 }' | sort | uniq -c

 1 "118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW",
224 "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6",
250 "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S",
452 "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs",
 15 "1wFTAgs9DP5RSnCqKV1eLf6N9wtk4EAtmN5DpSxcs8EjT69tGE",

If I have time I may work on it and submit a pull request. My upload success rate is in the 25% range, which is awful, and I’m trying to figure out whether I’m having issues with every satellite or just some of them.
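One way the per-satellite breakdown with friendlier names could look is sketched below. It pulls the ID out of the `"Satellite ID"` field instead of relying on the field position, and takes an optional mapping file with one `<satellite id> <label>` pair per line. That file format, the function names and the label `satellite-A` are all hypothetical; the two IDs are taken from the output above and the log lines are made up.

```shell
#!/bin/sh
# Sketch only: canceled uploads per satellite, optionally with labels.
# $1 (optional): mapping file, one "<satellite id> <single-word label>" per line.

canceled_by_satellite() {
    awk -v names="${1:-}" '
        BEGIN {
            if (names != "")
                while ((getline line < names) > 0) {
                    split(line, f, " ")
                    label[f[1]] = f[2]
                }
        }
        /"PUT"/ && /upload canceled/ {
            if (match($0, /"Satellite ID": "[^"]+"/)) {
                # Skip the 17-character prefix and the closing quote.
                id = substr($0, RSTART + 17, RLENGTH - 18)
                count[id]++
            }
        }
        END {
            for (id in count)
                printf "%d %s\n", count[id], ((id in label) ? label[id] : id)
        }' | sort -rn
}

SAT_A=12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs
SAT_B=118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW

sample_cancels() {
    # Made-up log lines using two of the IDs shown above.
    fmt='2020-03-11T00:30:00.000Z\tINFO\tpiecestore\tupload canceled\t{"Piece ID": "AAAA", "Satellite ID": "%s", "Action": "PUT"}\n'
    printf "$fmt" "$SAT_A" "$SAT_A" "$SAT_A" "$SAT_B"
}

sample_cancels | canceled_by_satellite
```

With a mapping file the call becomes `docker logs storagenode 2>&1 | canceled_by_satellite names.txt`, where `names.txt` is whatever file you maintain yourself.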

peem · March 11, 2020, 6:55am

Thanks so much BrightSilence

Pentium100 · March 11, 2020, 8:33am

I have this script I use for my monitoring system:

https://github.com/Pentium100MHz/storjv3-tools/blob/a6f453a335e6c5da410ac691223805c7132505e4/actions.php

<?php

$resultfile="/opt/mon/actions2.json";
$lastrunfile="/opt/mon/actions2.lastrun.txt";

$lastrun=trim(file_get_contents($lastrunfile));
exec("/usr/bin/docker logs storagenode --since ".$lastrun." 2>&1",$op);
file_put_contents($lastrunfile,time());

$totalsread=file_get_contents($resultfile);
$totals=json_decode($totalsread,1);
$totals_old=$totals;

foreach ($op as $line) {
	$parts=explode("\t",$line);
	$origin=$parts[2];
	$severity = preg_replace('/\e[[][A-Za-z0-9];?[0-9]*m?/', '', $parts[1]); //remove terminal color codes
	if (($severity=="INFO") && ($origin == "piecestore")) {
		$result=str_replace(" ","_",$parts[3]);
		$json=json_decode($parts[4],1);

This file has been truncated.

You run it every once in a while using cron or whatever; it counts up the various log entries and produces a JSON file with the results (successes, failures, etc., per satellite and in total). It only uses the log entries that were created since the last time it was run. It runs on PHP 7 and would probably run on other versions as well, but I did not test that.
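The "only count what is new, then add it to the running totals" idea can be sketched in shell as well. This is a simplified illustration and not a port of the PHP script: it keeps the totals in a plain tab-separated file instead of JSON, counts per message only (not per satellite), and does not strip terminal color codes. The names `accumulate`, `sample_run` and the file paths in the comment are made up.

```shell
#!/bin/sh
# Sketch only: merge counts from new log lines into a running totals file.
# Totals file format (an assumption of this sketch): "<message><TAB><count>".
#
# Possible cron usage, mirroring the --since approach of the PHP script:
#   last=$(cat /tmp/lastrun.txt); date +%s > /tmp/lastrun.txt
#   docker logs storagenode --since "$last" 2>&1 | accumulate /tmp/totals.tsv

accumulate() {
    totals_file=$1
    touch "$totals_file"
    tmp=$(mktemp) || return 1
    awk -F'\t' -v prev="$totals_file" '
        BEGIN {
            # Load the totals from earlier runs.
            while ((getline line < prev) > 0) {
                split(line, f, "\t")
                total[f[1]] = f[2]
            }
        }
        # Same filter as the PHP excerpt: INFO lines from piecestore.
        $2 == "INFO" && $3 == "piecestore" { total[$4]++ }
        END { for (k in total) printf "%s\t%d\n", k, total[k] }
    ' | sort > "$tmp" && mv "$tmp" "$totals_file"
}

sample_run() {
    # Prints one made-up log line per message passed as an argument.
    for msg in "$@"; do
        printf '2020-03-11T08:00:00.000Z\tINFO\tpiecestore\t%s\t{"Piece ID": "AAAA"}\n' "$msg"
    done
}

totals=$(mktemp)
sample_run 'uploaded' 'uploaded' 'downloaded' | accumulate "$totals"  # first run
sample_run 'uploaded' | accumulate "$totals"                          # later run
cat "$totals"
rm -f "$totals"
```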

I am using it to have graphs like this:

Seems that Stefan’s satellite does a lot of repairs, but not much else.

Cross91 · March 11, 2020, 8:38am

Is it possible to update the windows script as well?
Would be great.
Thanks.

BrightSilence · March 11, 2020, 10:25am

Windows version found here

It’s managed by @Alexey, perhaps he could look into that, but he may have other priorities. Anyone could update it though and send a PR which I’m sure he would merge.

Alexey · March 12, 2020, 12:08am

Updated. Should work.

BrightSilence · March 12, 2020, 6:28am

That’s awesome! Thanks, I’ve added the link to the top post.

Rabinovitch · March 13, 2020, 2:18pm

Why not include this data somewhere in the Storj node web dashboard?

BrightSilence · March 13, 2020, 2:26pm

I don’t think this data would have priority. More important right now would be to add info on payouts and held back amount. After that, I would love it if they added it somewhere. But they would have to figure out a better way to store this information. Currently we crawl through logs to find it, which is far from ideal, but it’s all we have.

Success rates are interesting to know, but in many cases there is not much you can do about it and most SNOs would be fine never knowing them. This is more for us tinkerers who like to keep a closer eye on things.

Pentium100 · March 13, 2020, 3:58pm

I agree. This data is accessible in the logs, while the actual held amount etc. is not, so I would prefer more information becoming accessible at all over making the same information a bit more accessible. Parsing the logs is annoying, but it still works.

Rabinovitch · March 13, 2020, 5:06pm

So isn’t it better to do both? Everything is accessible in the logs, but… The friendlier and more informative the Storj software is, the brighter the future of the whole Storj network will be. Don’t you think so?

Pentium100 · March 13, 2020, 8:17pm

However, there is limited time to do stuff, so, if there is not enough time to do everything, then it would be better for me if they made more information accessible at all.