TL;DR
if a node fails an audit then, from what successrate.sh suggests, it gets retried later… it would make sense for the uptime check to make use of that…
so that if you drop an audit it just gets retried shortly after, or however it works… it should be part of the audit recovery kind of thing… because the whole point was to improve performance and not add additional workloads to the network, so if there is a recovery path for failed audits, making that double as a recoverable uptime check should be "trivial", at least in theory…
[ramblings and reasons]
well it's for tracking uptime; a node that cannot be contacted is still kinda useless for the network if a customer wants access to their data… granted @Pentium100 you are right… at one point it wouldn't matter too much: just before the network went almost silent a few days ago, i was up to getting like an audit a minute… granted my storagenode isn't well representing a stressed one-HDD node, but even in that case i don't think lack of audits is a real issue after the first month or so… of course it would depend on network traffic, which is a downside, but when the network is at full tilt the performance benefit from utilizing audits for tracking uptime should be worthwhile.
and no matter how long the downtime, repair jobs would be started, and thus one would be punished by the existence of more copies of the data one held.
right now, though, a single audit might also represent a long while for most nodes on the network.
i got 1000 audits in the last 10-11 hours… that was actually more than i would have expected… that's nearly equal to my uploads at 1160.
maybe we should find some low-performing nodes that are like 1-2 weeks old… maybe a month… and see what their numbers of audits actually are (a quick grep for counting those is sketched after the output below)…
1k in 10 hours is beyond 1 a minute; that's pretty decent tracking, and then the system just needs a certain error tolerance… tho i do have 0 failed audits… better to post the successrate output… so much easier.
my node is 9 weeks old…
$ ./successrate.sh /zPool/logs/storagenode_2020-05-16.log
========== AUDIT ==============
Critically failed: 0
Critical Fail Rate: 0.000%
Recoverable failed: 0
Recoverable Fail Rate: 0.000%
Successful: 966
Success Rate: 100.000%
========== DOWNLOAD ===========
Failed: 24
Fail Rate: 0.428%
Canceled: 24
Cancel Rate: 0.428%
Successful: 5564
Success Rate: 99.145%
========== UPLOAD =============
Rejected: 0
Acceptance Rate: 100.000%
---------- accepted -----------
Failed: 1
Fail Rate: 0.065%
Canceled: 379
Cancel Rate: 24.610%
Successful: 1160
Success Rate: 75.325%
========== REPAIR DOWNLOAD ====
Failed: 0
Fail Rate: 0.000%
Canceled: 0
Cancel Rate: 0.000%
Successful: 58
Success Rate: 100.000%
========== REPAIR UPLOAD ======
Failed: 0
Fail Rate: 0.000%
Canceled: 267
Cancel Rate: 22.723%
Successful: 908
Success Rate: 77.277%
========== DELETE =============
Failed: 0
Fail Rate: 0.000%
Successful: 351
Success Rate: 100.000%
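for anyone wanting to pull raw audit counts per daily log without the full script, a grep like this should do it, assuming the default storagenode log phrasing (GET_AUDIT-tagged lines ending in "downloaded" or "download failed", the same strings successrate.sh itself greps for):
$ LOG=/zPool/logs/storagenode_2020-05-15.log
$ grep GET_AUDIT "$LOG" | grep -c "download started"   # audit requests received
$ grep GET_AUDIT "$LOG" | grep -c "downloaded"         # audits served successfully
$ grep GET_AUDIT "$LOG" | grep -c "download failed"    # recoverable audit fails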
it seems very much like audits rarely fail… but i'll check my logs and see what the worst one is:
15-05-2020 - 13k requests, 2k cancelled, 109 failed downloads, audits 2k, 1 recoverable audit failed
14-05-2020 - 100k req, 25k cancelled, 45 failed dl, audits 872, 3 recoverable audits failed
that's weird, i would have figured the more traffic the more audits…
13-05-2020 - 100k req - 20k cancelled - 77 failed dl - 20 rejected - 841 audits
12th - 900 audits, no fails, regular numbers
11th - 877 + 2 recoverable fails
10th - 1214 + 10 recoverable failed audits
9th - 534 + 6 rec fail aud (might be one of the days i crashed hard for an extended period… had some issues with my server turning itself off)
8th - 658 + 6 rec fail aud (most likely the same issue causing my numbers to be outside the norm)
7th - 824 + 0 failed (day of the deletions, 79000 deleted)… got 36 rejected uploads because i'll decide just how fast i deal with stuff, thank you…
6th - worst one yet:
========== AUDIT ==============
Critically failed: 0
Critical Fail Rate: 0.000%
Recoverable failed: 7
Recoverable Fail Rate: 2.154%
Successful: 318
Success Rate: 97.846%
========== DOWNLOAD ===========
Failed: 94
Fail Rate: 1.260%
Canceled: 48
Cancel Rate: 0.644%
Successful: 7317
Success Rate: 98.096%
========== UPLOAD =============
Rejected: 34
Acceptance Rate: 99.964%
---------- accepted -----------
Failed: 2
Fail Rate: 0.002%
Canceled: 15070
Cancel Rate: 16.144%
Successful: 78278
Success Rate: 83.854%
========== REPAIR DOWNLOAD ====
Failed: 0
Fail Rate: 0.000%
Canceled: 0
Cancel Rate: 0.000%
Successful: 17
Success Rate: 100.000%
========== REPAIR UPLOAD ======
Failed: 0
Fail Rate: 0.000%
Canceled: 116
Cancel Rate: 13.892%
Successful: 719
Success Rate: 86.108%
========== DELETE =============
Failed: 0
Fail Rate: 0.000%
Successful: 107716
Success Rate: 100.000%
logs go a bit further back like this… beyond that they're in bigger chunks… i kinda like being able to do this…
seems very much like audits, at least in my case, always get through… the avg comes out to maybe a 99.5% success rate on audits, with most failures recoverable… which might be something one could use…
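(if one wanted to check that average without tallying by hand, something like this, same log-phrasing assumptions as the greps above, should get the overall rate across all the daily logs:
$ ok=$(grep GET_AUDIT /zPool/logs/storagenode_*.log | grep -c downloaded)
$ fail=$(grep GET_AUDIT /zPool/logs/storagenode_*.log | grep -c failed)
$ echo "scale=3; 100 * $ok / ($ok + $fail)" | bc    # audit success rate in %
)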
if a node fails an audit then, from what successrate.sh suggests, it gets retried later… it would make sense for the uptime check to make use of that…
so that if you drop an audit it just gets retried shortly after, or however it works… it should be part of the audit recovery kind of thing… because the whole point was to improve performance and not add additional workloads to the network, so if there is a recovery path for failed audits, making that double as a recoverable uptime check should be "trivial", at least in theory…
from what i can see i don't think there is a problem with using audits… they do vary quite a bit… but mine are anywhere from 500 to 2000 a day, with an avg of pretty much 1k ±200…
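and just to make the uptime idea concrete: treating audit contacts as liveness probes basically reduces to gap detection. a minimal sketch, assuming GNU date, the default ISO timestamp at the start of each log line, and an arbitrary 1-hour threshold (none of this is how the satellite actually does it, just an illustration):
$ grep GET_AUDIT /zPool/logs/storagenode_2020-05-16.log | awk '{print $1}' \
    | while read -r ts; do date -u -d "${ts%%.*}" +%s; done \
    | awk 'NR > 1 && $1 - prev > 3600 { printf "possible downtime: %d min\n", ($1 - prev) / 60 } { prev = $1 }'
at ~1k audits a day that's a probe roughly every 90 seconds on average, so even a tight threshold would catch real outages without adding a single extra request to the network… which was kinda the whole point.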