Drastic increase in upload failures in February

selea · February 12, 2020, 11:32am

Hi,

I have seen an extreme drop in upload success rate this month (February) compared to the previous month (January). This changed overnight (31/1 > 1/2).
I did assume that this was only temporary but this trend has been the same for the past two weeks.

Does it have something to do with the first actual clients being US-based, so that basically no data is being sent from my node? That does not explain why there is almost 0 outbound traffic from my node this month, though.

========== AUDIT =============
Successful:           1104
Recoverable failed:   0
Unrecoverable failed: 0
Success Rate Min:     100.000%
Success Rate Max:     100.000%
========== DOWNLOAD ==========
Successful:           9142
Failed:               87
Success Rate:         99.057%
========== UPLOAD ============
Successful:           10045
Rejected:             0
Failed:               63270
Acceptance Rate:      100.000%
Success Rate:         13.701%
========== REPAIR DOWNLOAD ===
Successful:           9
Failed:               0
Success Rate:         100.000%
========== REPAIR UPLOAD =====
Successful:           421
Failed:               2412
Success Rate:         14.861%
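
For reference, the upload percentages in this report are consistent with successful / (successful + failed). A minimal sketch of that arithmetic follows; the function name is illustrative and is not taken from the script that produced the report, and returning 0 for an empty category is an assumption based on how such reports display 0.000% when there were no transfers.

```python
def success_rate(successful: int, failed: int) -> float:
    """Return the success rate in percent, or 0.0 when there were no transfers."""
    total = successful + failed
    if total == 0:
        # Assumption: an empty category is reported as 0.000%.
        return 0.0
    return 100.0 * successful / total


# Figures copied from the UPLOAD and REPAIR UPLOAD sections above.
print(f"{success_rate(10045, 63270):.3f}%")  # 13.701%
print(f"{success_rate(421, 2412):.3f}%")     # 14.861%
```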

BrightSilence · February 12, 2020, 11:37am

selea · February 12, 2020, 11:50am

Alright!
I used to have a 99.5-100% success rate in uploads.
So if more users from northern Europe would start using Tardigrade - would I see more traffic then?

BrightSilence · February 12, 2020, 11:52am

I would expect so, yes. So do some local promotion at local businesses.

Tulip · February 13, 2020, 9:45pm

Hi,

Same here in Norway.

Fiber connection

My November stats

========== AUDIT =============
Successful: 105
Recoverable failed: 0
Unrecoverable failed: 0
Success Rate Min: 100.000%
Success Rate Max: 100.000%
========== DOWNLOAD ==========
Successful: 12047
Failed: 2
Success Rate: 99.983%
========== UPLOAD ============
Successful: 28406
Rejected: 0
Failed: 394
Acceptance Rate: 100.000%
Success Rate: 98.632%
========== REPAIR DOWNLOAD ===
Successful: 0
Failed: 0
Success Rate: 0.000%
========== REPAIR UPLOAD =====
Successful: 111
Failed: 0
Success Rate: 100.000%

Today stats

========== AUDIT =============
Successful: 9356
Recoverable failed: 0
Unrecoverable failed: 0
Success Rate Min: 100.000%
Success Rate Max: 100.000%
========== DOWNLOAD ==========
Successful: 48588
Failed: 180
Success Rate: 99.631%
========== UPLOAD ============
Successful: 103574
Rejected: 0
Failed: 140534
Acceptance Rate: 100.000%
Success Rate: 42.430%
========== REPAIR DOWNLOAD ===
Successful: 2226
Failed: 0
Success Rate: 100.000%
========== REPAIR UPLOAD =====
Successful: 5413
Failed: 6848
Success Rate: 44.148%

So looks like a big drooooooop

KernelPanick · February 13, 2020, 10:06pm

When there is more traffic and less competition, success rates are higher. When there is less traffic and more competition, they will be much lower.

Mad_Max · February 14, 2020, 5:37am

Note that there is a confirmed bug in the latest software release that causes a LOT of actually successful transfers to be marked as failed in the logs. Here is the bug itself on GitHub: https://github.com/storj/storj/issues/3771

So at least a significant part of the increased “upload failure ratio” in February is caused by this logging bug. The actual situation is much better than you would think from looking at or parsing the local logs taken from a storage node.
My rough estimate is that >70-80% of the upload failures currently reported in the logs are not actual failures during data transfer but just log bugs.

Although I am not sure about the percentage for downloads (node egress), as there is no simple way to check it from the SNO's side.
For uploads (ingress) it is easy: take the piece ID and look for the corresponding file on the local node's HDD. If the file is present, then there was no actual error and it is just a wrong log line. If there is a line in the log saying the satellite asked to delete that piece, then again it was not an actual error but the log bug. The piece was uploaded OK and was later deleted by the satellite because it was no longer needed.

That is how I got my estimate of a 70-80% rate of “false positive” cases for upload failures.
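
The check described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the three inputs are plain collections of piece IDs that you would have to extract yourself (from the failed-upload log lines, from the files on disk, and from the delete log lines). The function name and the sample IDs are hypothetical, and nothing here parses real storage node logs or knows the on-disk layout.

```python
def classify_failed_uploads(failed_ids, on_disk_ids, deleted_ids):
    """Split logged upload failures into false positives and real failures.

    A logged failure counts as a false positive when the piece is present
    on disk, or when the log shows the satellite later asked to delete it.
    """
    failed = set(failed_ids)
    on_disk = set(on_disk_ids)
    deleted = set(deleted_ids)

    false_positives = {pid for pid in failed if pid in on_disk or pid in deleted}
    real_failures = failed - false_positives
    share = len(false_positives) / len(failed) if failed else 0.0
    return false_positives, real_failures, share


# Hypothetical piece IDs, for illustration only.
failed = ["A1", "B2", "C3", "D4"]
on_disk = ["A1", "B2", "Z9"]   # pieces found on the node's HDD
deleted = ["C3"]               # pieces the satellite later asked to delete

false_pos, real, share = classify_failed_uploads(failed, on_disk, deleted)
print(sorted(false_pos))  # ['A1', 'B2', 'C3']
print(sorted(real))       # ['D4']
print(share)              # 0.75
```

With real data, the resulting share is the fraction of logged upload failures that were not actual transfer failures.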

P.S.
But this percentage is relevant to my nodes and other similar nodes only. In January I had a 97%-99% “Success Rate” on uploads (only 1-3% of uploads failed), so they are almost always in the “fast cohort” of nodes, as “normal” (network average) should be about ~15%.
For slower nodes there may be a higher percentage of actual errors, but a significant part is still just due to wrong logging.