I also observe success rate degradation after the migration to hashstore completed.
While my results are not as dramatic as the ones reported by the OP, there is indeed some regression, and it needs to be looked at. If anything, the issue seems to be amplified: the worse the success rate was to begin with, the more it regresses.
Biggish node in Washington:
```
          PIECESTORE                            HASHSTORE
========== AUDIT ==============      ========== AUDIT ==============
Critically failed:     0             Critically failed:     0
Critical Fail Rate:    0.000%        Critical Fail Rate:    0.000%
Recoverable failed:    0             Recoverable failed:    0
Recoverable Fail Rate: 0.000%        Recoverable Fail Rate: 0.000%
Successful:            3238          Successful:            3748
Success Rate:          100.000%      Success Rate:          100.000%
========== DOWNLOAD ===========      ========== DOWNLOAD ===========
Failed:                438           Failed:                710
Fail Rate:             0.287%        Fail Rate:             0.393%
Canceled:              1714          Canceled:              1539
Cancel Rate:           1.123%        Cancel Rate:           0.852%
Successful:            150527        Successful:            178486
Success Rate:          98.591%       Success Rate:          98.756%
========== UPLOAD =============      ========== UPLOAD =============
Rejected:              0             Rejected:              0
Acceptance Rate:       100.000%      Acceptance Rate:       100.000%
---------- accepted -----------      ---------- accepted -----------
Failed:                0             Failed:                0
Fail Rate:             0.000%        Fail Rate:             0.000%
Canceled:              33            Canceled:              1450
Cancel Rate:           0.069%        Cancel Rate:           2.678%
Successful:            47952         Successful:            52693
Success Rate:          99.931%       Success Rate:          97.322%
========== REPAIR DOWNLOAD ====      ========== REPAIR DOWNLOAD ====
Failed:                0             Failed:                1
Fail Rate:             0.000%        Fail Rate:             0.004%
Canceled:              1             Canceled:              1
Cancel Rate:           0.029%        Cancel Rate:           0.004%
Successful:            3500          Successful:            24399
Success Rate:          99.971%       Success Rate:          99.992%
========== REPAIR UPLOAD ======      ========== REPAIR UPLOAD ======
Failed:                0             Failed:                0
Fail Rate:             0.000%        Fail Rate:             0.000%
Canceled:              16            Canceled:              432
Cancel Rate:           2.381%        Cancel Rate:           14.343%
Successful:            656           Successful:            2580
Success Rate:          97.619%       Success Rate:          85.657%
========== DELETE =============      ========== DELETE =============
Failed:                0             Failed:                0
Fail Rate:             0.000%        Fail Rate:             0.000%
Successful:            0             Successful:            0
Success Rate:          0.000%        Success Rate:          0.000%
```
Small-ish node in California:
```
          PIECESTORE                            HASHSTORE
========== AUDIT ==============      ========== AUDIT ==============
Critically failed:     0             Critically failed:     0
Critical Fail Rate:    0.000%        Critical Fail Rate:    0.000%
Recoverable failed:    0             Recoverable failed:    0
Recoverable Fail Rate: 0.000%        Recoverable Fail Rate: 0.000%
Successful:            985           Successful:            1234
Success Rate:          100.000%      Success Rate:          100.000%
========== DOWNLOAD ===========      ========== DOWNLOAD ===========
Failed:                168           Failed:                125
Fail Rate:             0.115%        Fail Rate:             0.077%
Canceled:              680           Canceled:              600
Cancel Rate:           0.464%        Cancel Rate:           0.371%
Successful:            145549        Successful:            160842
Success Rate:          99.421%       Success Rate:          99.551%
========== UPLOAD =============      ========== UPLOAD =============
Rejected:              0             Rejected:              0
Acceptance Rate:       100.000%      Acceptance Rate:       100.000%
---------- accepted -----------      ---------- accepted -----------
Failed:                0             Failed:                7
Fail Rate:             0.000%        Fail Rate:             0.006%
Canceled:              82            Canceled:              1580
Cancel Rate:           0.056%        Cancel Rate:           1.385%
Successful:            145809        Successful:            112525
Success Rate:          99.944%       Success Rate:          98.609%
========== REPAIR DOWNLOAD ====      ========== REPAIR DOWNLOAD ====
Failed:                0             Failed:                0
Fail Rate:             0.000%        Fail Rate:             0.000%
Canceled:              0             Canceled:              0
Cancel Rate:           0.000%        Cancel Rate:           0.000%
Successful:            1643          Successful:            5592
Success Rate:          100.000%      Success Rate:          100.000%
========== REPAIR UPLOAD ======      ========== REPAIR UPLOAD ======
Failed:                0             Failed:                0
Fail Rate:             0.000%        Fail Rate:             0.000%
Canceled:              31            Canceled:              416
Cancel Rate:           2.034%        Cancel Rate:           11.082%
Successful:            1493          Successful:            3338
Success Rate:          97.966%       Success Rate:          88.918%
========== DELETE =============      ========== DELETE =============
Failed:                0             Failed:                0
Fail Rate:             0.000%        Fail Rate:             0.000%
Successful:            0             Successful:            0
Success Rate:          0.000%        Success Rate:          0.000%
```
It appears the upload success rate took the hit in both cases. I'm not sure why: sync writes are disabled on the dataset. Maybe there is some delay in acknowledging the completion back to the client, because hashstore needs to do more maintenance on each write? Both nodes are on ZFS arrays with a special device and plenty of RAM.
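For reference, the sync-write setting can be verified per dataset like this (the pool/dataset name below is a placeholder, not my actual layout):

```
# Check whether synchronous writes are disabled on the dataset holding the node's data.
# "tank/storagenode" is a placeholder; substitute your own pool/dataset.
zfs get sync tank/storagenode

# Expected output when sync writes are off:
# NAME              PROPERTY  VALUE     SOURCE
# tank/storagenode  sync      disabled  local

# If it still shows "standard", it can be changed with:
# zfs set sync=disabled tank/storagenode
```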
In the meantime I'll set the storage2migration.suppress-central-migration: true
flag on my other nodes until this is triaged and addressed.
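For anyone wanting to do the same, this is the config.yaml entry I mean; the file location and surrounding settings depend on your deployment, and the placement shown is only illustrative:

```
# config.yaml excerpt (illustrative placement; the file location depends on your setup).
# Restart the node afterwards so it picks up the change.
storage2migration.suppress-central-migration: true
```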
Perhaps an interesting workaround would be to have uploads land in the piece store first and then get migrated to hashstore in the background. But if piecestore is to be retired, the regression needs to be fixed: once everyone migrates to hashstore, the apparent success rates will recover on their own, as competing nodes become just as slow, but that is not good for the network.