Poor Success rate on hashstore

I also observe success rate degradation after the migration to hashstore completed.

While my results are not as dramatic as the ones reported by the OP, there is indeed some regression and it needs to be looked at. If anything, the issue seems to amplify itself: the worse the success rate was to begin with, the more it regresses.

Biggish node in Washington:

        PIECESTORE                              HASHSTORE
========== AUDIT ============== 	========== AUDIT ==============
Critically failed:     0        	Critically failed:     0
Critical Fail Rate:    0.000%   	Critical Fail Rate:    0.000%
Recoverable failed:    0        	Recoverable failed:    0
Recoverable Fail Rate: 0.000%   	Recoverable Fail Rate: 0.000%
Successful:            3238     	Successful:            3748
Success Rate:          100.000% 	Success Rate:          100.000%
========== DOWNLOAD =========== 	========== DOWNLOAD ===========
Failed:                438      	Failed:                710
Fail Rate:             0.287%   	Fail Rate:             0.393%
Canceled:              1714     	Canceled:              1539
Cancel Rate:           1.123%   	Cancel Rate:           0.852%
Successful:            150527   	Successful:            178486
Success Rate:          98.591%  	Success Rate:          98.756%
========== UPLOAD ============= 	========== UPLOAD =============
Rejected:              0        	Rejected:              0
Acceptance Rate:       100.000% 	Acceptance Rate:       100.000%
---------- accepted ----------- 	---------- accepted -----------
Failed:                0        	Failed:                0
Fail Rate:             0.000%   	Fail Rate:             0.000%
Canceled:              33       	Canceled:              1450
Cancel Rate:           0.069%   	Cancel Rate:           2.678%
Successful:            47952    	Successful:            52693
Success Rate:          99.931%  	Success Rate:          97.322%
========== REPAIR DOWNLOAD ==== 	========== REPAIR DOWNLOAD ====
Failed:                0        	Failed:                1
Fail Rate:             0.000%   	Fail Rate:             0.004%
Canceled:              1        	Canceled:              1
Cancel Rate:           0.029%   	Cancel Rate:           0.004%
Successful:            3500     	Successful:            24399
Success Rate:          99.971%  	Success Rate:          99.992%
========== REPAIR UPLOAD ====== 	========== REPAIR UPLOAD ======
Failed:                0        	Failed:                0
Fail Rate:             0.000%   	Fail Rate:             0.000%
Canceled:              16       	Canceled:              432
Cancel Rate:           2.381%   	Cancel Rate:           14.343%
Successful:            656      	Successful:            2580
Success Rate:          97.619%  	Success Rate:          85.657%
========== DELETE ============= 	========== DELETE =============
Failed:                0        	Failed:                0
Fail Rate:             0.000%   	Fail Rate:             0.000%
Successful:            0        	Successful:            0
Success Rate:          0.000%   	Success Rate:          0.000%

Small-ish node in California:

        PIECESTORE                               HASHSTORE
========== AUDIT ==============      ========== AUDIT ==============
Critically failed:     0             Critically failed:     0
Critical Fail Rate:    0.000%        Critical Fail Rate:    0.000%
Recoverable failed:    0             Recoverable failed:    0
Recoverable Fail Rate: 0.000%        Recoverable Fail Rate: 0.000%
Successful:            985           Successful:            1234
Success Rate:          100.000%      Success Rate:          100.000%
========== DOWNLOAD ===========      ========== DOWNLOAD ===========
Failed:                168           Failed:                125
Fail Rate:             0.115%        Fail Rate:             0.077%
Canceled:              680           Canceled:              600
Cancel Rate:           0.464%        Cancel Rate:           0.371%
Successful:            145549        Successful:            160842
Success Rate:          99.421%       Success Rate:          99.551%
========== UPLOAD =============      ========== UPLOAD =============
Rejected:              0             Rejected:              0
Acceptance Rate:       100.000%      Acceptance Rate:       100.000%
---------- accepted -----------      ---------- accepted -----------
Failed:                0             Failed:                7
Fail Rate:             0.000%        Fail Rate:             0.006%
Canceled:              82            Canceled:              1580
Cancel Rate:           0.056%        Cancel Rate:           1.385%
Successful:            145809        Successful:            112525
Success Rate:          99.944%       Success Rate:          98.609%
========== REPAIR DOWNLOAD ====      ========== REPAIR DOWNLOAD ====
Failed:                0             Failed:                0
Fail Rate:             0.000%        Fail Rate:             0.000%
Canceled:              0             Canceled:              0
Cancel Rate:           0.000%        Cancel Rate:           0.000%
Successful:            1643          Successful:            5592
Success Rate:          100.000%      Success Rate:          100.000%
========== REPAIR UPLOAD ======      ========== REPAIR UPLOAD ======
Failed:                0             Failed:                0
Fail Rate:             0.000%        Fail Rate:             0.000%
Canceled:              31            Canceled:              416
Cancel Rate:           2.034%        Cancel Rate:           11.082%
Successful:            1493          Successful:            3338
Success Rate:          97.966%       Success Rate:          88.918%
========== DELETE =============      ========== DELETE =============
Failed:                0             Failed:                0
Fail Rate:             0.000%        Fail Rate:             0.000%
Successful:            0             Successful:            0
Success Rate:          0.000%        Success Rate:          0.000%

It appears the upload success rate took the hit in both cases, with repair uploads hit hardest. I’m not sure why: sync writes are disabled on the dataset. Maybe there is some delay in communicating completion, because hashstore needs to do more maintenance on each write? Both nodes are on ZFS arrays with a special device and plenty of RAM.
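For anyone comparing their own numbers: the percentages in the tables above appear to be each count divided by the sum of failed, canceled and successful transfers. That formula is my assumption, but it reproduces the figures shown. A minimal sketch in Python, where the `rates` helper is hypothetical and not part of any storagenode tooling:

```python
def rates(failed, canceled, successful):
    """Return fail/cancel/success percentages, rounded to 3 decimals.

    Assumes the denominator is failed + canceled + successful,
    which matches the figures in the tables above.
    """
    total = failed + canceled + successful
    if total == 0:
        # No transfers at all (e.g. the DELETE section): report zeros.
        return {"fail": 0.0, "cancel": 0.0, "success": 0.0}
    return {
        "fail": round(100 * failed / total, 3),
        "cancel": round(100 * canceled / total, 3),
        "success": round(100 * successful / total, 3),
    }


# Washington node, accepted uploads (counts copied from the table above).
piecestore = rates(failed=0, canceled=33, successful=47952)
hashstore = rates(failed=0, canceled=1450, successful=52693)

print("piecestore:", piecestore)
print("hashstore: ", hashstore)
print("cancel rate delta (pp):", round(hashstore["cancel"] - piecestore["cancel"], 3))
```

Since failures are zero or nearly zero in every upload section, the whole drop in success rate comes from canceled transfers, which is what you would expect if the node is losing upload races rather than erroring out.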

In the meantime, I’ll set the storage2migration.suppress-central-migration: true flag on my other nodes until this is triaged and addressed.

Perhaps an interesting workaround would be to have uploads land in piecestore first and then get migrated to hashstore. But if piecestore is to be retired, the regression needs to be fixed. Once everyone migrates to hashstore, the apparent success rates will recover on their own, as competing nodes will become just as slow, but that’s not good for the network.
