So let’s check:
Node1 has recovered further, but on 2 satellites it went down again:
docker logs storagenode1 2>&1 | grep -E 'GET_AUDIT' | grep 'failed' | wc -l
0
docker logs storagenode1 2>&1 | grep -E 'GET_REPAIR' | grep 'failed' | wc -l
7
Errors were: 6x write: broken pipe
and 1x use of closed network connection
Node2 seems to be slowly recovering on all but one satellite:
docker logs storagenode2 2>&1 | grep -E 'GET_AUDIT' | grep 'failed' | wc -l
0
docker logs storagenode2 2>&1 | grep -E 'GET_REPAIR' | grep 'failed' | wc -l
0
Node3 seems to be recovering on some satellites and doing worse on others:
docker logs storagenode3 2>&1 | grep -E 'GET_AUDIT' | grep 'failed' | wc -l
0
docker logs storagenode3 2>&1 | grep -E 'GET_REPAIR' | grep 'failed' | wc -l
2
Errors were: 1x use of closed network connection
and 1x broken pipe
Maybe this? Load tests on Storage nodes?
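
In case it helps anyone repeat the checks above, here is a rough sketch that runs the same greps for several nodes in one go. The function names are made up for this post, and it assumes the error text ("broken pipe", "use of closed network connection") sits on the same log line as the action name and the word "failed", which is what the greps above rely on.

```shell
#!/bin/sh
# Sketch only: the helper names below are hypothetical, not part of any storagenode tooling.

# Count lines on stdin that contain both the given action name and "failed".
count_failed() {
    grep -F -e "$1" | grep -c 'failed' || true
}

# Count failed GET_REPAIR lines on stdin that also contain the given error text.
count_repair_error() {
    grep -F 'GET_REPAIR' | grep 'failed' | grep -c -F -e "$1" || true
}

# Print a short summary for every container name passed as an argument.
# The whole log is held in a shell variable, so this may be slow on very large logs.
summarize_nodes() {
    for node in "$@"; do
        logs=$(docker logs "$node" 2>&1)
        echo "$node:"
        echo "  GET_AUDIT failed:  $(printf '%s\n' "$logs" | count_failed GET_AUDIT)"
        echo "  GET_REPAIR failed: $(printf '%s\n' "$logs" | count_failed GET_REPAIR)"
        echo "    broken pipe: $(printf '%s\n' "$logs" | count_repair_error 'broken pipe')"
        echo "    use of closed network connection: $(printf '%s\n' "$logs" | count_repair_error 'use of closed network connection')"
    done
}
```

After sourcing the file, call it with the container names, e.g. `summarize_nodes storagenode1 storagenode2 storagenode3`.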