Known issues we are working on

On the graceful exit issue I found one more:

The storage node submits graceful exit successes in a single message at the end of a batch. Graceful exit failures are not batched; the storage node submits them one by one. Less powerful systems and routers can get overloaded by the number of connections. This creates a cycle: in the next batch the storage node fails even more transfers, which increases the impact of the problem, until the storage node is finally disqualified for too many failures.
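
To make the difference concrete, here is a minimal sketch that only counts messages. It is not the actual storage node code: `Reporter`, `report_unbatched_failures`, `report_batched` and the batch numbers are all made up for illustration, and batching the failures is just one possible mitigation, not a confirmed fix.

```python
from dataclasses import dataclass, field


@dataclass
class Reporter:
    """Hypothetical stand-in for the node's connection to the satellite."""
    messages_sent: int = 0
    received: list = field(default_factory=list)

    def send(self, results):
        # One call models one message (and potentially one connection).
        self.messages_sent += 1
        self.received.extend(results)


def report_unbatched_failures(reporter, results):
    """Behavior described above: successes in one message, failures one by one."""
    successes = [r for r in results if r[1]]
    failures = [r for r in results if not r[1]]
    if successes:
        reporter.send(successes)
    for failure in failures:
        reporter.send([failure])


def report_batched(reporter, results):
    """Possible alternative: every result of the batch in a single message."""
    if results:
        reporter.send(list(results))


# A batch of 100 transfers: 40 failed, 60 succeeded (made-up numbers).
# Each entry is (piece_id, succeeded).
batch = [(f"piece-{i}", i >= 40) for i in range(100)]

unbatched = Reporter()
report_unbatched_failures(unbatched, batch)

batched = Reporter()
report_batched(batched, batch)

print(unbatched.messages_sent, batched.messages_sent)
```

With these numbers the unbatched path sends 41 messages (1 for the successes plus 40 individual failures) while the batched path sends 1. The more transfers fail, the more messages the unbatched path produces, which is the feedback loop described above.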

I am now 99% sure that this is exactly the big issue in production. I will put it at the top of the list.
