V137.5 - high load?

littleskunk · September 29, 2025, 11:31am

This has nothing to do with the storage node software, node health, or the hashstore migration.

A few months ago we decreased the RS settings from 29/43/65/110 to 29/46/49/70. That reduced the storage expansion factor from 65/29 to 49/29. The downside is that each file has to be repaired more often. The effect isn't immediate: old files take months to enter the repair queue, and when they do, they get migrated to the new RS settings. So repair traffic slowly goes up over time.
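To put numbers on the trade-off, here is a rough sketch of the arithmetic. It assumes the four RS values are, in order, required pieces / repair threshold / success threshold / total pieces uploaded. That reading matches the 65/29 and 49/29 expansion factors above, but the field names and the helper functions are only illustrative, not anything taken from the satellite code.

```python
from typing import NamedTuple


class RSSettings(NamedTuple):
    # Assumed meaning of the four numbers (hypothetical field names).
    required: int  # pieces needed to reconstruct a segment
    repair: int    # repair is triggered at or below this many healthy pieces
    success: int   # pieces stored once an upload or repair completes
    total: int     # pieces attempted during upload


def parse_rs(text: str) -> RSSettings:
    """Parse a string like '29/46/49/70' into its four integer parts."""
    return RSSettings(*(int(part) for part in text.split("/")))


def expansion_factor(rs: RSSettings) -> float:
    """Stored pieces per required piece, e.g. 65/29."""
    return rs.success / rs.required


def repair_headroom(rs: RSSettings) -> int:
    """Pieces a freshly stored segment can lose before it reaches the repair threshold."""
    return rs.success - rs.repair


OLD = parse_rs("29/43/65/110")
NEW = parse_rs("29/46/49/70")

for name, rs in (("old", OLD), ("new", NEW)):
    print(
        f"{name}: expansion factor {expansion_factor(rs):.2f}, "
        f"headroom before repair {repair_headroom(rs)} pieces"
    )
```

Under that assumption the expansion factor drops from about 2.24 to about 1.69, while the headroom between a freshly stored segment and the repair threshold shrinks from 22 pieces to 3. That is why segments on the new settings reach the repair queue sooner.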

Last week we noticed that the repair queue of US1 has grown to the point where we need to scale up repair workers. No customer data is at risk. Even the worst segments in the repair queue still have enough pieces that there is nothing to worry about. If we didn't scale up the repair workers, the repair queue bell curve might move closer to the minimum threshold until eventually a segment gets lost. That's why we are scaling up repair workers and will keep observing the situation.