Node goes down/restarts every 10-15 minutes. thread allocation error in logs

Toyoo · May 1, 2024, 7:17pm

While in general I agree with the statements you make, please allow me to add some commentary for others who do not follow the forum.

The average file size across the whole network can be estimated by dividing the average segment size (for the record: currently 7.31 MB) by 29, so the average file would now be about 252 kB. This number depends on customer behavior. Also, because it is an average across all files, old and new, it may not reflect the average size of files uploaded in any given period.
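For anyone who wants to redo this estimate when the segment statistic changes, here is the arithmetic as a small sketch. The function name is mine, and the divisor of 29 and the 7.31 MB figure are simply the values quoted above; decimal units (1 MB = 1000 kB) are assumed.

```python
def estimate_average_file_size_kb(avg_segment_mb: float, divisor: int = 29) -> float:
    """Estimate the average file size in kB from the average segment size in MB.

    The divisor of 29 is the value used in the post; decimal units are assumed.
    """
    return avg_segment_mb / divisor * 1000


# Using the average segment size quoted in the post.
print(round(estimate_average_file_size_kb(7.31)))  # 252
```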

I was happy to see my design draft mentioned recently in Storj’s post. For the context of this conversation, it has the potential to reduce the average number of disk writes per uploaded file to significantly less than 1 by coalescing the writes required for many files into a small number of operations. As such, it seems technically feasible to handle large traffic even on low-I/O storage. It has to be underlined that this is a complex proposal that will require a lot of engineering time, so it is unlikely to arrive in the short term.
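To illustrate only the general idea of coalescing (this is not the design draft itself, and every name and size below is hypothetical), here is a toy sketch that buffers many small uploads and writes them out in one operation once a threshold is reached. It ignores durability, indexing and crash recovery, which is where most of the real engineering effort would go.

```python
import io


class CoalescingWriter:
    """Toy model: buffer small blobs and flush them as a single write."""

    def __init__(self, sink, flush_bytes: int = 1_000_000):
        self.sink = sink                # any object with a write(bytes) method
        self.flush_bytes = flush_bytes  # hypothetical flush threshold
        self.pending = []
        self.pending_bytes = 0
        self.files_accepted = 0
        self.write_ops = 0

    def add(self, data: bytes) -> None:
        """Accept one blob; flush once enough bytes have accumulated."""
        self.pending.append(data)
        self.pending_bytes += len(data)
        self.files_accepted += 1
        if self.pending_bytes >= self.flush_bytes:
            self.flush()

    def flush(self) -> None:
        """Write everything buffered so far in one operation."""
        if not self.pending:
            return
        self.sink.write(b"".join(self.pending))
        self.write_ops += 1
        self.pending.clear()
        self.pending_bytes = 0


sink = io.BytesIO()
writer = CoalescingWriter(sink, flush_bytes=1_000_000)
for _ in range(1000):
    writer.add(b"x" * 10_000)  # 1000 hypothetical small uploads of 10 kB each
writer.flush()                 # flush any remainder

print(writer.write_ops, writer.files_accepted)     # 10 1000
print(writer.write_ops / writer.files_accepted)    # 0.01 writes per file
```

In this toy run, 1000 uploads end up as 10 write operations, which is the sense in which the average can drop well below one write per file.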

In this case it’s a symptom of a bottleneck, not a healthy use of unused resources. While it might be a good idea for handling very short peaks, it’s not a solution to elevated but sustained traffic. In the latter case it eats resources without solving the actual problem, and hence is useless.

Please note that this is the purpose of the max concurrent uploads switch. It is a quick and dirty solution for now, sure, but it’s a sign that solutions of this type are acceptable to Storj. I hope that the recent work to expose I/O metrics will lead to a similar mechanism, but one based on actual measurements.
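As a rough picture of what a limit of this type does (a minimal sketch, not the storage node's actual code; the class and method names are made up), a fixed pool of slots can be used to turn away new uploads immediately instead of queueing them:

```python
import threading


class UploadLimiter:
    """Reject new uploads once a fixed number are already in flight."""

    def __init__(self, max_concurrent: int):
        self._slots = threading.BoundedSemaphore(max_concurrent)

    def try_begin(self) -> bool:
        """Return True if a slot was taken, False if the upload should be refused."""
        return self._slots.acquire(blocking=False)

    def end(self) -> None:
        """Release the slot taken by a finished upload."""
        self._slots.release()


limiter = UploadLimiter(max_concurrent=2)
results = [limiter.try_begin() for _ in range(3)]
print(results)                   # [True, True, False]

limiter.end()                    # one upload finishes
retry = limiter.try_begin()
print(retry)                     # True
```

A measurement-based version would keep the same refuse-early shape but decide from observed I/O load rather than from a fixed count.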

The node might instead focus on serving downloads. This is, after all, the core purpose of storage. Have the node grow at the rate it can accept, even if it is “just” 2 MB/s.