The problem was my limited bandwidth. With a single node the satellite has no issue maxing out my resources. The bitshift success tracker is doing a great job and keeps my node at a high success rate. If an upload fails, the bitshift success tracker will notice that very quickly and scale down the upload rate.
With 2 nodes on the same IP this gets problematic, but it still works. With 4 nodes on the same IP it starts failing to scale the request rate and tries to upload more pieces to my nodes than I have bandwidth available. The result is a much lower success rate. I am not 100% sure why that is happening. It might be bad luck that the upload failures the bitshift success tracker is waiting for are not evenly distributed, making it harder for the success tracker to max out my resources. It could also be that the 100% bandwidth utilization has some side effect on TCP fast open or connection pooling. A lot of variables.
Now to the good news. The moment I upgraded my bandwidth the problem was gone. Now I can split the incoming traffic even across 8 nodes with no issues, and they are all having a great time with a 99% success rate.
Now this could happen again. Upgrading my bandwidth means I currently don't hit 100% bandwidth utilization, but it is too close to feel comfortable. So I have written a script that simply rotates my nodes. It takes the output from df to find out which drive has the most free space available. The corresponding node gets the entire drive as allocated space. All the other nodes get 500GB and therefore don't accept any uploads. On the next day I run the script again and it will allow a different node to take all the uploads and reduce the current one to a 500GB allocation.
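For illustration, here is a minimal sketch of the idea in Python. It is not my actual script: the node names, mount points and the set_allocation() helper are placeholders, and it uses shutil.disk_usage() instead of literally parsing the df output, which gives the same information.

```python
#!/usr/bin/env python3
"""Rough sketch of the daily rotation idea (placeholders, not the real script).

Assumptions: one node per drive, and set_allocation() stands in for however
the allocated space is actually changed (config edit, env var plus restart,
etc.), which depends entirely on how the nodes are deployed.
"""
import shutil

# Hypothetical mapping of node name -> mount point of its drive.
NODES = {
    "node1": "/mnt/disk1",
    "node2": "/mnt/disk2",
    "node3": "/mnt/disk3",
    "node4": "/mnt/disk4",
}

PARKED_ALLOCATION = 500 * 10**9  # 500 GB: low enough that no uploads are accepted


def set_allocation(node, size_bytes):
    """Placeholder: apply the new allocated space to the given node."""
    print(f"{node}: allocate {size_bytes / 10**12:.2f} TB")


def rotate():
    # Same information as parsing `df`: find the drive with the most free space.
    free = {node: shutil.disk_usage(mount).free for node, mount in NODES.items()}
    winner = max(free, key=free.get)

    for node, mount in NODES.items():
        if node == winner:
            # The winner gets the entire drive as allocated space.
            set_allocation(node, shutil.disk_usage(mount).total)
        else:
            # Everyone else gets 500 GB and therefore accepts no uploads.
            set_allocation(node, PARKED_ALLOCATION)


if __name__ == "__main__":
    rotate()
```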
Running this script once per day is great, but hopefully one day I will run out of free space and as a consequence have to run the script once per hour. Now there is one problem with that. The transition from one node to another takes 5 minutes, so I would lose 5 minutes of uploads per hour. But there is a solution. Instead of sizing the node down to 500GB, I resize it so that it still has about 5GB of free space. That way I get a smooth transition with no gap. The node informs all the satellites that they should stop uploading to it, but it will still take any file that gets uploaded in the meantime. While this is ongoing, one of the other nodes replaces it in the node selection. In the next 5 minutes the satellite API pods will either select the soon-to-be-full node or start selecting the now available node, with no gap. On the following run the script sizes that node down to 500GB. So in total there are 3 states: one node accepts uploads, one node is in transition with 5GB of free space, and at the end it gets just 500GB of allocated space and waits for its next turn. That way I keep my upload rate high and can run the script as frequently as needed.
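Sketching the same idea with the 3 states (again only placeholders; the drive's own usage stands in for the node's used space, which only works with one node per drive):

```python
import shutil

TRANSITION_HEADROOM = 5 * 10**9   # keep ~5 GB free while a node drains
PARKED_ALLOCATION = 500 * 10**9   # parked nodes accept no uploads


def set_allocation(node, size_bytes):
    """Placeholder: apply the new allocated space to the given node."""
    print(f"{node}: allocate {size_bytes / 10**12:.2f} TB")


def rotate(nodes, draining_node=None):
    """nodes maps node name -> mount point; draining_node is last run's winner."""
    free = {n: shutil.disk_usage(m).free for n, m in nodes.items()}
    # Don't hand the uploads straight back to the node that is draining.
    candidates = {n: f for n, f in free.items() if n != draining_node}
    winner = max(candidates, key=candidates.get)

    for node, mount in nodes.items():
        usage = shutil.disk_usage(mount)
        if node == winner:
            # State 1: gets the whole drive and takes all uploads.
            set_allocation(node, usage.total)
        elif node == draining_node:
            # State 2: ~5 GB of free space left, finishes in-flight uploads
            # while the satellites stop selecting it.
            set_allocation(node, usage.used + TRANSITION_HEADROOM)
        else:
            # State 3: parked at 500 GB until its next turn.
            set_allocation(node, PARKED_ALLOCATION)
    return winner
```

Each run would pass the winner of the previous run as draining_node, so a node spends one cycle in the transition state before it gets parked.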
I also thought about mixing in a used space calculation on startup from time to time. The node that is going to take the uploads from now on is never allowed to run the used space calculation. I don't want it to slow down the uploads, and there is also no reason for that node to run it. Same for the node that is in transition. I don't want to slow it down either, but the moment it transitions from 5GB of free space to a 500GB allocation it could run some additional maintenance. Perfect timing, and it would be just one node per cycle. For now I am not doing it. In a perfect world with no bugs I shouldn't have to run the used space calculation ever, and not running it is the best way to identify possible bugs in that area. But I am sure there will be other maintenance jobs that I want to run from time to time, and with my script I can make sure a maintenance job doesn't impact the 1-2 nodes that accept uploads and only runs on the drives that are more or less idle.
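If I ever wire that in, the natural hook is the point where a node leaves the transition state. A rough sketch on top of the rotate() function above (the state file location and the run_maintenance callback are made up):

```python
import json
import os

STATE_FILE = "/var/lib/node-rotation/state.json"  # made-up location


def run_cycle(nodes, run_maintenance=None):
    """One rotation cycle; run_maintenance(node) is a placeholder callback
    for whatever job should run on the node that just got parked."""
    state = {"active": None, "draining": None}
    if os.path.exists(STATE_FILE):
        with open(STATE_FILE) as f:
            state = json.load(f)

    # The previously active node starts draining; the previously draining
    # node gets parked inside rotate() and is now the idle one.
    new_active = rotate(nodes, draining_node=state["active"])
    if state["draining"] and run_maintenance:
        run_maintenance(state["draining"])  # only one idle node per cycle

    with open(STATE_FILE, "w") as f:
        json.dump({"active": new_active, "draining": state["active"]}, f)
```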