Updates on Test Data

Our current RS setting is 29/35/65/110, without the repair override. That is an expansion factor of 65/29 ≈ 2.24 and a bandwidth factor of up to 110/29 ≈ 3.8, depending on how much traffic the long tail cancellation can cut off.
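For reference, a minimal sketch (plain Python, not anything from the codebase; the helper name rs_factors is made up) of how those two factors fall out of an RS tuple, read as minimum/repair/success/total pieces per segment:

```python
# Illustrative only: derive the factors quoted above from an RS tuple
# written as minimum / repair / success / total.
def rs_factors(minimum, repair, success, total):
    return {
        "expansion": success / minimum,    # stored data vs. original data
        "max_bandwidth": total / minimum,  # worst case, no long tail cancellation
    }

print(rs_factors(29, 35, 65, 110))  # expansion ~2.24, max bandwidth ~3.79
```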

The first alternative we tested was 16/23/35/60. Fewer IOPS for the storage nodes, while the expansion factor and bandwidth are about the same. It was still 20% faster. That was the first surprise; I expected the same speed.

The second RS setting we tested was 16/23/35/50. It was the original target, but I didn't want to change IOPS and bandwidth consumption at the same time, which is why we did the additional step with the previous RS setting first. The shorter long tail cancellation decreased performance. Second surprise: I thought this RS setting consumes less IOPS and less bandwidth, so it should be faster.

And the final RS setting I wanted to test was 32/38/54/77. It has almost the same IOPS as our current 29 setting, a lower expansion factor of 54/32 ≈ 1.7 (which would help us upload an even bigger amount of data), and it consumes less bandwidth. Overall the best RS numbers I can come up with. The catch is that the previous tests showed a longer long tail improves performance, so let's scale it up to match the bandwidth consumption of the other RS settings: 32/38/54/120 it is. Surprise, it was twice as fast.
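To make the comparison easier to eyeball, here is a quick throwaway script in the same spirit as the sketch above (the labels are mine, not official names) that computes both factors for every setting mentioned in this post:

```python
# Throwaway comparison of the RS settings discussed above,
# written as (minimum, repair, success, total).
settings = {
    "current (29/35/65/110)":      (29, 35, 65, 110),
    "alternative 1 (16/23/35/60)": (16, 23, 35, 60),
    "alternative 2 (16/23/35/50)": (16, 23, 35, 50),
    "candidate (32/38/54/77)":     (32, 38, 54, 77),
    "scaled up (32/38/54/120)":    (32, 38, 54, 120),
}

for name, (minimum, repair, success, total) in settings.items():
    print(f"{name:30} expansion {success / minimum:.2f}  "
          f"max bandwidth {total / minimum:.2f}")
```

With these numbers, 32/38/54/120 ends up at the same 120/32 = 3.75 worst-case bandwidth as 16/23/35/60 (60/16 = 3.75), which is exactly the point of the scale-up.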

Now the problem is that the RS setting with the biggest performance gain isn't something I really want to use. My plan is to revisit the RS numbers after we have tested the new node selection. My theory is that the new node selection will pick better nodes by default, to the point that I can choose the shortest possible long tail plus some safety margin, maybe even lower than 32/38/54/77. That would be the best RS setting, as long as we can still hit our performance goal.

The test with the new node selection failed. I believe I have already spotted the mistake, and now I am waiting for my coworker to wake up for another try.
