1411 Repair Download - Good or Bad?

I can explain that.

To make a profit we need to keep repair traffic low, because repair traffic is more expensive than the storage expansion factor. We started with 29, 35, 80, 95-130: 29 pieces are needed to reconstruct the file, 80 pieces exist, an additional 15-35 pieces are uploaded for long tail cancellation, and 35 is the magic number. If we ever hit 35 pieces we have to repair the file before it drops below 29 pieces.
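To make those numbers a bit more concrete, here is a small Python sketch. It is not the satellite code; the class and method names are made up for illustration, and I am assuming that "hit 35" means the repair kicks in at 35 pieces or fewer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RedundancyScheme:
    """Hypothetical model of the 29 / 35 / 80 numbers from the post."""
    required: int = 29          # pieces needed to reconstruct the file
    repair_threshold: int = 35  # the "magic number" that triggers repair
    stored: int = 80            # pieces that exist after a successful upload

    def expansion_factor(self) -> float:
        # How many bytes are stored per byte of customer data.
        return self.stored / self.required

    def status(self, healthy_pieces: int) -> str:
        # Assumption: the threshold is inclusive ("if we ever hit 35").
        if healthy_pieces < self.required:
            return "lost"
        if healthy_pieces <= self.repair_threshold:
            return "needs_repair"
        return "healthy"


scheme = RedundancyScheme()
print(round(scheme.expansion_factor(), 2))  # 80 / 29
print(scheme.status(80), scheme.status(35), scheme.status(28))
```

The gap between 35 and 29 is the safety margin: the repair job has only 6 more piece losses to work with before the file can no longer be reconstructed.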

The network ran with these numbers for a long time and we didn’t lose a single file. Our forecast showed that the first repair wave would kick in right when we want to start selling our product. If we hit some bugs we might lose files! What options do we have to mitigate that risk?

Step 1: With Changelog v0.22.1 we added a new config option for the satellite that allows us to override the repair threshold. We set it to 52 on all satellites to test repair in production. It turned out we had a few issues with the repair job. They have all been fixed in the meantime, and thanks to that test we still haven’t lost a file. The repair job is looking very good now.

Step 2: A new issue showed up. The repair traffic is lower than we estimated. That is very good news on the one hand, but also a new risk: our prediction is not as accurate as we would like it to be. Again, we might lose files because of something unknown that we didn’t take into account. So let’s run an additional test by increasing the repair threshold to 64 (active since yesterday) and watch the repair traffic again. At the moment we are doing that on only one satellite.
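As a rough illustration of why raising the threshold produces more repair traffic, here is a short Python sketch. The piece counts are invented sample data, not measurements from the network, and the function name is hypothetical. I am again assuming the threshold is inclusive.

```python
def segments_to_repair(piece_counts, repair_threshold, required=29):
    """Return the piece counts that would be queued for repair.

    A segment qualifies when it is still reconstructable (>= required)
    but has dropped to the repair threshold or below.
    """
    return [n for n in piece_counts if required <= n <= repair_threshold]


# Invented sample: healthy piece counts for eight segments.
sample = [80, 70, 64, 60, 52, 50, 40, 35]

for threshold in (35, 52, 64):
    queued = segments_to_repair(sample, threshold)
    print(f"threshold {threshold}: {len(queued)} of {len(sample)} segments queued")
```

With the same set of segments, a higher threshold simply catches them earlier, so the repair job gets exercised long before any file is close to the 29 piece limit.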
