1411 Repair Download - Good or Bad?

That's it, guys. I have the stats below, but the Repair Download is higher than I have ever seen it!
Is this ok?
image

They might be testing repairs. It’s high for me too.

Yes, that is fine. Mine was like that too.

I can explain that.

In order to make a profit we need to keep repair traffic low, because repair traffic is more expensive than the storage expansion factor. We started with the numbers 29, 35, 80, 95-130: 29 pieces are needed to reconstruct the file, 80 pieces exist, an additional 15-35 pieces are uploaded for long tail cancellation, and 35 is the magic number. If we ever hit 35 pieces we have to repair the file before it drops below 29 pieces.
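To make the numbers above concrete, here is a minimal Python sketch of how a segment could be classified by its count of healthy pieces. The class and method names are hypothetical and are not taken from the satellite code; only the values 29, 35 and 80 come from the post. The expansion factor is computed as stored pieces divided by required pieces, which is an assumption about how the term is used here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RedundancyScheme:
    """Hypothetical container for the thresholds quoted in the post."""
    minimum: int = 29   # pieces needed to reconstruct the file
    repair: int = 35    # repair threshold (the "magic number")
    success: int = 80   # pieces that exist after a completed upload

    def expansion_factor(self) -> float:
        # Assumed definition: stored pieces per required piece.
        return self.success / self.minimum

    def status(self, healthy_pieces: int) -> str:
        if healthy_pieces < self.minimum:
            return "lost"           # can no longer be reconstructed
        if healthy_pieces <= self.repair:
            return "needs repair"   # must be repaired before it drops below minimum
        return "healthy"


scheme = RedundancyScheme()
for count in (80, 36, 35, 29, 28):
    print(count, scheme.status(count))
print(round(scheme.expansion_factor(), 2))
```

With these defaults there are only six pieces of headroom between the repair threshold and the point where a file becomes unrecoverable, which is why the repair job has to work reliably.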

The network was running with these numbers for a long time and we didn't lose a single file. Our forecast was showing that the first repair wave would kick in when we want to start selling our product. If we hit some bugs we might lose files! What options do we have to mitigate that risk?

Step 1: With Changelog v0.22.1 we added a new config for the satellite that allows us to override the repair threshold. We set it to 52 on all satellites to test repair in production. It turns out we had a few issues with the repair job. They have all been fixed in the meantime, and thanks to that test we still haven't lost a file. The repair job is looking very good now.

Step 2: A new issue showed up. The repair traffic is lower than we estimated. On the one hand that is very good news, but it is also a new risk: our prediction is not as accurate as we would like it to be. Again, we might lose files because of something unknown that we didn't take into account. So let's run some additional tests by increasing the repair threshold to 64 (active since yesterday) and watch the repair traffic again. At the moment we are doing that on only one satellite.
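As a rough illustration of why raising the threshold produces more repair traffic, the sketch below counts how many segments would qualify for repair at the three thresholds mentioned (35, 52 and 64). The function name and the sample piece counts are made up for the example; they are not real network data, and all samples are assumed to still be above the minimum of 29.

```python
def segments_to_repair(healthy_counts, repair_threshold):
    """Return the healthy-piece counts that are at or below the repair threshold."""
    return [n for n in healthy_counts if n <= repair_threshold]


# Hypothetical healthy-piece counts for seven segments.
sample = [78, 66, 63, 58, 51, 47, 40]

for threshold in (35, 52, 64):
    queued = segments_to_repair(sample, threshold)
    print(f"threshold {threshold}: {len(queued)} of {len(sample)} segments queued")
```

In this toy sample nothing is queued at 35, while 52 and 64 pull in progressively more segments, so repair is exercised long before any file is actually at risk.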

Perfect explanation, thank you very much for the feedback.

There are worse problems you can have than needing less repair than expected.
This is definitely good to hear and may give you guys some more flexibility in other aspects. Perhaps some more leniency in uptime requirements? :wink:

@naxbc: In any case, repair traffic is not an indication of things being wrong on your node. In fact, I don't think you would see repair traffic at all if missing data on your node is triggering a repair. Repair traffic happens when availability drops below the thresholds littleskunk mentioned, and it consists of downloads of the remaining good pieces from nodes followed by reuploads of the repaired pieces to known good nodes. Neither of those transfers happens on the nodes that lost the data. So an increase in repair traffic indicates rising repair need on the entire network; it says nothing about your node specifically.
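The flow described above can be sketched roughly as follows. This is a simplified model, not the actual repair job: the function, the node names, and the small `minimum` and `target` values are all hypothetical, and it assumes the repairer downloads only as many good pieces as are needed to reconstruct the segment.

```python
def plan_repair(holders, spare_nodes, minimum, target):
    """Decide which nodes see repair download and repair upload traffic.

    holders: dict mapping node name -> True if its piece is still good.
    spare_nodes: known good nodes that do not hold a piece yet.
    """
    healthy = [node for node, ok in holders.items() if ok]
    if len(healthy) < minimum:
        raise ValueError("not enough healthy pieces to reconstruct the segment")
    missing = max(0, target - len(healthy))
    download_from = healthy[:minimum]   # repair egress on these nodes
    upload_to = spare_nodes[:missing]   # repair ingress on these nodes
    return download_from, upload_to


# node-b lost its piece; it takes part in neither transfer.
holders = {"node-a": True, "node-b": False, "node-c": True, "node-d": True}
downloads, uploads = plan_repair(holders, ["node-x", "node-y"], minimum=2, target=4)
print("download from:", downloads)
print("upload to:", uploads)
```

The node that lost its piece never appears in either list, which matches the point that repair traffic on your node says nothing about the health of your own data.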

I’ve noticed there is repair on both Ingress (not paid) and Egress (paid) bandwidth.

Is it correct that repair Ingress is not paid?

It's partially correct. As with normal ingress, the transfer itself is not paid, but you're paid for storing that data from that point on. So it's still good for you in the long run.
