In the real world I’m using restic to back up to a local NAS, and then I rclone that to whichever cloud(s) are in favour.
I’m using small pack sizes with restic, which isn’t going to do me any favours now, but it was the right decision when it was made. I have far more data these days, restic itself has removed bottlenecks between then and now, and the machines running restic have a lot more RAM, so I’m open to experimenting for the future.
I also have a machine that generates tons of small files with a high turnover rate; in the past that meant every prune rewrote a huge portion of my repository, and it was a struggle to get the rewritten packs uploaded. I’m considering splitting it into a separate repository and simply not pruning it, something like the sketch below.
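Roughly what I have in mind for that split (paths, repository names, and the retention policy are placeholders, not my real setup):

```bash
# Keep the high-churn small files in their own repository so the main repo's
# prune never has to repack those packs. Everything below is illustrative only.
export RESTIC_PASSWORD_FILE=/etc/restic/password   # hypothetical password file

# normal data, pruned on the usual schedule
restic -r /mnt/nas/restic-main  backup /home /etc
restic -r /mnt/nas/restic-main  forget --keep-daily 7 --keep-weekly 8 --prune

# high-churn small files: backed up, but never pruned
restic -r /mnt/nas/restic-churn backup /srv/small-files
```

The churn repo would only ever grow, but at least it would grow without forcing a pile of repacked packs into the upload queue.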
Upload bandwidth was a factor back then too; I was quite limited for a while because an ISP misrepresented their service availability and we were committed before we found out. Ugh. I’ve got 125 Mb/s upstream now and symmetric 1 Gb/s coming in a matter of weeks, so aside from it being inefficient I don’t think I need to care about rebuilding and re-uploading packs at this point.
What would you recommend for restic pack sizes as they apply to Storj? Any matching rclone settings specific to that decision? I’ll run through the hotrodding article again too.
My gut says I should stay just under the 64 MB segment size, to balance restic’s recommendations against not having packs split across multiple segments, then tune the `--transfers` knob from there. I’m also thinking `--order-by size,mixed` to balance out the respective bottlenecks of different file sizes; that helps with mixed sets on B2, anyway.
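Concretely, something like this is what I’d test first (the remote name `storj-backup:` and the numbers are placeholders, not recommendations):

```bash
# restic 0.14+ lets you set the target pack size in MiB; staying a bit under
# the 64 MB Storj segment limit is the whole point of picking 60 here.
export RESTIC_PACK_SIZE=60                   # or: --pack-size 60 on backup/prune
restic -r /mnt/nas/restic-main backup /home

# push the repo with rclone, leaning on parallelism since the packs are uniform-ish
rclone sync /mnt/nas/restic-main storj-backup:restic-main \
  --transfers 16 \
  --order-by size,mixed \
  --progress
```

Then it’s mostly a matter of sweeping `--transfers` until either the line or Storj becomes the bottleneck.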
I’ll build a new set of test files to match the recommendation and re-test B2, Storj, possibly Wasabi, and Cloudflare’s R2.
While I’m thinking about it, is there a minimum amount of data I should be using for testing? I don’t want the long tail at the end to have a disproportionate impact on the average, nor rclone’s scanning of files at the start. I’m usually comfortable with about 5 minutes of transfer (whatever volume of data that ends up requiring).
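The harness I’m picturing is as dumb as this (remote name and counts are placeholders); at 125 Mb/s, five minutes of transfer works out to roughly 4.5 GB, which is what the loop count below is scaled for:

```bash
# Build pack-sized dummy files, then time a plain rclone copy of the batch.
mkdir -p /tmp/packtest
for i in $(seq 1 75); do
  dd if=/dev/urandom of=/tmp/packtest/pack-$i bs=1M count=60 status=none
done   # 75 x 60 MiB ≈ 4.4 GiB, about 5 minutes at 125 Mb/s

time rclone copy /tmp/packtest storj-backup:packtest \
  --transfers 16 --progress --stats 10s
```

Scaling the loop count to the link speed should keep the startup scan and the last few straggler transfers small relative to the total.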
I would like to stay with rclone because my scripts are cloud-agnostic and adding or dropping a cloud is just a line in a config file. In a disaster recovery situation, using another tool to download in bulk would be fine, so I’ll compare uplink, but I don’t want to outright replace rclone for daily use.
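For context, the cloud-agnostic part boils down to something like this (simplified sketch; the config path and remote names are examples):

```bash
# clouds.conf is just a list of rclone remote names, one per line; adding or
# dropping a provider is editing that file, nothing in the script changes.
while read -r remote; do
  [[ -z "$remote" || "$remote" =~ ^# ]] && continue   # skip blanks and comments
  rclone sync /mnt/nas/restic-main "$remote:restic-main" --transfers 16
done < /etc/backup/clouds.conf
```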
Let’s assume a clean slate: if this works, I’ll take the opportunity to migrate the old snapshots to a new repository, split out the “special” set of small files, pick up the preferred pack size and compression all in one shot, and then use it to stress-test the 1 Gb/s line.
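For the record, the migration I’m picturing looks roughly like this (paths are placeholders, and I’d double-check each flag against the current restic docs before running it):

```bash
# 1. new repo sharing chunker params with the old one so copied data dedups;
#    a fresh init also gets the v2 (compressed) repository format
restic -r /mnt/nas/restic-v2 init \
  --from-repo /mnt/nas/restic-main --copy-chunker-params

# 2. copy the old snapshots across at the new target pack size
#    (source repo password via prompt or --from-password-file; if --pack-size
#    isn't honoured by copy, a later `prune --repack-small` should get there)
restic -r /mnt/nas/restic-v2 copy \
  --from-repo /mnt/nas/restic-main --pack-size 60

# 3. the high-churn small files start fresh in their own repository
restic -r /mnt/nas/restic-churn init
```

Pushing the freshly copied repo out to the clouds would double as the stress test for the new line.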