I’ve been running daily backups on my dev server since 2012, and it’s set up to retain the last 30 daily backups plus 12 monthly backups. HashBackup does pack user files into large archive files of about 1GB each. As user files are modified and re-saved, the older versions of those files become obsolete, so “holes” get punched in the large archive files. When the ratio of holes to data gets too high, usually 50%, those old archive files have to be downloaded, repacked, and uploaded again. The download doesn’t happen if there is a local copy of the backup, and even without a local copy, HB only downloads the active parts of the large archive file using ranged gets, so even when a download is needed, it’s usually not for the whole file.
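Roughly, the repack decision looks like the Python sketch below. This is not actual HashBackup code; the ArcFile structure, its fields, the 50% constant, and how the ratio is measured are all illustrative assumptions about the idea described above.

# Simplified sketch of the repack decision described above -- not actual
# HashBackup code; ArcFile, its fields, and the 50% constant are
# illustrative assumptions.
from dataclasses import dataclass
from typing import Callable, List, Tuple

REPACK_THRESHOLD = 0.50   # repack once roughly half the arc file is holes

@dataclass
class ArcFile:
    size: int                        # total bytes in the ~1GB archive file
    active: List[Tuple[int, int]]    # (offset, length) of still-referenced data

    def hole_ratio(self) -> float:
        live = sum(length for _, length in self.active)
        return 1.0 - live / self.size

def needs_repack(arc: ArcFile) -> bool:
    return arc.hole_ratio() >= REPACK_THRESHOLD

def fetch_active(arc: ArcFile, ranged_get: Callable[[int, int], bytes]) -> bytes:
    # With no local copy, only the live extents are downloaded via ranged
    # gets; the repacked, smaller file is then uploaded again.
    return b"".join(ranged_get(offset, length) for offset, length in arc.active)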
Here’s a distribution of the backup archive files on my dev server by year of last modification:
[jim@bs hbbackup]$ uniq -c out
117 2021
62 2020
228 2019
43 2018
66 2017
70 2016
8 2015
2 2014
8 2013
28 2012
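Something along these lines will produce that kind of per-year count by bucketing the archive files on modification year; the hbbackup path and the arc.* filename pattern here are assumptions about the local layout, not anything official:

# Count archive files by modification year. The hbbackup directory and
# the arc.* filename pattern are assumed here, not guaranteed.
from collections import Counter
from datetime import datetime
from pathlib import Path

def arc_years(backup_dir: str) -> Counter:
    years = Counter()
    for p in Path(backup_dir).glob("arc.*"):
        years[datetime.fromtimestamp(p.stat().st_mtime).year] += 1
    return years

for year, count in sorted(arc_years("hbbackup").items(), reverse=True):
    print(f"{count:5d} {year}")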
The 28 files (these are large 1GB blobs) that haven’t changed since 2012 are likely OS files like /Applications, /usr, etc. that were saved in the initial backup. The pattern is disrupted in 2020 because the virus made me lazy and I didn’t do much development.
As backup archive files age, more of their contents get deleted by retention, and the remaining active data is packed into new, smaller files. So, at least the way HB works, the backup doesn’t accumulate more and more old files. I think the only way you would see that pattern is if every backup version were kept, i.e., your retention policy was to never delete anything from the backup.
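To picture why old data keeps getting thinned out, here’s a simplified model of a 30-daily/12-monthly retention policy. It’s not HB’s actual retain logic, just the general shape of it, and the function and parameter names are made up for the example:

# Simplified model of a 30-daily / 12-monthly retention policy -- not
# HashBackup's actual retain code, just the general shape of the idea.
from datetime import date, timedelta

def backups_to_keep(backup_dates, today, daily=30, monthly=12):
    keep = set()
    # keep every backup made within the last `daily` days
    cutoff = today - timedelta(days=daily)
    keep.update(d for d in backup_dates if d > cutoff)
    # plus the newest backup in each of the last `monthly` calendar months
    newest_per_month = {}
    for d in sorted(backup_dates):
        newest_per_month[(d.year, d.month)] = d   # later dates overwrite earlier ones
    for month in sorted(newest_per_month, reverse=True)[:monthly]:
        keep.add(newest_per_month[month])
    return keep

# e.g. two years of daily backups ending 2021-12-31:
dates = [date(2021, 12, 31) - timedelta(days=n) for n in range(730)]
kept = backups_to_keep(dates, today=date(2021, 12, 31))
print(len(kept))   # 41: 30 recent dailies plus 11 older monthly keepers

A backup that falls out of both windows stops referencing its data, so that data becomes holes in the arc files and eventually gets packed away.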