HashBackup + Storj

I got an email asking if HashBackup supported Storj. I guess it would work using the S3 gateway, but I also put together a “shell destination” that uses the Storj uplink command for anyone interested. To use it:

  1. Download HashBackup from the website

  2. Create a new backup directory:

mbp:~ jim$ hb init -c hb
HashBackup #2552 Copyright 2009-2021 HashBackup, LLC
Backup directory: /Users/jim/hb
Permissions set for owner access only
Created key file /Users/jim/hb/key.conf
Key file set to read-only
Setting include/exclude defaults: /Users/jim/hb/inex.conf

VERY IMPORTANT: your backup is encrypted and can only be accessed with
the encryption key, stored in the file:

    /Users/jim/hb/key.conf

You MUST make copies of this file and store them in secure locations,
separate from your computer and backup data.  If your hard drive fails, 
you will need this key to restore your files.  If you have setup remote
destinations in dest.conf, that file should be copied too.
        
Backup directory initialized

  3. Create a dest.conf file in the backup directory:
mbp:~ jim$ cat - >hb/dest.conf
destname storj
type shell
run python ~jim/sjshell.py --uplink ~jim/uplink --bucket hbtest --dir d --command
Ctrl-d
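
The dest.conf above assumes the hbtest bucket already exists and that uplink has been configured with an access grant (for example via uplink setup). If the bucket doesn't exist yet, create it with uplink first; this is a standard uplink command, not something HashBackup runs for you, and the uplink path is just the one from the dest.conf example:

    mbp:~ jim$ ~jim/uplink mb sj://hbtest
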
  4. Create sjshell.py. I normally keep these on the HB website but am in transition to a new site and can’t update it right now. Here’s a link to it:
    http://upgrade.hashbackup.com/shells/sjshell.py
    Make sure the paths of your sjshell.py script and uplink executable are right in step 3 above.
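
If you're curious what a shell destination script actually has to do, here's a rough, hypothetical sketch. It is not the real sjshell.py (which handles more cases and a specific argument protocol); the send/get/rm verbs and the argument layout below are just assumptions to show how such verbs map onto standard uplink cp/rm calls:

    # Hypothetical sketch only -- NOT the real sjshell.py. It just illustrates
    # the kind of uplink calls a shell destination ends up making. The
    # send/get/rm verbs and the argument layout are assumptions here; use the
    # real sjshell.py from the link above for actual backups.
    import subprocess
    import sys

    UPLINK = "uplink"   # path to the uplink binary (assumed to be on $PATH)
    BUCKET = "hbtest"   # bucket name from the dest.conf example above
    PREFIX = "d"        # --dir value from the dest.conf example above

    def remote(name):
        # sj:// URL for a backup file stored under the prefix
        return "sj://%s/%s/%s" % (BUCKET, PREFIX, name)

    def send(local_path, name):
        # upload a backup file (arc.N.N, hb.db.N, dest.db) to Storj
        subprocess.check_call([UPLINK, "cp", local_path, remote(name)])

    def get(name, local_path):
        # download a backup file from Storj
        subprocess.check_call([UPLINK, "cp", remote(name), local_path])

    def rm(name):
        # delete a backup file from Storj
        subprocess.check_call([UPLINK, "rm", remote(name)])

    if __name__ == "__main__":
        verb, args = sys.argv[1], sys.argv[2:]
        {"send": send, "get": get, "rm": rm}[verb](*args)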

  5. Initialize the destination. This is a one-time thing, only for shell destinations:

mbp:~ jim$ hb dest -c hb setid storj
HashBackup #2552 Copyright 2009-2021 HashBackup, LLC
Using destinations in dest.conf

WARNING: this will "takeover" these destinations, even if another backup is using them: storj
Proceed? yes

  6. Do your first backup:
mbp:~ jim$ hb backup -c hb titleapp.pdf
HashBackup #2552 Copyright 2009-2021 HashBackup, LLC
Backup directory: /Users/jim/hb
Backup start: 2021-09-18 16:09:46
Using destinations in dest.conf
Copied HB program to /Users/jim/hb/hb#2552
This is backup version: 0
Dedup not enabled; use -Dmemsize to enable
/
/Users
/Users/jim
/Users/jim/hb
/Users/jim/hb/inex.conf
/Users/jim/titleapp.pdf
Waiting for destinations: storj
Copied arc.0.0 to storj (245 KB 14s 16 KB/s)
Writing hb.db.0
Copied hb.db.0 to storj (4.8 KB 3s 1.2 KB/s)
Waiting for destinations: storj
Copied dest.db to storj (36 KB 5s 7.3 KB/s)

Time: 1.3s
CPU:  0.0s, 3%
Wait: 23.7s
Mem:  65 MB
Checked: 6 paths, 245533 bytes, 245 KB
Saved: 6 paths, 245533 bytes, 245 KB
Excluded: 0
Dupbytes: 0
Space: +245 KB, 282 KB total
No errors

  7. Run a selftest to check downloading:
mbp:~ jim$ hb selftest -c hb -v4
HashBackup #2552 Copyright 2009-2021 HashBackup, LLC
Backup directory: /Users/jim/hb
Most recent backup version: 0
Using destinations in dest.conf
Dedup loaded, 0% of current size
Checking all versions
Checking database readable
Checked  database readable
Checking database integrity
Checked  database integrity
Checking dedup table
Checked  dedup table
Checking paths I
Checked  paths I
Checking keys
Checked  keys
Checking arcs I
Checked  arcs I
Checking blocks I
Getting arc.0.0 from storj
Checking arc.0.0
Checked  arc.0.0 from storj
Checked  arc.0.0 from (local)
Checked  3 blocks I     
Checking refs I
Checked  2 refs I     
Checking arcs II
Checked  arcs II
Checking files
Checked  6 files
Checking paths II
Checked  paths II
Checking blocks II
Checked  blocks II
No errors

  8. Try a restore. This will use the local backup directory files, not Storj:
mbp:~ jim$ hb get -c hb titleapp.pdf
HashBackup #2552 Copyright 2009-2021 HashBackup, LLC
Backup directory: /Users/jim/hb
Most recent backup version: 0
Restoring most recent version
Using destinations in dest.conf

Restoring titleapp.pdf to /Users/jim
Path already exists with same mtime as backup file: /Users/jim/titleapp.pdf
  Existing file last modified on: 2021-07-02 14:09:19
  Backup file last modified on:   2021-07-02 14:09:19
Warning: existing file will be overwritten!
Restore? yes
/Users/jim/titleapp.pdf
Restored /Users/jim/titleapp.pdf to /Users/jim/titleapp.pdf
No errors

  9. Eliminate the local copy of the backup arc files:
mbp:~ jim$ hb config -c hb cache-size-limit 0
HashBackup #2552 Copyright 2009-2021 HashBackup, LLC
Backup directory: /Users/jim/hb
Current config version: 1

Set cache-size-limit to 0 (was -1) for future backups

Now do a dummy backup to actually remove the local arc files. The titleapp.pdf file isn’t saved this time because it hasn’t changed:

mbp:~ jim$ hb backup -c hb titleapp.pdf
HashBackup #2552 Copyright 2009-2021 HashBackup, LLC
Backup directory: /Users/jim/hb
Backup start: 2021-09-18 16:14:24
Using destinations in dest.conf
Increased cache to 220 MB
This is backup version: 1
Dedup not enabled; use -Dmemsize to enable
/
/Users
/Users/jim
/Users/jim/hb
Writing hb.db.1
Copied hb.db.1 to storj (3.9 KB 0s 4.0 KB/s)
Waiting for destinations: storj
Copied dest.db to storj (36 KB 7s 5.0 KB/s)

Time: 0.9s
CPU:  0.0s, 4%
Wait: 8.4s
Mem:  62 MB
Checked: 6 paths, 245533 bytes, 245 KB
Saved: 4 paths, 0 bytes, 0 bytes
Excluded: 0
No errors

Are there any arc files in the local backup directory now?

mbp:~ jim$ ls -l hb
total 42840
-rw-r--r--  1 jim  staff    282097 Sep 18 16:14 cacerts.crt
-rw-r--r--  1 jim  staff       108 Sep 18 16:03 dest.conf
-rw-r--r--  1 jim  staff     36864 Sep 18 16:14 dest.db
-rw-r--r--  1 jim  staff   3146512 Sep 18 16:14 hash.db
-rwxr-xr-x  1 jim  staff  18254992 Sep 10 10:36 hb#2552
-rw-r--r--  1 jim  staff    147456 Sep 18 16:14 hb.db
-rw-r--r--  1 jim  staff         6 Sep 18 16:14 hb.lock
-rw-r--r--  1 jim  staff       522 Sep 18 16:01 inex.conf
-r--------  1 jim  staff       346 Sep 18 16:01 key.conf
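
No arc files are left locally: the backed-up file data now lives only on Storj. If you later decide you want local copies of the arc files again, set the limit back to its previous value (the -1 shown earlier means no limit):

    mbp:~ jim$ hb config -c hb cache-size-limit -1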

  10. Do a restore again. This time it will have to download arc.0.0 from Storj:
mbp:~ jim$ hb get -c hb titleapp.pdf
HashBackup #2552 Copyright 2009-2021 HashBackup, LLC
Backup directory: /Users/jim/hb
Most recent backup version: 1
Restoring most recent version
Using destinations in dest.conf
Increased cache to 220 MB
Using local files        <-- IMPORTANT!

Planning cache...
  Items: 1 - 245 KB                     
  Scan time: 0.0s, 0s
  Plan time: 0.0s, 0s
  Mem: 41 MB

Restoring titleapp.pdf to /Users/jim
Path already exists with same mtime as backup file: /Users/jim/titleapp.pdf
  Existing file last modified on: 2021-07-02 14:09:19
  Backup file last modified on:   2021-07-02 14:09:19
Warning: existing file will be overwritten!
Restore? yes
/Users/jim/titleapp.pdf
Restored /Users/jim/titleapp.pdf to /Users/jim/titleapp.pdf
No errors

  11. Wait, that still didn’t download anything because HashBackup saw that the local file was identical to the backed-up file! It uses timestamps and file sizes to decide whether a local file matches, or with --no-mtime it does a full SHA1 hash verification. This feature makes restores fast when the data already partially exists on disk. Tell HashBackup not to use any local files and try again (see the Download size stat):
mbp:~ jim$ hb get -c hb titleapp.pdf --no-local
HashBackup #2552 Copyright 2009-2021 HashBackup, LLC
Backup directory: /Users/jim/hb
Most recent backup version: 1
Restoring most recent version
Using destinations in dest.conf
Increased cache to 220 MB

Planning cache...
  Items: 1 - 245 KB                     
  Scan time: 0.0s, 0s
  Plan time: 0.0s, 0s
  Saving plan
  Download size: 245 KB   <-- Now it's downloading!
  Peak cache size: 220 MB
  Disk free space: 208 GB, 83% free
  Mem:  41 MB

Restoring titleapp.pdf to /Users/jim
Path already exists with same mtime as backup file: /Users/jim/titleapp.pdf
  Existing file last modified on: 2021-07-02 14:09:19
  Backup file last modified on:   2021-07-02 14:09:19
Warning: existing file will be overwritten!
Restore? yes
/Users/jim/titleapp.pdf
Restored /Users/jim/titleapp.pdf to /Users/jim/titleapp.pdf
No errors
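
The --no-mtime option mentioned in step 11 is the stricter alternative: instead of trusting timestamps and sizes, it verifies local files with a full SHA1 hash before reusing them. Same restore, just a different flag:

    mbp:~ jim$ hb get -c hb titleapp.pdf --no-mtime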

  12. The default arc file size is 100MB, which is probably too low for Storj since Storj uses 64MB segments. Here’s how to set it to 1GB:
mbp:~ jim$ hb config -c hb arc-size-limit 1g
HashBackup #2552 Copyright 2009-2021 HashBackup, LLC
Backup directory: /Users/jim/hb
Current config version: 2

Set arc-size-limit to 1g (was 100mb) for future backups

Another possibility would be setting it to something like 60MB so that each arc file fits in a single Storj segment. Don’t set it to exactly 64MB, because HashBackup sometimes goes slightly over the limit and that would create 2 Storj segments, one 64MB segment and a 2nd very small one, which would hurt performance.
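
For example, to cap arc files at roughly one Storj segment each (a suggestion I haven't benchmarked):

    mbp:~ jim$ hb config -c hb arc-size-limit 60mb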

  13. HashBackup can send backups to multiple destinations in one pass and keeps them synchronized automatically. I’m starting over on a Linux server with a 442M tar file, but if you want to keep using the backup above, just skip this next backup. For this file, HashBackup compresses 442M down to 2.4M (it’s a tar backup of 100K small files):
    [root@hbtest ~]# ls -lh bigdir.tar
    -rw-r--r-- 1 root root 442M Jan 20 2020 bigdir.tar
    [root@hbtest ~]# hb backup -c hb bigdir.tar
    HashBackup #2493 Copyright 2009-2021 HashBackup, LLC
    Backup directory: /root/hb
    Backup start: 2021-09-21 02:34:29
    Using destinations in dest.conf
    This is backup version: 0
    Dedup not enabled; use -Dmemsize to enable
    /
    /root
    /root/bigdir.tar
    /root/hb
    /root/hb/inex.conf
    Copied arc.0.0 to storj (2.4 MB 2s 1.0 MB/s)
    Writing hb.db.0
    Copied hb.db.0 to storj (26 KB 2s 12 KB/s)
    Copied dest.db to storj (36 KB 1s 29 KB/s)

Time: 3.5s
CPU: 2.6s, 74%
Wait: 5.9s
Mem: 66 MB
Checked: 5 paths, 463278184 bytes, 463 MB
Saved: 5 paths, 463278184 bytes, 463 MB
Excluded: 0
Dupbytes: 0
Compression: 99%, 190.1:1
Efficiency: 166.92 MB reduced/cpusec
Space: +2.4 MB, 2.4 MB total
No errors

The backup was copied to Storj and is also stored in the local backup directory:

[root@hbtest ~]# ls -l hb
total 23044
-rw-r--r-- 1 root root 2443840 Sep 21 02:34 arc.0.0
-rw-r--r-- 1 root root 282097 Sep 21 02:34 cacerts.crt
-rw-r--r-- 1 root root 98 Sep 21 02:21 dest.conf
-rw-r--r-- 1 root root 36864 Sep 21 02:34 dest.db
-rw-r--r-- 1 root root 3146512 Sep 21 02:34 hash.db
-rwxr-xr-x 1 root root 17483136 Mar 20 2020 hb#2493
-rw-r--r-- 1 root root 172032 Sep 21 02:34 hb.db
-rw-r--r-- 1 root root 6 Sep 21 02:34 hb.lock
-rw-r--r-- 1 root root 104 Sep 18 22:19 inex.conf
-r-------- 1 root root 301 Sep 18 22:19 key.conf

  14. Now add a 2nd destination to dest.conf. You can use any text editor; I’m doing it with cat here. Use your real AWS accesskey and secretkey:

[root@hbtest ~]# cat hb/dest.conf
destname storj
type shell
workers 2
run python ~root/sjshell.py --bucket hbtest --dir d --command

[root@hbtest ~]# cat - >>hb/dest.conf

destname amzs3
type s3
accesskey xxx
secretkey xxx
location us-east-2
bucket hashbackup-us-east-2
dir sjtest
^D

[root@hbtest ~]# cat hb/dest.conf
destname storj
type shell
workers 2
run python ~root/sjshell.py --bucket hbtest --dir d --command

destname amzs3
type s3
accesskey xxx
secretkey xxx
location us-east-2
bucket hashbackup-us-east-2
dir sjtest

  15. Now back up a sparse file, like a thin-provisioned VM. This one says it is 3.3GB, but 1.5GB is “holes” and the rest is duplicate data that’s easily compressed. HashBackup mashes it down to 64K in arc.1.0:

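If you don't have a sparse file handy and want to try something similar, standard Linux tools can create one; for example, truncate makes a file that claims a size but uses no disk space (sparsetest is just an example name, and unlike my file below it contains no real data):

    [root@hbtest ~]# truncate -s 3G sparsetest
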
[root@hbtest ~]# ls -lh sparse
-rw-r--r-- 1 root root 3.1G Aug 15 2019 sparse
[root@hbtest ~]# du -k sparse
64012 sparse
[root@hbtest ~]# hb backup -c hb sparse
HashBackup #2493 Copyright 2009-2021 HashBackup, LLC
Backup directory: /root/hb
Backup start: 2021-09-21 02:49:03
Using destinations in dest.conf
This is backup version: 1
Dedup not enabled; use -Dmemsize to enable
/
/root
/root/hb
/root/sparse
Copied arc.0.0 to amzs3 (2.4 MB 1s 2.1 MB/s)
Copied arc.1.0 to amzs3 (64 KB 0s 523 KB/s)
Copied arc.1.0 to storj (64 KB 1s 34 KB/s)
Writing hb.db.1
Copied hb.db.1 to amzs3 (44 KB 0s 512 KB/s)
Copied hb.db.1 to storj (44 KB 1s 34 KB/s)
Copied dest.db to amzs3 (36 KB 0s 300 KB/s)
Copied dest.db to storj (36 KB 1s 20 KB/s)

Time: 11.6s
CPU: 10.4s, 89%
Wait: 5.2s
Mem: 75 MB
Checked: 5 paths, 3313238120 bytes, 3.3 GB
Saved: 4 paths, 1761345536 bytes, 1.7 GB
Excluded: 0
Sparse: 1551892480, 1.5 GB
Dupbytes: 1218445312, 1.2 GB, 69%
Compression: 99%, 29347.9:1
Efficiency: 161.75 MB reduced/cpusec
Space: +60 KB, 2.5 MB total
No errors
A new arc file, arc.1.0, was created for this backup. Both it and the previous arc.0.0 file were copied to the new S3 destination, but only the new arc file was copied to Storj because Storj already had the old one. These destinations are now in sync and either one can be used for restores.

  16. There are 3 copies of the backup: a local copy in the hb directory, a copy on Storj, and a copy on S3. Let’s mess up the local copy and run a selftest. I’m adding --fix because we know it is broken:

[root@hbtest ~]# ls -l hb
total 2988
-rw-r--r-- 1 root root 2443840 Sep 21 03:01 arc.0.0
-rw-r--r-- 1 root root 64192 Sep 21 03:01 arc.1.0
-rw-r--r-- 1 root root 282097 Sep 21 03:01 cacerts.crt
-rw-r--r-- 1 root root 262 Sep 21 03:01 dest.conf
-rw-r--r-- 1 root root 266 Sep 21 03:01 dest.conf.~1~
-rw-r--r-- 1 root root 36864 Sep 21 03:01 dest.db
-rw-r--r-- 1 root root 212992 Sep 21 03:01 hb.db
-rw-r--r-- 1 root root 6 Sep 21 03:01 hb.lock
-r-------- 1 root root 301 Sep 21 03:01 key.conf

[root@hbtest ~]# cp hb/arc.1.0 hb/arc.0.0 ← Corrupt the backup

[root@hbtest ~]# hb selftest -c hb -v4 --fix
HashBackup #2493 Copyright 2009-2021 HashBackup, LLC
Backup directory: /root/hb
Most recent backup version: 1
Using destinations in dest.conf
Checking all versions
Checking database readable
Checked database readable
Checking database integrity
Checked database integrity
Checking dedup table
Checked dedup table
Checking paths I
Checked paths I
Checking keys
Checked keys
Checking arcs I
Error: arc.0.0 size mismatch on local file: db says 2443840, is 64192
Note: arc.0.0 is correct size on storj, amzs3
1 errors
Checked arcs I
Checking blocks I
Getting arc.0.0 from amzs3, storj
Checking arc.0.0
Error: unable to get block 1 of arc.0.0 from (local): read 0 but expected 112 at 2443728 of arc.0.0 size 64192
Corrected block
Error: unable to get block 2 of arc.0.0 from (local): hash mismatch
Corrected block
Error: unable to get block 3 of arc.0.0 from (local): hash mismatch
Corrected block
… ← there are lots of similar errors listed
Error: unable to get block 442 of arc.0.0 from (local): read 0 but expected 4384 at 2435664 of arc.0.0 size 64192
Corrected block
Error: unable to get block 443 of arc.0.0 from (local): read 0 but expected 3680 at 2440048 of arc.0.0 size 64192
Corrected block
Checked arc.0.0 from amzs3
Checked arc.0.0 from storj
Checked arc.0.0 from (local)
Getting arc.1.0 from amzs3, storj
Copied arc.0.1 to amzs3 (2.4 MB 0s 12 MB/s)
Copied arc.0.1 to storj (2.4 MB 2s 866 KB/s)
Checking arc.1.0
Checked arc.1.0 from amzs3
Checked arc.1.0 from storj
Checked arc.1.0 from (local)
Checked 703 blocks I
Checking refs I
Checked 1283 refs I
Checking arcs II
Checked arcs II
Checking files
Checked 9 files
Checking paths II
Checked paths II
Checking blocks II
Checked blocks II
Writing hb.db.2
Copied hb.db.2 to amzs3 (20 KB 0s 194 KB/s)
Copied hb.db.2 to storj (20 KB 2s 8.5 KB/s)
Copied dest.db to amzs3 (36 KB 0s 297 KB/s)
Copied dest.db to storj (36 KB 2s 13 KB/s)
Removed arc.0.0 from amzs3
Removed arc.0.0 from storj
444 errors
Run hb selftest -v2 --fix if corrections were made

[root@hbtest ~]# hb selftest -c hb -v2 --fix
HashBackup #2493 Copyright 2009-2021 HashBackup, LLC
Backup directory: /root/hb
Most recent backup version: 1
Using destinations in dest.conf
Dedup loaded, 0% of current size
Checking all versions
Checking database readable
Checked database readable
Checking database integrity
Checked database integrity
Checking dedup table
Dedup loaded, 0% of current size
Checked dedup table
Checking paths I
Checked paths I
Checking keys
Checked keys
Checking arcs I
Checked arcs I
Checking blocks I
Checked 703 blocks I
Checking refs I
Checked 1283 refs I
Checking arcs II
Checked arcs II
Checking files
Checked 9 files
Checking paths II
Checked paths II
Checking blocks II
Checked blocks II
Writing hb.db.3
Copied hb.db.3 to amzs3 (2.4 KB 0s 20 KB/s)
Copied hb.db.3 to storj (2.4 KB 0s 3.2 KB/s)
Copied dest.db to amzs3 (36 KB 0s 336 KB/s)
Copied dest.db to storj (36 KB 2s 14 KB/s)
No errors

HashBackup found good blocks in the remote arc files to correct bad blocks in the local arc.0.0 file. It will correct remote files the same way, as long as some copy has a good block. If no copy has a good block, the block is deleted and the files using it are marked truncated in the backup. They will be re-saved on the next backup if they are still in the live filesystem.
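
Because selftest -v4 downloads and verifies every remote copy of every arc file, it's worth running it periodically rather than only when you suspect a problem. A sketch of a weekly cron entry (assuming hb is on root's PATH and the backup directory is /root/hb; adjust both to your setup):

    # /etc/cron.d/hb-selftest (hypothetical): verify all backup data weekly
    0 3 * * 0  root  hb selftest -c /root/hb -v4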

  17. What happens if you lose your local disk, and with it the local hb directory? The recover command reconstructs it from a remote backup. First save the key.conf and dest.conf files; that’s normally done right after a new backup is initialized. Then wipe out the hb directory, restore key.conf and dest.conf, and run recover to get everything else. Finally, run selftest -v4 to download all remote arc data and check that everything is consistent:

[root@hbtest ~]# cp hb/key.conf hb/dest.conf .
[root@hbtest ~]# rm hb/*
[root@hbtest ~]# cp key.conf dest.conf hb
[root@hbtest ~]# ls -l hb
total 8
-rw-r--r-- 1 root root 262 Sep 21 03:20 dest.conf
-r-------- 1 root root 301 Sep 21 03:20 key.conf

[root@hbtest ~]# hb recover -c hb
HashBackup #2493 Copyright 2009-2021 HashBackup, LLC
Backup directory: /root/hb
Using destinations in dest.conf
Destinations you have setup are: storj amzs3
Specify a destination to use for recovering backup files

[root@hbtest ~]# hb recover -c hb storj
HashBackup #2493 Copyright 2009-2021 HashBackup, LLC
Backup directory: /root/hb
Using destinations in dest.conf

Recovering backup files from destination: storj
Files will be copied to: /root/hb

Proceed with recovery? yes

Removed /root/hb/dest.db
Getting dest.db from storj
Getting hb.db from storj
Queueing hb.db files

Waiting for /root/hb/hb.db.3
Loading hb.db.3
Verified hb.db.3 signature

Loading hb.db.2
Verified hb.db.2 signature

Waiting for /root/hb/hb.db.1
Loading hb.db.1
Verified hb.db.1 signature

Verified hb.db signature
Checking db integrity
Removing hb.db.N files
Queueing arc files from storj
Waiting for 2 arc files…

Backup files recovered to: /root/hb
Verify the backup with the selftest command:
$ hb selftest -c hb
If inex.conf was customized, restore it with the hb get command.

[root@hbtest ~]# hb selftest -c hb -v4 ← v4 downloads arc files
HashBackup #2493 Copyright 2009-2021 HashBackup, LLC
Backup directory: /root/hb
Most recent backup version: 1
Using destinations in dest.conf
Checking all versions
Checking database readable
Checked database readable
Checking database integrity
Checked database integrity
Checking dedup table
Checked dedup table
Checking paths I
Checked paths I
Checking keys
Checked keys
Checking arcs I
Checked arcs I
Checking blocks I
Getting arc.0.1 from amzs3, storj
Checking arc.0.1
Checked arc.0.1 from amzs3
Checked arc.0.1 from storj
Checked arc.0.1 from (local)
Getting arc.1.0 from amzs3, storj
Checking arc.1.0
Checked arc.1.0 from amzs3
Checked arc.1.0 from storj
Checked arc.1.0 from (local)
Checked 703 blocks I
Checking refs I
Checked 1283 refs I
Checking arcs II
Checked arcs II
Checking files
Checked 9 files
Checking paths II
Checked paths II
Checking blocks II
Checked blocks II
No errors

Everything’s good to go again.
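
One practical note on step 17: recover only works if you still have key.conf and dest.conf, so copy them somewhere off the machine right after hb init. Something as simple as this works (the host and directory are just placeholders):

    [root@hbtest ~]# scp hb/key.conf hb/dest.conf backupadmin@othermachine:hb-keys/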

The default number of threads for HashBackup uploads/downloads is 2. If you want to increase it, add a line “workers 4” to dest.conf to set the number of threads to 4. I ran some tests on a very small VM (512MB of RAM) and had lots of “out of memory” problems. HashBackup recovers from these (it does 10 retries by default), but it’s probably best to start with the default of 2 workers, monitor your free memory during the backup, and increase gradually if it improves performance.
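
For reference, workers goes in the destination’s section of dest.conf. Here’s the storj entry from earlier with 4 workers instead of 2:

    destname storj
    type shell
    workers 4
    run python ~root/sjshell.py --bucket hbtest --dir d --command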

Have fun!
Jim

Hello @hashbackup,

I would like to thank you for sharing these instructions!