Secure Backup Strategy - Requesting Comments

cdhowie · January 2, 2019, 9:09am

If you’re keeping a record off-host of which snapshots you expect to see, and regularly comparing them, then yes, this situation could also be detected.

Often enough that corrupt data would not have been purged from S3 by lifecycle management rules.

restic doesn’t know how to talk to Glacier directly, and that wouldn’t work well anyway since you have to wait 1-5 minutes for retrieval jobs, and that’s at the most expensive retrieval tier. You would be better off storing in S3 and using lifecycle management rules to transition only files under the data/ prefix to Glacier after some number of days. I believe for this to work, you would have to disable the use of parent snapshots for backups, since restic needs to read the tree objects under data/ to see what might have changed.
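As a rough sketch of such a lifecycle rule, here is the configuration built as a plain Python dict. The rule ID, the 30-day delay, and the helper name `glacier_transition_rule` are placeholders I made up for illustration; the dict layout follows what S3 expects for a lifecycle configuration (for example via boto3's `put_bucket_lifecycle_configuration`), but check it against the current AWS docs before relying on it.

```python
import json


def glacier_transition_rule(prefix="data/", days=30):
    """Build an S3 lifecycle configuration that moves objects under
    `prefix` to Glacier once they are `days` days old.

    The rule ID and the default of 30 days are illustrative only.
    """
    return {
        "Rules": [
            {
                "ID": "transition-data-to-glacier",  # hypothetical name
                "Status": "Enabled",
                # Only objects whose key starts with this prefix are affected,
                # so index/, snapshots/, keys/ and config stay in S3.
                "Filter": {"Prefix": prefix},
                "Transitions": [
                    {"Days": days, "StorageClass": "GLACIER"},
                ],
            }
        ]
    }


config = glacier_transition_rule(days=30)
print(json.dumps(config, indent=2))
```

If the repository lives under a sub-path of the bucket, the prefix would need to include that path (something like `myrepo/data/`, which is again just an example).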

However, since Glacier has a 90-day minimum for stored items, pruning at all has the potential to incur extra charges.
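To get a feel for what an early deletion might cost, here is a back-of-the-envelope sketch. It assumes the charge is simply the storage price for the days remaining in the 90-day minimum, with a 30-day month; the per-GB price is a parameter you have to supply yourself, and the function name is my own.

```python
def early_deletion_charge(size_gb, days_stored, price_per_gb_month,
                          minimum_days=90, days_per_month=30):
    """Estimate the pro-rated charge for deleting data before the
    minimum storage duration has elapsed.

    Assumption: you pay the normal storage rate for the remaining days.
    """
    remaining_days = max(0, minimum_days - days_stored)
    return size_gb * price_per_gb_month * (remaining_days / days_per_month)


# Example: a pack file pruned 30 days after it was transitioned.
# The price used here is a placeholder, not a quoted rate.
charge = early_deletion_charge(size_gb=100, days_stored=30,
                               price_per_gb_month=0.004)
print(f"estimated early deletion charge: ${charge:.2f}")
```

Anything pruned after the minimum has passed comes out as zero under this model.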

Basically, a very bare minimum of functionality would work in real-time and the rest would require expensive retrievals and/or early deletions.

(B2 is only $0.001/GB-month more expensive for storage than Glacier, anyway… and egress is significantly cheaper.)

Aha, so you’re running restic backup twice then, and the repositories have totally different master keys and snapshot IDs? That would definitely be sufficient to keep corruption from being synced, since no syncing is happening.

I’d change the order of these operations:

1. Revert corrupted objects.
   - Note that all you have to do to detect this is fetch each file in the repository and compare its SHA256 sum to its filename. If the sum doesn’t match, look for a prior version to restore and run the same test on that version. If the sum matches then the file is intact and any prior versions would be redundant anyway.
   - This is safe to do on a repository that is being written to, because S3 uploads are all-or-nothing; no partially-completed uploads will be visible.
2. restic forget --prune
3. restic check --read-data
   - Swapping these allows check to do less work, since it doesn’t have to verify the integrity of data we’re going to discard anyway.
4. Delete all prior versions of all objects.
   - This may not be safe to do on a repository during a backup. If an attacker manages to corrupt a pack that was uploaded after step 1 was completed but before step 4 begins, you would delete the good version. The window for such an attack is not large, however, as steps 2 and 3 require an exclusive lock so no backups would be taking place until step 3 completes.
   - To be honest, I would recommend skipping this step unless it will save you a substantial amount of money. Letting the lifecycle rules run their course is safer. A bug in your “remove all prior versions” script could easily destroy the entire repository.
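The check in the "revert corrupted objects" step could look something like the sketch below. It works on in-memory bytes so it stays self-contained; in practice you would fetch each object version from S3 (newest first) and feed the bodies in. The function names are mine, and it only applies to files that are named after their SHA256 sum.

```python
import hashlib


def is_intact(content: bytes, filename: str) -> bool:
    """A file is intact if the SHA256 of its content equals its name."""
    return hashlib.sha256(content).hexdigest() == filename


def pick_intact_version(versions, filename):
    """Return the index of the newest intact version, or None.

    `versions` is a list of object bodies ordered newest first.
    Index 0 means the current version is fine and nothing needs to be
    restored; a higher index is the prior version to roll back to.
    """
    for index, content in enumerate(versions):
        if is_intact(content, filename):
            return index
    return None


# Example: the current version was tampered with, the prior one is good.
good = b"pretend this is a pack file"
name = hashlib.sha256(good).hexdigest()
print(pick_intact_version([b"tampered bytes", good], name))
```

A result of `None` means no stored version matches its name, so that file cannot be recovered from version history alone.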