Restic copy: How does it work?

Den · December 23, 2020, 1:52pm

Can someone explain to me what API requests happen when copying snapshots from repo A to repo B? Specifically, I’m wondering if any delete/read operations are needed on repo B. Of course, we need to read from repo A in order to copy, but what about repo B?

I ask this because I’m thinking of setting up a system where I use restic copy to copy snapshots to a remote repo stored on Glacier or Coldline (via the rclone backend). Both of these backends have early deletion fees, and I think Glacier has low availability, so I need to know whether any downloads or deletes occur on repo B.

Thanks! Also I would like to say thanks to all the restic devs. I’ve used restic for some time now and it has worked perfectly!!! Just looking for new ways to improve my backup system.

MichaelEischer · December 23, 2020, 8:09pm

copy will only add data to a repository but won’t delete anything. It basically works as follows: restic loads the repository index for both the source and target repository. The index is usually stored in the local cache, so as long as the index for the target repository is already cached there’s no need to download it from that repository. Afterwards restic loads all snapshots from both repositories and determines which snapshots are missing in the target repository (snapshots are also cached locally). For these missing snapshots it will read the directory metadata and copy that along with missing file chunks to the target repository.

To access all snapshots and indexes restic also first has to request a list of snapshots and indexes.
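
The steps described above can be sketched as a toy model. This is only an illustration, not restic's actual code: the `Repo` class, the `copy_snapshots` function and the request counters are hypothetical names, the target's index and snapshots are assumed to be already cached locally, and snapshots are matched by a shared ID for simplicity.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


def _new_counters() -> Dict[str, int]:
    return {"list": 0, "read": 0, "write": 0, "delete": 0}


@dataclass
class Repo:
    """Toy stand-in for a repository (hypothetical, not restic's data model)."""
    # snapshot ID -> blob IDs it references (directory metadata and file chunks)
    snapshots: Dict[str, Set[str]] = field(default_factory=dict)
    # blob IDs known to the repository index
    index: Set[str] = field(default_factory=set)
    # simulated backend requests, to show which operations touch the repo
    ops: Dict[str, int] = field(default_factory=_new_counters)


def copy_snapshots(src: Repo, dst: Repo) -> List[str]:
    """Copy snapshots that are missing in dst and return their IDs."""
    # Both repositories are asked for a list of snapshots and indexes.
    src.ops["list"] += 1
    dst.ops["list"] += 1

    missing = sorted(set(src.snapshots) - set(dst.snapshots))
    for snap_id in missing:
        blobs = src.snapshots[snap_id]
        # Only blobs the target index does not know yet are transferred.
        for blob in sorted(blobs - dst.index):
            src.ops["read"] += 1
            dst.ops["write"] += 1
            dst.index.add(blob)
        dst.snapshots[snap_id] = set(blobs)
        dst.ops["write"] += 1  # the snapshot file itself
    return missing


repo_a = Repo(
    snapshots={"s1": {"b1", "b2"}, "s2": {"b2", "b3"}},
    index={"b1", "b2", "b3"},
)
repo_b = Repo(snapshots={"s1": {"b1", "b2"}}, index={"b1", "b2"})

copied = copy_snapshots(repo_a, repo_b)
print(copied, repo_b.ops)
```

In this model the target repository only sees list and write requests; its read and delete counters stay at zero, which matches the description above as long as the local cache is warm.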

How do you intend to restore the repository stored on Glacier? There’s currently no support to easily determine which files would be necessary to restore a certain snapshot. There is, however, an experimental PR to help with cold-storage usage.

Den · December 24, 2020, 3:39pm

Thanks for the reply. This is very informative!

My original intention was to use Glacier as a final disaster scenario. If all my other backups fail, I will pay the price to download all my data to S3 and restore from there. However, it seems my solution does not work because running restic prune and restic check will be difficult. Perhaps it’s fine for me to never prune or check, as this is just a final disaster scenario. As an alternative, I could run check on the local repo A before running copy to repo B, but I don’t think this is the same thing, since there can be corruption in repo B that is not caught.

Can you share the links to the PRs that will help with cold storage? Also, if you have any other suggestions for another backup location, I would love to hear them. I currently use B2 but I’m looking for another cheap vendor to hold another backup. Thanks.

cdhowie · December 24, 2020, 3:52pm

Consider using Backblaze B2 instead. Compared to the S3 Glacier tier, B2 only costs $0.001 more per GB-month, but costs only $0.01 per GB egress (compared to AWS standard $0.09 per GB egress, plus Glacier restore fees).

S3 Glacier Deep Archive is $0.00099 per GB-month, which is about 1/5 the cost of B2 for storage, but egress fees are still high.

I put together an analysis of the cost here; which option is cheaper depends on how often you need to restore. The huge caveat to the S3 Glacier/GDA tiers (as you point out) is that you can’t do regular repository maintenance operations.
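
To make the "depends on how often you restore" point concrete, here is a rough break-even sketch using the prices quoted above. It is not the linked analysis. The B2 storage price of $0.005 per GB-month is an assumption inferred from the "about 1/5" remark, Glacier restore fees are left out (so the Deep Archive side is understated), and the function names are made up for illustration.

```python
def yearly_cost(size_gb, storage_per_gb_month, egress_per_gb, restores_per_year):
    """Yearly storage cost plus the egress cost of full restores."""
    storage = size_gb * storage_per_gb_month * 12
    egress = size_gb * egress_per_gb * restores_per_year
    return storage + egress


def break_even_restores(cheap_storage, cheap_egress, other_storage, other_egress):
    """Full restores per year at which both options cost the same.

    Derived from 12*s1 + e1*r = 12*s2 + e2*r; the repo size cancels out.
    """
    return 12 * (other_storage - cheap_storage) / (cheap_egress - other_egress)


# Prices in USD. B2_STORAGE is an assumption (see lead-in); the rest is quoted above.
B2_STORAGE, B2_EGRESS = 0.005, 0.01
GDA_STORAGE, GDA_EGRESS = 0.00099, 0.09  # Glacier restore fees NOT included

rate = break_even_restores(GDA_STORAGE, GDA_EGRESS, B2_STORAGE, B2_EGRESS)
print(f"break-even: {rate:.2f} full restores per year")

for restores in (0, 1):
    b2 = yearly_cost(1000, B2_STORAGE, B2_EGRESS, restores)
    gda = yearly_cost(1000, GDA_STORAGE, GDA_EGRESS, restores)
    print(f"{restores} restore(s)/year, 1000 GB: B2 ${b2:.2f}, GDA ${gda:.2f}")
```

Under these assumptions the break-even sits at roughly 0.6 full restores per year: below that rate Deep Archive is cheaper, above it B2 is. Adding the Glacier restore fees would move the break-even lower.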

alexweiss · December 24, 2020, 10:42pm

If you have the needed repo files in the local cache, then restic check --with-cache should fully work. However, it only lists the files and does not read any data from the backend, so you must trust that the data stored in your backend matches the cached data. Also, check --read-data* does not work.
For pruning, the situation is similar. If you have the needed repo files in the local cache, you can use the prune from the latest beta and try the options --max-unused unlimited or --repack-cacheable-only. Both will perform a prune without the need to read any data from the backend.

Note that besides the cached files, the config file, key files and lock files are also accessed, and since they are not cached, they are read from the backend. If you cannot keep them out of your cold storage (as you can with AWS S3 by restricting Glacier to the /data dir), you also need a way to cache those files. I’m using

github.com/restic/restic: Add flag --cache-all
restic:master ← aawsome:cache-all (opened by aawsome, 15 Dec 2019)

Purpose of the change: adds the flag --cache-all. When set, all files (including key, config, lock) are cached. To do so, when the flag is set, the cache directory does not use the repo ID but the repo string given by -r. Note: in order to correctly cache the config file, PR #2505 is also required. The change was discussed in issue #2504.

but this is honestly a dirty hack and there should be a native restic solution for that.

So as things stand, I would recommend you just use rclone to sync your complete repo somewhere for the disaster case. And if disaster strikes, just sync it back to somewhere restic can work with it.

I am using OVH Cloud Archive. GCP Coldline also looks promising, but I don’t have any experience with it.

If you want to prune and repack packs in the backend (and thus need to “warm up” specific files under /data/), you can have a look at

github.com/restic/restic: prune/rebuild-index: Add warmup possibilities
restic:master ← aawsome:prune-warmup (opened by aawsome, 7 Aug 2020)

What the PR changes: it adds the possibility to access all packs that need a repack during `prune` and `rebuild-index`. This allows some cold storages to warm up these packs so that they are all accessible. The warm-up can also be done within a dry run, so that the needed packs are available for the next `prune` run. The `--json` option is now also implemented for the prune dry run, which allows building custom warm-up processes if merely accessing the files doesn't trigger the warm-up. This can be easily extended to `restore`, see #2796. The idea comes from #2796; for cold-storage discussions see also #2504. I didn't see a discussion about pruning cold storage, but this PR proved to be very useful for pruning my OVH Cloud Archive repositories.

I plan to open an issue to discuss everything needed to support cold storage, but so far I haven’t had enough time to prepare it properly…