Can someone explain to me what API requests happen when copying snapshots from repo A to repo B? Specifically, I’m wondering if any delete/read operations are needed on repo B. Of course, we need to read from repo A in order to copy, but what about repo B?
I ask this because I’m thinking of setting up a system where I use restic copy to copy snapshots to a remote repo stored on Glacier or Coldline (via the rclone backend). Both of these backends have early deletion fees, and I think Glacier has slow/expensive retrieval, so I need to know whether any downloads or deletes occur on repo B.
Thanks! Also I would like to say thanks to all the restic devs. I’ve used restic for some time now and it has worked perfectly!!! Just looking for new ways to improve my backup system.
copy will only add data to a repository but won’t delete anything. It basically works as follows: restic loads the repository index for both the source and target repository. The index is usually stored in the local cache, so as long as the index for the target repository is already cached there’s no need to download it from that repository. Afterwards restic loads all snapshots from both repositories and determines which snapshots are missing in the target repository (snapshots are also cached locally). For these missing snapshots it will read the directory metadata and copy that along with missing file chunks to the target repository.
Before it can access the snapshots and indexes, restic also has to request a listing of the snapshot and index files in each repository.
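In CLI terms, a copy run looks roughly like the sketch below. The repo paths and password files are just placeholders; recent restic versions use `--from-repo` for the source repository, while older releases used `--repo2` instead:

```shell
# Copy all snapshots missing in repo B (target) from repo A (source).
# Only writes and listings/reads of metadata happen on repo B; nothing is deleted.
restic -r rclone:coldline:repo-b copy \
    --from-repo /srv/repo-a \
    --from-password-file /etc/restic/repo-a.pass \
    --password-file /etc/restic/repo-b.pass
```

A specific snapshot ID can be appended to copy only that snapshot instead of everything.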
How do you intend to restore the repository stored on Glacier? There’s currently no support to easily determine which files would be necessary to restore a certain snapshot. Although there is some experimental PR to help with cold-storage usage.
My original intention was to use Glacier as a final disaster scenario. If all my other backups fail, I will pay the price to download all my data to S3 and restore from there. However, it seems my solution does not work well, because running restic prune and restic check will be difficult. Perhaps it’s fine for me to never prune or check, as this is just a final disaster scenario. Or, as an alternative, I can run check on the local repo A before running copy to repo B, but I don’t think this is the same thing, as there can be corruption in repo B that is not caught.
Can you share the links to the PRs that will help with cold-storage? Also, if you have any other suggestions for another backup location, I would love to hear it. I currently use B2 but looking for another cheap vendor to have another backup. Thanks.
Consider using Backblaze B2 instead. Compared to the S3 Glacier tier, B2 only costs $0.001 more per GB-month, but costs only $0.01 per GB egress (compared to AWS standard $0.09 per GB egress, plus Glacier restore fees).
S3 Glacier Deep Archive is $0.00099 per GB-month, which is about 1/5 the cost of B2 for storage, but egress fees are still high.
I put together an analysis of the cost here, and which one is cheaper depends on how often you need to restore. But the huge caveat to the S3 Glacier/GDA tiers (as you point out) is that you can’t do regular repository maintenance operations.
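To make the trade-off concrete, here is a small back-of-the-envelope calculation using the per-GB prices mentioned above (a 1 TB repo is just an example size, and this ignores Glacier restore-request fees and minimum storage durations):

```python
def yearly_cost(gb, storage_per_gb_month, egress_per_gb, restores_per_year):
    """Rough yearly cost: 12 months of storage plus a full-repo download per restore."""
    return gb * storage_per_gb_month * 12 + gb * egress_per_gb * restores_per_year

SIZE_GB = 1000  # example: 1 TB repository

# B2: $0.005/GB-month storage, $0.01/GB egress
b2_one_restore = yearly_cost(SIZE_GB, 0.005, 0.01, 1)        # ≈ $70/year
# Deep Archive: $0.00099/GB-month storage, $0.09/GB standard AWS egress
deep_one_restore = yearly_cost(SIZE_GB, 0.00099, 0.09, 1)    # ≈ $102/year
deep_no_restore = yearly_cost(SIZE_GB, 0.00099, 0.09, 0)     # ≈ $12/year
```

So Deep Archive wins clearly if you essentially never restore, while B2 pulls ahead as soon as you expect to download the repo even once a year.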
If you have the needed repo files in the local cache, then restic check --with-cache should fully work. However, it only lists the files and does not read any data from the backend, so you must trust that the data stored in your backend equals the cached data. Also, check --read-data* does not work.
For pruning, you have a similar situation. If you have the needed repo files in the local cache, you can use prune from the latest beta and try the options --max-unused unlimited or --repack-cacheable-only. Both will perform a prune without the need to read any data from the backend.
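Put together, the cache-only maintenance described above looks something like this (the repo URL is a placeholder):

```shell
# Structural check using only the regular local cache;
# no pack data is read back from the cold backend.
restic -r rclone:coldline:repo-b check --with-cache

# Prune without repacking: unused data is tolerated rather than
# downloaded and rewritten, so no pack files are read from the backend.
restic -r rclone:coldline:repo-b prune --max-unused unlimited
```

Note that both commands still assume the cached index and metadata are intact; they cannot detect bit rot in the backend itself.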
Note that besides the cached files, the config file, key files, and lock files are also accessed, and since these are not cached, they are read directly from the backend. If you cannot keep them out of your cold storage (as you can with AWS S3 by restricting the Glacier storage class to the /data dir), you need a way to cache those files as well. I’m using
but this is honestly a dirty hack and there should be a native restic solution for that.
So as things stand, I would recommend just using rclone to sync your complete repo somewhere for the disaster case. And in a disaster, sync it back to somewhere restic can work with.
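That rclone-based approach could look like the following sketch (remote and path names are placeholders):

```shell
# Mirror the whole restic repo to cold storage as plain files;
# restic never talks to the cold backend directly.
rclone sync /srv/repo-a coldline:repo-mirror

# Disaster case: pull the mirror back to storage restic can use normally.
rclone sync coldline:repo-mirror /restore/repo-a
restic -r /restore/repo-a snapshots
```

The advantage is that all restic operations (check, prune, restore) run against the local or warm copy, and the cold backend only ever sees uploads until disaster strikes.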
I am using OVH Cloud Archive. GCP Coldline also looks promising, but I don’t have any experience with it.
If you want to prune and repack packs in the backend (and thus need to “warm up” specific files under /data/), you can have a look at
I’m about to open an issue to discuss all the points needed to support cold storage, but so far I haven’t had enough time to prepare it properly…