Copy - 2 Different Source Repos (Identical Data) and 1 Destination

mike9191783 · July 16, 2022, 10:37am

Hey friends,

I’m happy to report successfully sending a copy of my local restic repo into the cloud (S3 bucket) using the restic copy command. But now I’m worrying about the potential for a special case of de-duplication failure.

Here’s the thing: I have two local backup drives here that I regularly rotate. Both drives do the same job: back up all my machines. So technically they contain the same content (albeit with one a little behind the other, depending on which one is in rotation at the moment), but they are two different repos; each drive has its own restic repo with its own password.

Both drives and the cloud destination (for restic copy) were initialized following the --copy-chunker-params suggestion, so I’m not worried about de-duplication failing between any single drive and the cloud.

What I’m wondering is what happens when I rotate local backup drives. Since the next drive technically still has all the same data, I’d hope data wouldn’t be duplicated up in the cloud. But would it? Will the cloud repo end up with two full copies of my data (one for each drive), or will restic’s copy operation be able to see existing files from the other drive and simply agree they don’t need to be uploaded again?

Obviously, one simple solution is to pick one drive and only use restic copy with that drive. But I feel this would kinda defeat the purpose of having two local drives to rotate. I’d have to rotate a drive, finish my backup, then immediately run back out and grab the first drive again, do a second backup, then send to the cloud. I’m sure within weeks the situation would degenerate and I’d only ever use the one drive.

Any ideas, suggestions, tests you’ve all done?

MichaelEischer · July 16, 2022, 5:32pm

If all repositories are created using the same chunker parameters, then copy will be able to deduplicate any data shared between the repositories. Generally speaking, restic internally cuts files into smaller parts and then checks whether these parts already exist in a repository. The file paths and source repositories are not relevant for that process.

If the two local disks contain the same snapshots, then in the worst case you end up with duplicate snapshots in the cloud repository (but no duplicate data), although copy should be able to detect most duplicates. If both disks contain roughly the same data but different snapshots, then the cloud should end up with both sets of snapshots with properly deduplicated data.
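
To make the behaviour described above concrete, here is a toy Python model of it. This is only a sketch, not restic's code: `ToyRepo`, `chunk`, `copy_to`, and the snapshot names are hypothetical, and the fixed-size splitting is a stand-in for restic's content-defined chunker (whose cut points depend on the chunker parameters, which is why all three repos need to share them). The point it illustrates is that parts are looked up by a hash of their content, so it does not matter which drive they came from.

```python
import hashlib

CHUNK_SIZE = 4  # hypothetical fixed size; restic uses content-defined chunking


def chunk(data: bytes, size: int = CHUNK_SIZE) -> list[bytes]:
    """Split data into fixed-size pieces (a stand-in for restic's chunker)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


class ToyRepo:
    """Hypothetical in-memory repo: blobs keyed by content hash, plus snapshots."""

    def __init__(self) -> None:
        self.blobs: dict[str, bytes] = {}
        self.snapshots: dict[str, list[str]] = {}

    def backup(self, name: str, data: bytes) -> None:
        """Store each piece once and record the snapshot as a list of blob IDs."""
        ids = []
        for piece in chunk(data):
            blob_id = hashlib.sha256(piece).hexdigest()
            self.blobs.setdefault(blob_id, piece)
            ids.append(blob_id)
        self.snapshots[name] = ids

    def copy_to(self, dest: "ToyRepo") -> int:
        """Copy all snapshots to dest; return how many blobs had to be uploaded."""
        uploaded = 0
        for name, ids in self.snapshots.items():
            for blob_id in ids:
                if blob_id not in dest.blobs:  # already there? skip the upload
                    dest.blobs[blob_id] = self.blobs[blob_id]
                    uploaded += 1
            dest.snapshots[name] = ids
        return uploaded


data = b"the same files from all my machines"
drive_a, drive_b, cloud = ToyRepo(), ToyRepo(), ToyRepo()
drive_a.backup("drive-a-monday", data)
# The second drive is slightly ahead: same data plus a little more.
drive_b.backup("drive-b-tuesday", data + b" plus one new file")

first = drive_a.copy_to(cloud)   # cloud is empty, so everything is uploaded
second = drive_b.copy_to(cloud)  # only pieces the cloud does not yet have
print("uploaded from drive A:", first)
print("uploaded from drive B:", second)
print("snapshots in cloud:", sorted(cloud.snapshots))
```

In this model the second copy uploads only the pieces the first one did not already provide, while the cloud repo still ends up holding both snapshots, which matches the "both sets of snapshots with properly deduplicated data" outcome described above.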

mike9191783 · July 18, 2022, 7:16am

That’s awesome. Thank you.