Forget and prune on two synced repositories

theBoatman · May 12, 2020, 2:57pm

Hello!

I am currently evaluating whether restic is usable for my personal backups. @cdhowie was already kind enough to explain some basics to me in "Initial B2 backup gone horribly wrong". One question is still open for me:

If I have two repositories (or to say it better: one repository synced with unison between two locations), can I call forget and prune on both instances resulting in exactly the same changes on both sides?

I would assume this should be OK as long as I make sure that

both instances are perfectly in sync before I start.
I execute forget and prune in the same order on both instances.
no other operations are made on the repositories during the operations.

After that I would expect a subsequent sync to report that both repositories were changed in the same way. Is that right? The goal is to keep the traffic between the two instances as small as possible.

Thanks in advance!

dionorgua · May 12, 2020, 3:13pm

No. In general you should not assume that running the same command on two repositories that are identical at the file level will produce the same result.

Restic encrypts every file stored in the repo, and a random IV (initialization vector) is used for each encryption, so two attempts to encrypt the same source blob produce different encrypted data.
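
To illustrate the point about the random IV, here is a minimal Python sketch. It is not restic's real cipher: the hash-based keystream, the 16-byte nonce and the function names are stand-ins assumed purely for illustration, using only the standard library. It only shows the general effect that the same key and the same plaintext give different ciphertext when each encryption draws a fresh random value.

```python
import hashlib
import os


def keystream(key, nonce, length):
    """Toy keystream: SHA-256 over key + nonce + counter (illustration only)."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def encrypt(key, plaintext):
    """Draw a fresh random nonce, XOR the plaintext with the keystream,
    and store the nonce in front of the ciphertext."""
    nonce = os.urandom(16)
    stream = keystream(key, nonce, len(plaintext))
    return nonce + bytes(p ^ s for p, s in zip(plaintext, stream))


def decrypt(key, ciphertext):
    """Split off the stored nonce and undo the XOR."""
    nonce, body = ciphertext[:16], ciphertext[16:]
    stream = keystream(key, nonce, len(body))
    return bytes(c ^ s for c, s in zip(body, stream))


key = os.urandom(32)
blob = b"the same source blob"

first = encrypt(key, blob)
second = encrypt(key, blob)

# Same key, same plaintext, but the bytes on disk differ between the two runs.
print("ciphertexts differ:", first != second)
# Both still decrypt to the identical blob.
print("both decrypt correctly:", decrypt(key, first) == decrypt(key, second) == blob)
```

This is why a file-level sync tool sees two repacked repositories as different even when they hold the same logical data.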

theBoatman · May 13, 2020, 9:16am

Ah, of course - the encryption. I forgot about that aspect. It makes total sense that the result is NOT the same.

So my procedure will probably be:

Sync the repositories.
Run forget/prune on one repository.
Sync again and hope that the changes are not too much for my connection.
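
The three steps above could be scripted roughly as follows. This is only a sketch under assumptions: the repository paths, the `--keep-daily 7` retention policy and the helper names are hypothetical, and it assumes restic finds its password through the environment (for example `RESTIC_PASSWORD_FILE`). By default it only prints the commands instead of running them.

```python
import subprocess

# Hypothetical locations and retention policy; adjust to your own setup.
LOCAL_REPO = "/srv/restic-repo"
REMOTE_REPO = "ssh://backup-host//srv/restic-repo"
KEEP_DAILY = 7


def sync_cmd(root_a, root_b):
    """unison between the two roots, in non-interactive batch mode."""
    return ["unison", root_a, root_b, "-batch"]


def forget_prune_cmd(repo, keep_daily):
    """forget with a retention policy, pruning in the same run."""
    return ["restic", "-r", repo, "forget",
            "--keep-daily", str(keep_daily), "--prune"]


def plan():
    """Sync, forget/prune on ONE side only, then sync again."""
    return [
        sync_cmd(LOCAL_REPO, REMOTE_REPO),
        forget_prune_cmd(LOCAL_REPO, KEEP_DAILY),
        sync_cmd(LOCAL_REPO, REMOTE_REPO),
    ]


def run(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the sequence as soon as one step fails.
            subprocess.run(cmd, check=True)


run(plan())  # dry run: prints the three commands without executing them
```

Pruning on one side only means the second sync has to transfer every repacked pack file, which is where the traffic comes from.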

The other option would be to just never prune. Would that have side effects besides unused data lying around?

dionorgua · May 13, 2020, 11:14am

Why are you talking about two repositories? Is one just a ‘mirror’ of the other, or are there clients that back up to both of them?

I don’t like the idea of two-way ‘merging’ of repositories (with both modified) using tools like unison, rsync or others. It’s very fragile and error-prone. It will probably work right now (provided one was cloned from the other initially, rather than being two different repositories with the same password), simply because restic never modifies existing files in a repository. You’ll probably end up with some, or a lot of, duplicate blobs, which can be removed using prune.

Currently prune repacks every pack file that contains at least one unneeded blob. So expect a whole pack of 3-4 MB to be repacked because of a single unneeded blob. There are pull requests on GitHub that improve this situation by providing a way to specify a threshold value for repacking.
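
The difference between the current rule and a threshold-based one can be sketched with a simplified model. Everything here is assumed for illustration (the data layout, the function name and the `max_unused_fraction` parameter); it is not restic's actual code or the interface of any pull request.

```python
def packs_to_repack(packs, used, max_unused_fraction=0.0):
    """Return the names of packs that would be repacked.

    packs: dict mapping pack name -> dict of blob id -> size in bytes
    used: set of blob ids still referenced by some snapshot
    max_unused_fraction: hypothetical threshold; 0.0 models the rule
        "repack as soon as anything in the pack is unneeded".
    """
    selected = []
    for name, blobs in packs.items():
        total = sum(blobs.values())
        unused = sum(size for blob_id, size in blobs.items()
                     if blob_id not in used)
        # Packs with no unneeded data are always left alone.
        if unused > 0 and unused / total > max_unused_fraction:
            selected.append(name)
    return selected


packs = {
    "pack-a": {"b1": 1_000_000, "b2": 2_000_000, "b3": 1_000},  # b3 unneeded (tiny)
    "pack-b": {"b4": 2_000_000, "b5": 2_000_000},               # b5 unneeded (half)
    "pack-c": {"b6": 3_000_000},                                # fully needed
}
used = {"b1", "b2", "b4", "b6"}

print(packs_to_repack(packs, used))       # ['pack-a', 'pack-b']
print(packs_to_repack(packs, used, 0.1))  # ['pack-b']
```

In this model, a 1 KB unneeded blob forces about 3 MB of `pack-a` to be rewritten under the current rule, while a 10% threshold would leave that pack untouched.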

theBoatman · May 13, 2020, 2:32pm

The latter is the case. (Or at least that was my plan.)

That was the goal: reducing the redundancy between two backup targets which have some amount of data in common. It seemed totally possible in the beginning, but as you can read, I missed the fact that the encryption results in different chunks, and I was not aware that the chunks are combined into different packs. Both of these make prune necessary and make the process much less efficient than I thought.

Even if it is fragile, I would think that as long as I run restic check with --read-data at the end, I can be sure that everything worked. Is that right?
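
A small wrapper for that final verification might look like the sketch below. The function names and the injectable `runner` parameter are assumptions made for illustration; the only restic behaviour relied on is that `check --read-data` exits with a non-zero status when it finds problems. Note that it verifies one copy at a time, so each synced location would need its own run.

```python
import subprocess


def check_cmd(repo):
    """Full verification: also reads and verifies every pack file."""
    return ["restic", "-r", repo, "check", "--read-data"]


def repo_is_healthy(repo, runner=subprocess.run):
    """Return True if the check command exits with status 0.

    runner is injectable so the logic can be exercised without restic
    installed; by default the real command is executed.
    """
    result = runner(check_cmd(repo))
    return result.returncode == 0
```

A typical use would be calling `repo_is_healthy(path)` for each location after the last sync and refusing to delete anything else until both return True.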

The background is that I have used unison for the past 10 years. At peak I had up to 5 machines syncing to the same repository and enjoyed being able to work with the data at any location. Now I have to replace my NAS, I only have two locations left, and I miss deduplication and encryption, so I am currently rethinking the process.

dionorgua · May 13, 2020, 2:53pm

Feel free to try. It should work right now (but without any warranty, and it may break at any time). I strongly suggest that you at least run this in parallel with something else for a while.

Before switching to restic I used two backup solutions simultaneously for 3-4 months…