Does deduplication apply to multiple archives in repositories?

Hello! I’m wondering if deduplication applies to multiple archives in the same repository, or only in one single archive. Thanks!

Deduplication is applied to a repository.

Then I assume it wouldn’t be possible to take a full backup without deduplication to each archive in a repository, right?

You can’t turn off deduplication, no.
Unsure why you’d want to?

If you need to duplicate the content, you’d want a separate repository for each set of data (I assume this is what you mean by archive) you wish to back up.

I see, thank you. I also looked it up on the forums, and they say restic doesn’t have incremental backups. Aren’t incremental backups the same as deduplication?

If you do another backup, restic will only upload what’s changed. It’s not an “incremental” backup though: if you examine that snapshot, it will list every file. Every snapshot is a complete backup of that point in time. You can delete the first snapshot you took, and the second snapshot still has everything.

I would spend some time reading these forums and the comprehensive documentation; this has all been explained and written up many times before.

I am really confused. If it’s not incremental, how is it only uploading what’s changed? What happens when you delete the first snapshot? How does it know what has changed?

P.S.: I’ve also read the forums but couldn’t get any ideas.

As I already said:

Backup 100 files.
That’s snapshot A
Add 10 more files.
Take another backup. That’s snapshot B.
Delete Snapshot A.
Examine snapshot B and see you have 110 files.
Delete 50 files and take another backup: Snapshot C.
Delete Snapshot B.
Snapshot C contains only 60 files.

The repository still holds the data for everything in snapshots A, B and C. When you prune the repository, it keeps only the data needed to restore every file in every remaining snapshot, in this case the single snapshot C.

Does that explain it? Don’t think of a single “master” backup that every snapshot has to reference. That’s not how it works. A repository keeps every file it needs to satisfy restoration of every snapshot it contains. There is no master. Each snapshot has every file you backed up in it, “powered by” the repository that ensures it has that data. And it deduplicates, so the 100 files in snapshot A don’t have to be re-uploaded for snapshot B; the repository just knows to serve them up when you want the files from B. Deduplication at work.
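The A/B/C walkthrough above can be sketched as a toy model in Python. This is only an illustration under loud assumptions: ToyRepo, backup, forget and prune here are hypothetical names for this sketch, whole files are hashed instead of chunks, and nothing about it reflects restic’s real storage format. Note that 110 files minus 50 deleted leaves 60 files in snapshot C.

```python
import hashlib


class ToyRepo:
    """Toy content-addressed store (hypothetical; not restic's real format)."""

    def __init__(self):
        self.blobs = {}      # content hash -> file content
        self.snapshots = {}  # snapshot name -> {path: content hash}
        self.uploaded = 0    # how many blobs were actually stored

    def backup(self, name, files):
        listing = {}
        for path, content in files.items():
            digest = hashlib.sha256(content).hexdigest()
            if digest not in self.blobs:  # deduplication: store unseen content only
                self.blobs[digest] = content
                self.uploaded += 1
            listing[path] = digest
        self.snapshots[name] = listing    # a complete listing, not a diff

    def forget(self, name):
        del self.snapshots[name]          # removes metadata, leaves blobs alone

    def prune(self):
        needed = {d for listing in self.snapshots.values() for d in listing.values()}
        self.blobs = {d: c for d, c in self.blobs.items() if d in needed}


repo = ToyRepo()
files = {f"file{i}.txt": f"content {i}".encode() for i in range(100)}
repo.backup("A", files)                   # snapshot A: 100 files

for i in range(100, 110):
    files[f"file{i}.txt"] = f"content {i}".encode()
repo.backup("B", files)                   # snapshot B: only 10 new uploads
repo.forget("A")
files_in_b = len(repo.snapshots["B"])     # B still lists all 110 files

for i in range(50):
    del files[f"file{i}.txt"]
repo.backup("C", files)                   # snapshot C: nothing new to upload
repo.forget("B")
blobs_before_prune = len(repo.blobs)      # forgetting removed no data
repo.prune()

print(files_in_b, len(repo.snapshots["C"]), repo.uploaded,
      blobs_before_prune, len(repo.blobs))
# prints: 110 60 110 110 60
```

Only 110 blobs are ever uploaded across three backups, every snapshot is a full listing on its own, and data only disappears at the prune step.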

This is not a VMware snapshot.

Let me see if I got it right: I take snapshot A, then I take snapshot B, and since there is deduplication, snapshot B should only contain changes. Then after you delete snapshot A, its data plus the changes get merged into snapshot B. Does it work like that? Or am I still missing something?

You can think of it like this, but nothing is “merged” in reality. The problem, I think, is the terminology, which easily confuses things :)

Restic and similar software work by decoupling the real data (files, or rather their chunks) from the snapshot metadata, which describes which data is needed for a given snapshot.

So one file (or part of it) can be included in many snapshots. Deleting a snapshot does not delete any data. That only happens when you run restic prune and a given piece of data is no longer needed by any remaining snapshot.
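A minimal sketch of that decoupling, assuming tiny fixed-size chunks to keep it readable (restic uses content-defined chunking, and chunk_ids, store and snapshots are hypothetical names for this illustration only). Two versions of the same file share their unchanged chunks, forgetting a snapshot leaves the chunk store untouched, and only the prune step drops the chunk nothing references any more.

```python
import hashlib

CHUNK_SIZE = 4  # assumption for illustration; real chunks are far larger


def chunk_ids(data, store):
    """Split data into chunks, store unseen ones, return the list of chunk ids."""
    ids = []
    for start in range(0, len(data), CHUNK_SIZE):
        chunk = data[start:start + CHUNK_SIZE]
        cid = hashlib.sha256(chunk).hexdigest()
        store.setdefault(cid, chunk)  # already-known chunks are not stored again
        ids.append(cid)
    return ids


store = {}      # the data: chunk id -> chunk bytes
snapshots = {}  # the metadata: snapshot name -> {path: [chunk ids]}

snapshots["first"] = {"notes.txt": chunk_ids(b"aaaabbbbcccc", store)}
snapshots["second"] = {"notes.txt": chunk_ids(b"aaaabbbbdddd", store)}
stored_after_backups = len(store)  # 4 distinct chunks, not 6

del snapshots["first"]             # deleting a snapshot removes metadata only
stored_after_forget = len(store)   # still 4

# Prune: keep only chunks referenced by a remaining snapshot.
referenced = {cid for snap in snapshots.values() for ids in snap.values() for cid in ids}
store = {cid: chunk for cid, chunk in store.items() if cid in referenced}
stored_after_prune = len(store)    # the b"cccc" chunk is gone, 3 remain

# The remaining snapshot is still fully restorable from the store.
restored = b"".join(store[cid] for cid in snapshots["second"]["notes.txt"])
```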

So a restic repository holds all the data, old or new, and snapshots are basically just metadata telling the repository which data it should retrieve. Is this right?

You pretty much nailed it :)

Restic will also never permanently delete data from a repository unless you explicitly prune the repo. If you never prune, the repo will contain every version of a file no matter what.

If you’ve deleted the snapshot containing the file though, there’s no way to get it back, is there? The repo still contains the data if you haven’t pruned it, but you can’t restore it/get at it? It’s dead wood, “prune” just removes dead wood.
Right?

Edit: I’m wrong, see this post.

Very valid point.

It is a theoretically valid statement, but it has zero practical implications. Yes, the data is still there, but nobody knows any more what is what or where exactly :)

Well, the recover command can reconstruct a snapshot that contains all otherwise unreferenced tree blobs. If prune was never run, that gives access to the contents of all deleted snapshots, although the snapshot metadata is lost.
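The idea behind that can be sketched with a toy model. This is not restic’s implementation: the trees and snapshots dictionaries, the reachable and recover functions, and the "recovered" names are all hypothetical, chosen only to show how trees that no snapshot references can be collected under one new snapshot.

```python
# Toy tree store: tree id -> {entry name: ("tree", tree id) or ("file", content)}
trees = {
    "root_a": {"docs": ("tree", "docs_a")},
    "docs_a": {"old.txt": ("file", b"old version")},
    "root_b": {"docs": ("tree", "docs_b")},
    "docs_b": {"new.txt": ("file", b"new version")},
}
# Snapshot A (root_a) was deleted, but prune never ran, so its trees remain.
snapshots = {"B": "root_b"}


def reachable(tree_id, seen):
    """Collect tree_id and every subtree below it into seen."""
    if tree_id in seen:
        return
    seen.add(tree_id)
    for kind, value in trees[tree_id].values():
        if kind == "tree":
            reachable(value, seen)


def recover():
    """Gather unreferenced top-level trees under one new snapshot."""
    referenced = set()
    for root in snapshots.values():
        reachable(root, referenced)
    orphans = set(trees) - referenced
    # An orphan that is a subtree of another orphan is not a top-level tree.
    children = {value for tid in orphans
                for kind, value in trees[tid].values() if kind == "tree"}
    roots = sorted(orphans - children)
    trees["recovered_root"] = {root: ("tree", root) for root in roots}
    snapshots["recovered"] = "recovered_root"
    return roots


recovered_roots = recover()  # ["root_a"]

# The old file is reachable again through the new snapshot.
top = trees[snapshots["recovered"]]
docs = trees[trees[top["root_a"][1]]["docs"][1]]
old_file = docs["old.txt"][1]  # b"old version"
```

The original snapshot metadata (name, time, host and so on) is not part of this model at all, which matches the point above: the contents come back, the metadata does not.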

Thank you for pointing it out. Something I was not aware of before.

I should have mentioned the recover command. That’s really what I meant. 😅
