How safe is deduplication/content-defined chunking?

As far as I understand, CDC may save a lot of space, but if one chunk gets corrupted, all backups of files that contain this chunk are corrupted, right?

So the worst case would be a file type with a “static” header of size >= chunk size.
The header part would be deduplicated with CDC, and if this single chunk gets corrupted, all files of this type in every backup are corrupted.

Or did I misunderstand something?

It is a property of deduplication that identical chunks are saved only once. And yes, if a chunk is corrupted, all files containing this chunk are also corrupted (they cannot be restored to their original content).

But this is not caused by content-defined chunking. The same would happen with other chunking algorithms, such as fixed-length chunks.
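
To make the point concrete, here is a minimal sketch of a deduplicating store. It is not restic's implementation: the 4-byte fixed-size chunks, the in-memory `store` dict, and the helper names `backup`, `restore`, and `can_restore` are all assumptions chosen for illustration. Two files share a header chunk, and corrupting that one stored chunk breaks both.

```python
import hashlib

CHUNK_SIZE = 4  # tiny fixed-size chunks, purely for illustration


def split(data: bytes) -> list[bytes]:
    """Cut data into fixed-size chunks (the last one may be shorter)."""
    return [data[i:i + CHUNK_SIZE] for i in range(0, len(data), CHUNK_SIZE)]


def backup(store: dict[str, bytes], data: bytes) -> list[str]:
    """Store each chunk under its SHA-256 ID and return the file's chunk list."""
    ids = []
    for chunk in split(data):
        chunk_id = hashlib.sha256(chunk).hexdigest()
        store.setdefault(chunk_id, chunk)  # identical chunks are saved only once
        ids.append(chunk_id)
    return ids


def restore(store: dict[str, bytes], ids: list[str]) -> bytes:
    """Reassemble a file, verifying every chunk against its ID."""
    parts = []
    for chunk_id in ids:
        chunk = store[chunk_id]
        if hashlib.sha256(chunk).hexdigest() != chunk_id:
            raise ValueError(f"chunk {chunk_id[:8]} is corrupted")
        parts.append(chunk)
    return b"".join(parts)


def can_restore(store: dict[str, bytes], ids: list[str]) -> bool:
    try:
        restore(store, ids)
        return True
    except ValueError:
        return False


store: dict[str, bytes] = {}
file_a = backup(store, b"HEADaaaa")
file_b = backup(store, b"HEADbbbb")

shared_id = file_a[0]        # the "HEAD" chunk, referenced by both files
store[shared_id] = b"XXXX"   # simulate corruption of that single chunk

print(can_restore(store, file_a), can_restore(store, file_b))
```

The store holds three chunks for two files, and neither file restores once the shared chunk is damaged. Nothing in the sketch depends on how the chunk boundaries were chosen, which is why the same risk applies to fixed-length and content-defined chunking alike.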

Is there anything we can do to avoid corruption (besides keeping multiple copies of the repository)?
I run check --read-data once a week, but if it shows errors it may already be too late.

Assuming I still have the original file, is there a way to repair corrupted chunks?

Well, data corruption is usually very rare. If you still have the original file, then the steps in restic issue #828, "Recover from broken pack file" (restic/restic on GitHub), will repair the repository. Other than that, it's usually a good idea to have two separate backups (the 3-2-1 rule for backups: 3 copies of the data, 2 different media/tools, 1 offsite). That way, if one is corrupted, the other one is still intact.

Hmm, that doesn't sound easy.

Are there plans for some repair commands?

E.g. search for corrupted chunks and the paths that reference them, check whether the file still exists, and repair the chunk if possible (I think this could be verified based on the filename and the chunk content).
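
The idea above could be sketched roughly as follows. This is a hypothetical illustration, not a restic command or API: the `store` dict, the fixed `chunk_size`, and the function names `find_corrupted` and `repair_from_original` are assumptions. It also assumes that re-chunking the original file reproduces the same chunk boundaries as the backup did, which requires an unchanged file and identical chunking parameters.

```python
import hashlib


def chunk_id(chunk: bytes) -> str:
    return hashlib.sha256(chunk).hexdigest()


def find_corrupted(store: dict[str, bytes]) -> set[str]:
    """Return the IDs whose stored bytes no longer hash to that ID."""
    return {cid for cid, data in store.items() if chunk_id(data) != cid}


def repair_from_original(store: dict[str, bytes], original: bytes,
                         chunk_size: int) -> set[str]:
    """Re-chunk the original data and overwrite corrupted chunks it can supply.

    A chunk is only replaced when the hash of the fresh chunk equals the
    corrupted chunk's ID, so a modified file cannot inject wrong data.
    Returns the IDs that were repaired.
    """
    corrupted = find_corrupted(store)
    repaired = set()
    for i in range(0, len(original), chunk_size):
        chunk = original[i:i + chunk_size]
        cid = chunk_id(chunk)
        if cid in corrupted:
            store[cid] = chunk
            repaired.add(cid)
    return repaired
```

Because the chunk ID is the hash of the content, the check is on content alone; the filename only helps to find a likely candidate file to re-chunk.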

See pull request #2876, "Add repair command" by aawsome, in restic/restic on GitHub.

Ideally you would add parity data, using something like PAR2. Some software has that built in.
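
As a rough illustration of how parity data allows recovery: PAR2 itself uses Reed-Solomon codes and can rebuild several damaged blocks, whereas the sketch below swaps in plain XOR parity, which can rebuild only one missing block whose position is known. The function names and the equal-length block layout are assumptions made for the example.

```python
from functools import reduce
from typing import Optional


def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equal-length blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_parity(blocks: list[bytes]) -> bytes:
    """Compute one parity block covering all data blocks."""
    return xor_blocks(blocks)


def recover(blocks: list[Optional[bytes]], parity: bytes) -> bytes:
    """Rebuild the single missing block (marked None) from survivors and parity."""
    survivors = [b for b in blocks if b is not None]
    if len(survivors) != len(blocks) - 1:
        raise ValueError("XOR parity can rebuild exactly one missing block")
    return xor_blocks(survivors + [parity])


blocks = [b"abcd", b"efgh", b"ijkl"]
parity = make_parity(blocks)
rebuilt = recover([b"abcd", None, b"ijkl"], parity)
```

Here one extra block of storage protects three data blocks against the loss of any one of them. That is much cheaper than a full second copy, but it does not replace a separate backup.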

Hi,

That would be great. For many years I've been using WinRAR, which can add recovery records.