As far as I understand, CDC may save a lot of space, but if one chunk gets corrupted, all backups of files that contain this chunk are corrupted, right?
So the worst case would be a file type with a “static” header of size >= chunk size:
the header part would be deduplicated by CDC, and if this single chunk gets corrupted, all files of this type in every backup are corrupted.
It is a property of deduplication that identical chunks are saved only once. And yes, if a chunk is corrupted, all files containing this chunk are also corrupted (they cannot be restored to their original content).
But this is not specific to content-defined chunking. The same would happen with other chunking algorithms, e.g. chunks of fixed length.
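To illustrate the point, here is a small Go sketch of deduplication by content hash. It is deliberately simplified and does not reflect restic's actual data structures: one stored copy of a shared “static” header chunk is referenced by two files, and corrupting that single copy makes both files unrestorable, while a file that doesn't share the chunk is unaffected.

```go
// Minimal sketch (not restic's actual internals) of deduplication by content
// hash: identical chunks are stored once and referenced by every file that
// contains them, so corrupting that single stored copy breaks all of those
// files, no matter how the chunk boundaries were chosen.
package main

import (
	"crypto/sha256"
	"fmt"
)

func main() {
	store := map[[32]byte][]byte{}   // content hash -> stored chunk
	files := map[string][][32]byte{} // file name -> list of chunk IDs

	add := func(name string, chunks ...[]byte) {
		for _, c := range chunks {
			id := sha256.Sum256(c)
			store[id] = append([]byte(nil), c...) // identical content is stored only once
			files[name] = append(files[name], id)
		}
	}

	// Two files sharing the same "static" header chunk, one file without it.
	header := []byte("STATIC-FILE-HEADER")
	add("a.bin", header, []byte("payload of a"))
	add("b.bin", header, []byte("payload of b"))
	add("c.bin", []byte("payload of c"))

	// Corrupt the single stored copy of the header chunk.
	id := sha256.Sum256(header)
	store[id][0] ^= 0xFF

	// Every file referencing that chunk now fails verification.
	for name, ids := range files {
		ok := true
		for _, cid := range ids {
			if sha256.Sum256(store[cid]) != cid {
				ok = false
			}
		}
		fmt.Printf("%s restorable: %v\n", name, ok)
	}
}
```

Running this prints `restorable: false` for a.bin and b.bin but `true` for c.bin, even though only one stored chunk was damaged.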
Is there anything we can do to avoid corruption (besides keeping multiple copies of the repository)?
I run check --read-data once a week, but by the time it shows errors it may already be too late.
Assuming I still have the original file, is there a way to repair corrupted chunks?
Well, data corruption is usually very rare. If you still have the original file, then the steps at Recover from broken pack file · Issue #828 · restic/restic · GitHub will repair the repository. Other than that, it’s usually a good idea to have two separate backups (the 3-2-1 rule for backups: 3 copies of the data, 2 different media/tools, 1 offsite). That way, if one copy is corrupted, the other one is still intact.
E.g. search for corrupted chunks and their corresponding paths, check whether the file still exists, and repair the chunk if possible (I think this could be verified based on the file name and the chunk content); something like the sketch below.
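Just to make the idea concrete, here is a rough Go sketch, not an existing restic feature. It assumes you already know the damaged blob ID reported by check --read-data, that you can supply the repository's chunker polynomial (restic picks one per repository, so chunk boundaries only line up if the same polynomial is used), and that blob IDs are the SHA-256 of the plaintext chunk; the file path, polynomial value, and blob ID below are placeholders.

```go
// Rough illustration of the idea, not an existing restic feature. It uses the
// github.com/restic/chunker package to re-chunk the original file and checks
// whether any chunk's plaintext hash matches the damaged blob ID.
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"io"
	"log"
	"os"

	"github.com/restic/chunker"
)

func main() {
	// Hypothetical inputs.
	path := "original.bin"               // the file you still have locally
	pol := chunker.Pol(0x3DA3358B4DC173) // placeholder, must match the repository's polynomial
	damagedID := "0123abcd..."           // placeholder blob ID (SHA-256, hex)

	f, err := os.Open(path)
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	// Re-chunk the original file and look for a chunk whose plaintext hash
	// equals the damaged blob ID.
	chk := chunker.New(f, pol)
	buf := make([]byte, 8*1024*1024) // restic's maximum chunk size is 8 MiB
	for {
		c, err := chk.Next(buf)
		if err == io.EOF {
			break
		}
		if err != nil {
			log.Fatal(err)
		}
		sum := sha256.Sum256(c.Data)
		if hex.EncodeToString(sum[:]) == damagedID {
			fmt.Printf("found matching chunk at offset %d, length %d\n", c.Start, c.Length)
			// c.Data is the missing plaintext; restic itself would still have
			// to re-add it, e.g. by backing up the file again after removing
			// the broken pack and rebuilding the index (see issue #828).
			return
		}
	}
	fmt.Println("no chunk in this file matches the damaged blob ID")
}
```

In practice the repair path from issue #828 already covers this: remove the broken pack, rebuild the index, and back up the original files again so the missing blobs get re-added.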