As far as I understand, CDC may save a lot of space, but if one chunk gets corrupted, all backups of files that contain this chunk are corrupted, right?
So the worst case would be a file type with a “static” header of size >= chunk size:
the header part would be deduplicated by CDC, and if this single chunk gets corrupted, all files of this type in every backup are corrupted.
It is a property of deduplication that identical chunks are saved only once. And yes, if a chunk is corrupted, all files containing this chunk are also corrupted (they cannot be restored to their original content).
But this is not specific to content-defined chunking. The same would happen with other chunking algorithms, e.g. chunks of fixed length.
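To illustrate the point, here is a small Go sketch of deduplication by content hash. It is deliberately simplified and does not reflect restic's actual data structures: one stored copy of a shared “static” header chunk is referenced by two files, and corrupting that single copy makes both files unrestorable, while a file that doesn't share the chunk is unaffected.

```go
// Minimal sketch (not restic's actual internals) of deduplication by content
// hash: identical chunks are stored once and referenced by every file that
// contains them, so corrupting that single stored copy breaks all of those
// files, no matter how the chunk boundaries were chosen.
package main

import (
	"crypto/sha256"
	"fmt"
)

func main() {
	store := map[[32]byte][]byte{}   // content hash -> stored chunk
	files := map[string][][32]byte{} // file name -> list of chunk IDs

	add := func(name string, chunks ...[]byte) {
		for _, c := range chunks {
			id := sha256.Sum256(c)
			store[id] = append([]byte(nil), c...) // identical content is stored only once
			files[name] = append(files[name], id)
		}
	}

	// Two files sharing the same "static" header chunk, one file without it.
	header := []byte("STATIC-FILE-HEADER")
	add("a.bin", header, []byte("payload of a"))
	add("b.bin", header, []byte("payload of b"))
	add("c.bin", []byte("payload of c"))

	// Corrupt the single stored copy of the header chunk.
	id := sha256.Sum256(header)
	store[id][0] ^= 0xFF

	// Every file referencing that chunk now fails verification.
	for name, ids := range files {
		ok := true
		for _, cid := range ids {
			if sha256.Sum256(store[cid]) != cid {
				ok = false
			}
		}
		fmt.Printf("%s restorable: %v\n", name, ok)
	}
}
```

Running this prints `restorable: false` for a.bin and b.bin but `true` for c.bin, even though only one stored chunk was damaged.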
Is there anything we can do to avoid corruption (besides keeping multiple copies of the repository)?
I run check --read-data once a week, but by the time it shows errors it may already be too late.
Assuming I still have the original file, is there a way to repair corrupted chunks?
Well, data corruption is usually very rare. If you still have the original file, then the steps at Recover from broken pack file · Issue #828 · restic/restic · GitHub will repair the repository. Other than that, it’s usually a good idea to have two separate backups (the 3-2-1 rule for backups: 3 copies of the data, 2 different media/tools, 1 offsite). That way, if one copy is corrupted, the other one is still intact.
E.g. search for corrupted chunks and their corresponding paths, check whether the file still exists, and repair the chunk if possible (I think this could be verified based on the file name and the chunk content); something like the sketch below.
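Just to make the idea concrete, here is a rough Go sketch, not an existing restic feature. It assumes you already know the damaged blob ID reported by check --read-data, that you can supply the repository's chunker polynomial (restic picks one per repository, so chunk boundaries only line up if the same polynomial is used), and that blob IDs are the SHA-256 of the plaintext chunk; the file path, polynomial value, and blob ID below are placeholders.

```go
// Rough illustration of the idea, not an existing restic feature. It uses the
// github.com/restic/chunker package to re-chunk the original file and checks
// whether any chunk's plaintext hash matches the damaged blob ID.
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"io"
	"log"
	"os"

	"github.com/restic/chunker"
)

func main() {
	// Hypothetical inputs.
	path := "original.bin"               // the file you still have locally
	pol := chunker.Pol(0x3DA3358B4DC173) // placeholder, must match the repository's polynomial
	damagedID := "0123abcd..."           // placeholder blob ID (SHA-256, hex)

	f, err := os.Open(path)
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	// Re-chunk the original file and look for a chunk whose plaintext hash
	// equals the damaged blob ID.
	chk := chunker.New(f, pol)
	buf := make([]byte, 8*1024*1024) // restic's maximum chunk size is 8 MiB
	for {
		c, err := chk.Next(buf)
		if err == io.EOF {
			break
		}
		if err != nil {
			log.Fatal(err)
		}
		sum := sha256.Sum256(c.Data)
		if hex.EncodeToString(sum[:]) == damagedID {
			fmt.Printf("found matching chunk at offset %d, length %d\n", c.Start, c.Length)
			// c.Data is the missing plaintext; restic itself would still have
			// to re-add it, e.g. by backing up the file again after removing
			// the broken pack and rebuilding the index (see issue #828).
			return
		}
	}
	fmt.Println("no chunk in this file matches the damaged blob ID")
}
```

In practice the repair path from issue #828 already covers this: remove the broken pack, rebuild the index, and back up the original files again so the missing blobs get re-added.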