Same files, different filenames

I want to back up an InfluxDB database using its backup tool. It produces a bunch of .gz files, each covering a period of time in the time-series database. This means that every time I run the backup tool, I get many files whose content is identical to files from previous runs, but whose filenames are based on the date the backup was run.
If I run a monthly backup by simply copying to an external drive, about 90% of the files would have the same content as files already on the drive (just with different names), all consuming storage.

I figured restic might be able to help here. If I clear my local backup folder before running the influx backup again, and then run restic backup, will restic's deduplication avoid copying the new files across again (even though they have different names)?

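Yes, restic deduplicates based on content, not filenames. One thing that can happen when files/folders change is that restic fails to find a parent snapshot, but as far as I know that would at worst cause the files to be re-scanned; data that is already in the repository still won't be uploaded again.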
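To illustrate why the filename doesn't matter, here is a minimal Python sketch of content-addressed storage, the idea behind this kind of deduplication. It is a simplification: restic splits files into variable-size, content-defined chunks and stores each chunk under its SHA-256 hash, whereas this sketch treats each whole file as one blob. `ContentStore` and `add_file` are hypothetical names for illustration, not restic's API.

```python
import hashlib


class ContentStore:
    """Hypothetical toy store: data is keyed by the SHA-256 of its content.

    Simplification: restic hashes content-defined chunks; here each whole
    file is treated as a single blob.
    """

    def __init__(self) -> None:
        self.blobs: dict[str, bytes] = {}  # content hash -> stored data
        self.index: dict[str, str] = {}    # filename -> content hash

    def add_file(self, name: str, data: bytes) -> bool:
        """Record a file; return True only if new data actually had to be stored."""
        digest = hashlib.sha256(data).hexdigest()
        self.index[name] = digest          # the name is just a pointer to the blob
        if digest in self.blobs:
            return False                   # same content already stored, nothing new
        self.blobs[digest] = data
        return True
```

Adding the same bytes under two different names stores them once and only adds a second index entry; the name never enters the hash.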


I don’t know if it’ll affect your use case, but there’s another gotcha to watch out for when hoping to deduplicate data between gzip files: gzip uses an adaptive compression algorithm, so even a very small change in the input data can lead to very large changes in the compressed file. It might be worth running a diff between a couple of the output files that you believe should be the same, just to verify that they really are identical (and therefore suitable for deduplication by restic).
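A plain diff of two .gz files can report differences even when the data inside is identical, because the gzip header stores a modification time (bytes 4 to 7, per RFC 1952) and optionally the original filename. This Python sketch, using only the standard library, compares two files both as compressed bytes and as decompressed data, and reports where the compressed bytes first diverge. `compare_gz` and `GzComparison` are hypothetical names.

```python
import gzip
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass
class GzComparison:
    compressed_equal: bool            # are the .gz files byte-for-byte identical?
    decompressed_equal: bool          # is the data inside them identical?
    first_diff_offset: Optional[int]  # first differing byte of the .gz files, or None


def compare_gz_bytes(raw_a: bytes, raw_b: bytes) -> GzComparison:
    """Compare two gzip files' raw bytes and their decompressed contents."""
    first_diff = next((i for i, (x, y) in enumerate(zip(raw_a, raw_b)) if x != y), None)
    if first_diff is None and len(raw_a) != len(raw_b):
        first_diff = min(len(raw_a), len(raw_b))  # one file is a prefix of the other
    return GzComparison(
        compressed_equal=raw_a == raw_b,
        decompressed_equal=gzip.decompress(raw_a) == gzip.decompress(raw_b),
        first_diff_offset=first_diff,
    )


def compare_gz(path_a: Path, path_b: Path) -> GzComparison:
    # Reads both files fully into memory, which is fine for a one-off check.
    return compare_gz_bytes(path_a.read_bytes(), path_b.read_bytes())
```

If `decompressed_equal` is False, the data itself changed. If it is True but `compressed_equal` is False, an offset of 4 means the difference starts in the header timestamp, while a later offset means the compressed stream itself differs.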

Gzip has an --rsyncable flag that alters the behaviour of the compression algorithm to make its output more suitable for deduplication (or for fast rsyncing, hence the flag name). With this option, small changes in the input data should only lead to localized changes in the resulting .gz file, so the rest can still be deduplicated.
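The principle behind --rsyncable can be illustrated in a few lines of Python. This is a simplified sketch, not GNU gzip's implementation: it picks cut points from a rolling sum over the last few input bytes, so each cut depends only on local content, and compresses each segment as an independent gzip member (a concatenation of gzip members is still a valid .gz file). GNU gzip works within a single compressed stream, but also chooses its reset points from a rolling checksum of the input. `WINDOW`, `MODULUS` and the function names are arbitrary illustrative choices.

```python
from __future__ import annotations

import gzip

WINDOW = 64     # bytes in the rolling window (illustrative value)
MODULUS = 1024  # cut when the rolling sum is a multiple of this (illustrative value)


def cut_points(data: bytes) -> list[int]:
    """Offsets where a segment ends, chosen from the preceding WINDOW bytes only."""
    cuts = []
    rolling = 0
    for i, byte in enumerate(data):
        rolling += byte
        if i >= WINDOW:
            rolling -= data[i - WINDOW]  # keep the sum over the last WINDOW bytes
        if i + 1 >= WINDOW and rolling % MODULUS == 0:
            cuts.append(i + 1)           # segment ends after byte i
    return cuts


def split(data: bytes) -> list[bytes]:
    """Split data at the content-defined cut points, dropping empty segments."""
    bounds = [0, *cut_points(data), len(data)]
    return [data[a:b] for a, b in zip(bounds, bounds[1:]) if b > a]


def rsyncable_like_compress(data: bytes) -> bytes:
    """Compress each segment as its own gzip member; mtime=0 keeps output deterministic."""
    return b"".join(gzip.compress(segment, mtime=0) for segment in split(data))
```

Because a changed byte only affects the rolling sum for the next `WINDOW` positions, cut points further on are unchanged, so every later segment (and its compressed member) is byte-identical and can be deduplicated. The cost is some compression ratio, since each segment starts with no history; a real implementation would also bound segment sizes.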

I see a question here about whether --rsyncable could be implemented for influxdb backups, but it has no responses. If you’re affected by the issue above, you can always decompress the .gz files and recompress them with the --rsyncable flag before each restic backup.
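The decompress-and-recompress step could be scripted roughly as follows. This is a sketch under assumptions: it expects a GNU gzip on the PATH that supports --rsyncable (older or non-GNU builds may not), decompresses with Python's gzip module, and pipes the data through `gzip --rsyncable -n -c` (-n also leaves the name and timestamp out of the header). The function names are hypothetical, and each file is held in memory while it is processed.

```python
from __future__ import annotations

import gzip
import subprocess
import sys
from pathlib import Path

# Assumption: GNU gzip on PATH with --rsyncable support.
GZIP_CMD = ["gzip", "--rsyncable", "-n", "-c"]


def recompress_bytes(raw: bytes, cmd: list[str] = GZIP_CMD) -> bytes:
    """Decompress gzip data in Python, then recompress it through the external command."""
    data = gzip.decompress(raw)
    result = subprocess.run(cmd, input=data, stdout=subprocess.PIPE, check=True)
    return result.stdout


def recompress_dir(directory: Path, cmd: list[str] = GZIP_CMD) -> int:
    """Recompress every *.gz file under `directory` in place; return how many were done."""
    count = 0
    for path in sorted(directory.rglob("*.gz")):
        tmp = path.with_name(path.name + ".tmp")
        tmp.write_bytes(recompress_bytes(path.read_bytes(), cmd))
        tmp.replace(path)  # swap in the new file only after it is fully written
        count += 1
    return count


if __name__ == "__main__":
    # Usage: python recompress.py /path/to/influx/backup
    print(recompress_dir(Path(sys.argv[1])), "files recompressed")
```

Afterwards, run restic as usual, e.g. `restic -r /path/to/repo backup /path/to/influx/backup`. Note that the first backup after switching will not deduplicate against files uploaded in the old, non-rsyncable form.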

If you want to know more about this subject, here’s a good overview.


Thank you both.

Good suggestion to diff the files. I’ll do this manually over the next few weeks to see whether the files change significantly between runs.