Backing up database - large snapshot size

I have a problem with backing up PostgreSQL dumps. My setup is simple:

su - postgres -c pg_dumpall | restic backup --stdin

The repository is accessed via SFTP and stored on an ext4 partition.

The single dump size reported by restic is 8800-8900 MB, of which about 4000 MB is stored. A gzipped dump is about 1700 MB. I expected that with deduplication I'd get ~100 MB snapshots, because this data doesn't change frequently. The only idea I have now is to add gzip to the pipe and store the gzipped data in restic, but maybe there is something much wiser?

If you go the gzip route, make sure to use the --rsyncable parameter for gzip. It really helps the deduplication algorithm.
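
For reference, a minimal sketch of that pipeline, assuming a repository configured via the usual RESTIC_REPOSITORY / RESTIC_PASSWORD environment variables and a gzip build that supports --rsyncable:

# --rsyncable makes gzip resynchronize its output periodically, so a small
# change in the dump only alters nearby compressed blocks instead of
# everything after it, which keeps restic's chunking effective
su - postgres -c pg_dumpall | gzip --rsyncable | restic backup --stdin --stdin-filename pg_dumpall.sql.gz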


While @lkosz is using pg_dumpall, which dumps the entire pg cluster as UTF-8 SQL, I'm attempting to back up using pg_dump's binary format. The proprietary 'custom' binary format gives more flexibility with restores, but by default applies compression internally. Am I correct in understanding that this will likely thwart restic's deduplication algorithm, and undermine the deduplication benefits of restic for database backups that contain substantial duplication?

Is it worth the effort to disable compression in the pg_dump -Fc output and use restic 0.14's new repository format, which supports its own compression?
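
For what it's worth, a minimal sketch of that approach, assuming restic >= 0.14 with a version-2 repository and a hypothetical database named mydb:

# -Z 0 disables pg_dump's internal compression while keeping the custom format;
# restic's repository compression (v2 format) then handles it after deduplication
su - postgres -c "pg_dump -Fc -Z 0 mydb" | restic backup --compression auto --stdin --stdin-filename mydb.dump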

If I were to attempt to measure the deduplication differences between pg_dump compression on and off, how would I do this with the restic tools?

Appreciate any advice. Thanks everyone.

Regards,
Damo.

Yes, compressed data does not play well with chunk-based deduplication. If there is gzip involved and you can pass --rsyncable to it, that can help a bit.

I would surely give that a try.

Maybe there are smarter ways, but I would let restic back up the dump alone (and repeat that after the data has changed) and watch the verbose output. At the end you will see something like "added to the repository 100MB (10MB stored)", which tells you how much was saved.
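
Beyond that verbose summary line, restic stats can quantify the gap between logical and stored size; a sketch, run against the same repository after each test backup:

# logical size of the data referenced by the snapshots
restic stats --mode restore-size
# deduplicated (and, on v2 repositories, compressed) size actually stored
restic stats --mode raw-data

Comparing how much the raw-data figure grows after a second backup of each variant (pg_dump compression on vs. off) shows directly how well deduplication is working in each case.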