Noob question about how restic backups work

Dayman · July 7, 2021, 6:39pm

Hi restic community! I just came across restic while doing some research on different backup technologies and I’ve been reading up on it for a day or two.

I’m trying to back up a rather large SQLite file (>100GB) to cloud storage and, being new to database backups in general, my question is this:

If I take an initial snapshot of my SQLite database file with restic, then perform a subsequent backup later on, will the second backup re-write the whole SQLite file, or is restic capable of sub-file synchronization such that only the data blocks of the file that changed are backed up to the cloud?

Thanks for any clarification or knowledge you may offer.

cdhowie · July 7, 2021, 9:24pm

Restic uses an algorithm called Content Defined Chunking (CDC) to split a large file into multiple blobs. Each blob is individually backed up and deduplicated. This means that intra-file deduplication is also possible, if a file contains a significant amount of duplicate data itself.

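So the answer to your question is “it depends.” Restic only uploads blobs that are not already in the repository, so any part of the file that still splits into the same blobs is not uploaded again. Changing a region of the database file has the potential to alter how restic chunks the file from that point on, but it might not.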
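If you want to see how this plays out for your own file, one rough way is to back it up twice and compare how much data each run reports as added. This is only a sketch: the function name and the path are made up for illustration, and it assumes restic is installed with RESTIC_REPOSITORY and RESTIC_PASSWORD already set.

```shell
# Print only the summary line that reports how much new data a backup stored.
# Assumes restic is installed and RESTIC_REPOSITORY / RESTIC_PASSWORD are set.
added_by_backup() {
    # The label reads "Added to the repo" or "Added to the repository"
    # depending on the restic version, so match the common prefix.
    restic backup "$1" | grep 'Added to the repo'
}

# Usage (hypothetical path; the database must be idle while it is read):
#   added_by_backup /srv/data/app.sqlite3   # first run stores everything
#   ... let the application change the database, then stop writes again ...
#   added_by_backup /srv/data/app.sqlite3   # second run stores only new blobs
```

If the second number is much smaller than the file, sub-file deduplication is working well for your workload.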

Note also that it is not safe to back up a SQLite database while it is being written to. Either make sure that no other programs write to the database for the duration of the backup, or use a snapshot mechanism that operates at the filesystem or block device level and back up from that snapshot. For example, btrfs and ZFS support atomic subvolume snapshots, and LVM supports atomic block device snapshots.

Using these mechanisms while there is an ongoing write operation to the database will make it appear as though the power to the machine was cut at the moment the snapshot was taken – the write will be interrupted, but if SQLite is using either a write-ahead log (WAL) or rollback journal, it will be able to recover using this log/journal should you need to restore the database.

If you do neither of these, there is no guarantee that the backed-up database will be in a usable state; it could be corrupted.
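As a rough sketch of the LVM variant, something like the following could work. The function name, the volume and directory names, the 5G snapshot size and the SNAP_MOUNT variable are all placeholders I made up; it needs root, and it assumes restic is already configured through its environment variables.

```shell
# Sketch: back up a directory from a temporary LVM snapshot.
# Arguments: volume group, logical volume, directory relative to the
# filesystem root on that volume. All names are placeholders.
backup_from_lvm_snapshot() (
    vg=$1
    lv=$2
    rel_dir=$3
    snap="${lv}-restic-snap"
    mnt=${SNAP_MOUNT:-/mnt/restic-snap}   # hypothetical mount point

    mkdir -p "$mnt" || exit 1
    # 5G is an arbitrary amount of space for changes made while the snapshot exists.
    lvcreate --snapshot --size 5G --name "$snap" "/dev/$vg/$lv" || exit 1

    status=1
    if mount -o ro "/dev/$vg/$snap" "$mnt"; then
        # Back up the whole directory so that the -wal or -journal file
        # next to the database is captured along with it.
        restic backup "$mnt/$rel_dir"
        status=$?
        umount "$mnt"
    fi
    # Always drop the snapshot, even if the mount or the backup failed.
    lvremove --yes "/dev/$vg/$snap"
    exit "$status"
)

# Usage (placeholder names):
#   backup_from_lvm_snapshot vg0 data var/lib/myapp
```

Using the same mount point every time keeps the backed-up path stable between runs.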

On Windows, you can pass --use-fs-snapshot to restic backup to instruct restic to create a shadow copy (snapshot) of the volume containing the database, and restic will then read from that shadow copy. (This may require administrator privileges.)

You can side-step the snapshot issue entirely if you instead dump the database to SQL and back that up. SQLite’s own locking mechanism will ensure consistency, and you can combine this with gzip --rsyncable to produce a compressed and CDC-friendly stream for restic to back up:

sqlite3 database-file.sqlite3 .dump | gzip --rsyncable | restic backup --stdin ...
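Spelled out a bit more as a script, the same pipeline might look like the sketch below. The function name and the stored file name are my own choices, not anything restic requires; it assumes restic is configured through its environment variables and that your gzip supports --rsyncable.

```shell
# Sketch: dump a SQLite database to SQL, compress it, and stream it to restic.
# Assumes RESTIC_REPOSITORY / RESTIC_PASSWORD are set.
backup_sqlite_dump() (
    db=$1
    # Enable pipefail where the shell supports it, so that a failed dump
    # makes the whole pipeline report failure.
    (set -o pipefail) 2>/dev/null && set -o pipefail

    # --stdin-filename sets the name the stream is stored under in the snapshot.
    sqlite3 "$db" .dump \
        | gzip --rsyncable \
        | restic backup --stdin --stdin-filename "$(basename "$db").sql.gz"
)

# Usage (hypothetical path):
#   backup_sqlite_dump /srv/data/app.sqlite3
```

Giving the stream a fixed name with --stdin-filename means every snapshot stores the dump under the same path, which makes it easier to find when restoring.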

Dayman · July 7, 2021, 9:28pm

This was really helpful, thanks @cdhowie

cdhowie · July 7, 2021, 9:29pm

Happy to help! Make sure you take a look at my recent edits as well.