I’m aware that backing up a system while files are in use is risky because there’s no real guarantee of consistency. But such a backup is still better than no backup at all, which is what I would end up with if the backup procedure were too intrusive to run regularly (stop working, shut down applications, back up, start them again…).
So how bad is it? For example, what happens when a file gets modified after restic has scanned it but before restic has had a chance to upload its content? Will the backup operation complete without detecting the problem, and if so, will the repository still be okay? My concern is that restic might think it is writing “foo” as content into the repository while in fact it is reading “bar” from the modified file and uploading that instead.
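To make the race concrete, here is a minimal Python sketch of one way a backup tool could notice that a file changed while it was being read: compare size and modification time before and after the read. This only illustrates the idea and is not a description of what restic does internally; `fingerprint` and `read_with_change_check` are made-up names.

```python
import os


def fingerprint(path):
    """Return (mtime in nanoseconds, size) as a cheap change indicator."""
    st = os.stat(path)
    return (st.st_mtime_ns, st.st_size)


def read_with_change_check(path):
    """Read a file and report whether its fingerprint changed during the read.

    Returns (data, changed). A True value means the bytes in `data` may not
    match any single consistent state of the file.
    """
    before = fingerprint(path)
    with open(path, "rb") as f:
        data = f.read()
    after = fingerprint(path)
    return data, before != after
```

A check like this cannot catch a modification that leaves both size and timestamp unchanged, and it says nothing about consistency across several related files, which is where a file system snapshot helps.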
LVM snapshots may help with that. At least then restic gets to work with a read-only, static file system. The content of the snapshot itself might still be inconsistent, but that’s not that different from a sudden power loss, so hopefully the filesystem and applications will be able to recover from that.
This leads me to another, related topic: several applications now use databases, for example SQLite in Firefox and Chrome. Assuming that an LVM snapshot takes care of consistency, is it still better to dump such databases before backing them up? Does anyone have experience with which approach is more efficient? Directly backing up the SQLite files simplifies the restore, but perhaps it also implies storing more data?
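For reference, the two options could look roughly like this with Python's standard `sqlite3` module: a binary copy made through SQLite's online backup API, and a plain SQL text dump. The function names and paths are my own placeholders, and I have not checked whether a running browser allows a second connection to its database files.

```python
import sqlite3


def snapshot_sqlite(src_path, dest_path):
    """Copy a database to dest_path using SQLite's online backup API."""
    src = sqlite3.connect(src_path)
    dest = sqlite3.connect(dest_path)
    try:
        src.backup(dest)  # available since Python 3.7
    finally:
        dest.close()
        src.close()


def dump_sqlite(src_path, dump_path):
    """Write the database as a sequence of SQL statements (text dump)."""
    src = sqlite3.connect(src_path)
    try:
        with open(dump_path, "w", encoding="utf-8") as out:
            for statement in src.iterdump():
                out.write(statement + "\n")
    finally:
        src.close()
```

Comparing `os.path.getsize()` of both outputs for a real database, and the size of the repository after a few incremental backups of each, would be one way to answer the efficiency question empirically.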
I have another database where a full backup is definitely overkill: a notmuch full-text search database. Most of its content can be regenerated, but not all of it. For that I definitely want a solution where I first dump the relevant content (there’s a command for that) and then back up only that output. This can be done right now with some scripting around the restic invocation and exclude rules for the full database.
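The scripting I have in mind would be something like the following Python sketch. It assumes the index lives under `.notmuch/xapian` inside the mail directory (this depends on the notmuch configuration), that restic finds its repository and password through the usual environment variables, and the paths passed in are placeholders.

```python
import os
import subprocess


def notmuch_dump_command(dump_file):
    """Command that writes the notmuch dump to dump_file."""
    return ["notmuch", "dump", "--output=" + dump_file]


def restic_backup_command(paths, excludes):
    """Command that backs up paths while skipping the given exclude patterns."""
    cmd = ["restic", "backup"]
    for pattern in excludes:
        cmd += ["--exclude", pattern]
    return cmd + list(paths)


def backup_mail(mail_dir, dump_file):
    """Dump first, then back up the mail and the dump but not the full index."""
    index_dir = os.path.join(mail_dir, ".notmuch", "xapian")  # assumed location
    subprocess.run(notmuch_dump_command(dump_file), check=True)
    subprocess.run(
        restic_backup_command([mail_dir, dump_file], [index_dir]),
        check=True,
    )
```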
I’m just wondering whether it would be better to build support for backing up database dumps directly into restic. What I have in mind would work roughly like this:
- for certain file types, configure helper commands that know how to dump and restore the files
- when restic encounters such files during a backup, it invokes the helper, caches the output in a temp file and backs up that file instead
- during a restore, it does the inverse
Besides avoiding the need for scripting, this also has some other advantages:
- the database dump is part of the normal snapshot (compared to a solution which pipes the output as stdin into a separate restic invocation for each database)
- no need to write the database dumps into the normal filesystem
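To illustrate the helper idea, here is a rough Python sketch of the mechanism. Everything in it is hypothetical: restic has no such interface that I know of, and `HELPERS`, `stage_for_backup` and `restore_from_backup` are names I made up. The SQLite helper pair merely stands in for whatever dump and restore commands would be configured.

```python
import os
import shutil
import sqlite3

# Hypothetical registry: file suffix -> (dump helper, restore helper).
HELPERS = {}


def register_helper(suffix, dump, restore):
    HELPERS[suffix] = (dump, restore)


def find_helper(path):
    for suffix, pair in HELPERS.items():
        if path.endswith(suffix):
            return pair
    return None


def stage_for_backup(path, temp_dir):
    """Return the file to store: a cached dump if a helper matches, else path."""
    helper = find_helper(path)
    if helper is None:
        return path
    staged = os.path.join(temp_dir, os.path.basename(path) + ".dump")
    helper[0](path, staged)
    return staged


def restore_from_backup(stored, target):
    """Inverse step: rebuild target from the stored dump, or copy it as-is."""
    helper = find_helper(target)
    if helper is None:
        shutil.copyfile(stored, target)
    else:
        helper[1](stored, target)


def sqlite_dump(db_path, dump_path):
    conn = sqlite3.connect(db_path)
    try:
        with open(dump_path, "w", encoding="utf-8") as out:
            out.write("\n".join(conn.iterdump()))
    finally:
        conn.close()


def sqlite_restore(dump_path, db_path):
    with open(dump_path, encoding="utf-8") as f:
        script = f.read()
    conn = sqlite3.connect(db_path)
    try:
        conn.executescript(script)
    finally:
        conn.close()


register_helper(".sqlite", sqlite_dump, sqlite_restore)
```

In this sketch the dump only ever exists in a temporary directory, and the backup tool would record it under the original file's path so that the restore step knows which helper to run.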