Backing up a live (Linux) system

I’m aware that backing up a system where files are currently in use is risky because there’s no real guarantee for consistency. But such a backup is still better than running no backup at all because a fully consistent one would be too intrusive (stop working, shut down applications, back up, start them again…).

So how bad is it? For example, what happens when a file gets modified after restic has scanned it and before restic had a chance to upload its content? Will the backup operation complete without detecting the problem and if so, will the repository still be okay? My concern is that restic might think it is writing “foo” as content into the repository while in fact it is reading “bar” from the modified file and then uploads that instead.

LVM snapshots may help with that. At least then restic gets to work with a read-only, static file system. The content of the snapshot itself might still be inconsistent, but that’s not that different from a sudden power loss, so hopefully the filesystem and applications will be able to recover from that.

This leads me to another, related topic: several applications now use databases, for example sqlite in Firefox and Chrome. Assuming that an LVM snapshot takes care of consistency, is it still better to dump such databases before backing them up? Any experiences which approach is more efficient? Directly backing up the sqlite files simplifies the restore, but perhaps it also implies storing more data?

I have another database where a full backup is definitely overkill: a notmuch full-text search database, which mostly contains data that can be regenerated, but not completely. For that I definitely want to find a solution where I first dump the relevant content (there’s a command for that) and then only back up that output. This can be done right now with some scripting around the restic invocation and exclude rules for the full database.
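A minimal sketch of that scripting, under some assumptions that are not from this thread: the mail directory and dump file are passed as arguments, the index lives in `<mail dir>/.notmuch/xapian`, and the restic repository and password are configured through the environment. The function names are made up for illustration.

```shell
#!/bin/sh
# Sketch: back up a notmuch dump instead of the full search index.
# Assumptions (not from the thread): the index sits in
# <mail dir>/.notmuch/xapian and restic is configured via environment.
set -eu

# Directory holding the Xapian index for a given mail directory (assumed layout).
xapian_dir() {
    printf '%s/.notmuch/xapian\n' "$1"
}

backup_with_notmuch_dump() {
    mail_dir=$1
    dump_file=$2
    mkdir -p "$(dirname "$dump_file")"
    # Write the tags to a plain-text dump file.
    notmuch dump --output="$dump_file"
    # Back up the home directory but skip the index itself.
    restic backup "$HOME" --exclude "$(xapian_dir "$mail_dir")"
}

# Runs only when called with two arguments: MAIL_DIR DUMP_FILE
if [ "$#" -eq 2 ]; then
    backup_with_notmuch_dump "$1" "$2"
fi
```

The dump file has to live somewhere under the backed-up directory to end up in the snapshot. After a restore, the index would be rebuilt with `notmuch new` and the dump applied with `notmuch restore --input=<dump file>`.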

I’m just wondering whether it would be better to include support for backing up database dumps directly into restic. What I have in mind would work roughly like this:

  • for certain file types, configure helper commands that know how to dump and restore the files
  • when restic encounters such files during a backup, it invokes the helper, caches the output in a temp file and backs up that file instead
  • during a restore, it does the inverse

Besides avoiding the need for scripting, this also has some other advantages:

  • the database dump is part of the normal snapshot (compared to a solution which pipes the output as stdin into a separate restic invocation for each database)
  • no need to write the database dumps into the normal filesystem

I’m guessing the need for a snapshot before backup depends on the applications on the system but I’m not an expert in such things. So far I have not had any problems in that regard. But you might be interested in this:

https://restic.readthedocs.io/en/stable/040_backup.html#reading-data-from-stdin

This “reading from stdin” has the drawback that restic needs to be invoked once per database and then the database dump isn’t part of the normal snapshot.
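To make the drawback concrete, here is a rough sketch of the stdin pattern for SQLite files. It assumes the `sqlite3` command-line tool is installed and the repository is configured through the environment; the `.sql` naming and the function names are only illustrative.

```shell
#!/bin/sh
# Sketch: one restic invocation (and one snapshot) per database dump.
set -eu

# File name under which the dump appears in its snapshot (hypothetical convention).
stdin_name() {
    printf '%s.sql\n' "$(basename "$1")"
}

backup_db_via_stdin() {
    db_file=$1
    # POSIX sh has no "pipefail", so a failing sqlite3 is not detected here.
    sqlite3 "$db_file" .dump |
        restic backup --stdin --stdin-filename "$(stdin_name "$db_file")"
}

# Each argument is a database file; each one gets its own snapshot.
if [ "$#" -gt 0 ]; then
    for path in "$@"; do
        backup_db_via_stdin "$path"
    done
fi
```

Every database ends up in a snapshot of its own, separate from the snapshot that holds the rest of the filesystem.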

That is true. On the other hand, however: how would you reference the db dump if it can’t be a file in your normal file system and also not a separate snapshot?

Instead of a “foobar.sqlite” file I would store a “foobar.sqlite-restic-dump” file (naming to be determined…) in the snapshot without actually having to write that file into the filesystem that is getting backed up. Then read-only mounts can also be backed up, or partitions that are simply 100% full.

In the meantime I had an idea how I can achieve the same without changes to restic:

  • mount a tmpfs, say as /restic-database-dumps
  • dump databases into /restic-database-dumps/<original database name>-restic
  • create a snapshot that includes / (with databases excluded) and /restic-database-dumps
  • on restore, look for dumps in /restic-database-dumps and restore the original database
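The steps above could be sketched roughly as follows. This is untested and makes several assumptions: the databases are SQLite files passed as arguments, the script runs as root, the repository is configured through the environment, and the original path is kept below the dump directory so the restore can find its way back. The function names are invented for this example.

```shell
#!/bin/sh
# Sketch of the tmpfs approach; run as root, database files as arguments.
set -eu

DUMP_ROOT=/restic-database-dumps

# /home/u/foo.sqlite -> /restic-database-dumps/home/u/foo.sqlite-restic
dump_path() {
    printf '%s%s-restic\n' "$DUMP_ROOT" "$1"
}

# Inverse mapping, for use during a restore.
original_path() {
    stripped=${1#"$DUMP_ROOT"}
    printf '%s\n' "${stripped%-restic}"
}

backup_with_dumps() {
    mkdir -p "$DUMP_ROOT"
    mount -t tmpfs tmpfs "$DUMP_ROOT"
    exclude_file=$(mktemp)
    for db_file in "$@"; do
        out=$(dump_path "$db_file")
        mkdir -p "$(dirname "$out")"
        sqlite3 "$db_file" .dump > "$out"
        # Exclude the original database from the backup of /.
        printf '%s\n' "$db_file" >> "$exclude_file"
    done
    # The tmpfs is listed explicitly because --one-file-system should
    # otherwise skip it as a separate filesystem.
    restic backup --one-file-system --exclude-file "$exclude_file" / "$DUMP_ROOT"
    umount "$DUMP_ROOT"
    rm -f "$exclude_file"
}

if [ "$#" -gt 0 ]; then
    backup_with_dumps "$@"
fi
```

On restore, each dump found below the dump directory could be fed back with `sqlite3 "$(original_path "$dump")" < "$dump"`.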

The icing on the cake would be to include the restore commands in the snapshot; that way a restore works without the original backup configuration.

Hmm, sounds like this is better done outside of restic after all…

I’ve set up my local backup following this: https://github.com/erikw/restic-systemd-automatic-backup

I might modify that to cover databases and then will try to get those changes merged and/or publish my fork.

Hm yeah not sure but I like the Unix philosophy very much: one tool does one job very well. It sounds like this is one of those situations.

Btw I have a backup server that runs a nightly bash script with all kinds of jobs via ssh on the client machines: dumps, rsyncs, cache stuff, etc. That script then also runs restic on the clients via ssh one by one (ssh from backupserver to client and there restic sftp:backupserver:/…). That way the repo is never still locked by one client when the next one starts, and everything is done in an orderly fashion. At the end the whole repo is rsynced to an external location so I could do a disaster recovery.

From time to time (if there is time to spare) I manually run a forget and prune script that does take forever. Works pretty well so far.

We are using the LVM snapshot approach in production. Each VG has 5-10% free space for snapshot volumes.

The backup script takes a snapshot of each volume, mounts them, chroots into the mount, does some work in there to run the backup, unmounts everything, and deletes the snapshots.
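Reduced to a single volume and without the chroot step, that sequence might look like the sketch below. It is not our production script: the snapshot naming, the 50%FREE sizing, and the function names are assumptions made for illustration, and it has to run as root with the restic repository configured through the environment.

```shell
#!/bin/sh
# Sketch: snapshot one LV, back up the mounted snapshot, then clean up.
set -eu

# Snapshot LV name derived from the origin LV name (hypothetical scheme).
snap_name() {
    printf '%s-restic-snap\n' "$1"
}

backup_lv() {
    vg=$1
    lv=$2
    mnt=$3
    snap=$(snap_name "$lv")
    status=0
    lvcreate --quiet --snapshot --extents 50%FREE --name "$snap" "$vg/$lv"
    mkdir -p "$mnt"
    # An XFS snapshot would additionally need the "nouuid" mount option.
    mount -o ro "/dev/$vg/$snap" "$mnt"
    # Paths in the restic snapshot start with $mnt here; chrooting into
    # the mount before running restic avoids that.
    restic backup "$mnt" || status=$?
    # Clean up even if the backup failed, then report its exit status.
    umount "$mnt"
    lvremove --force "$vg/$snap"
    return "$status"
}

# Usage: script VG LV MOUNTPOINT
if [ "$#" -eq 3 ]; then
    backup_lv "$1" "$2" "$3"
fi
```

If the snapshot fills up while the backup is still running, LVM invalidates it, so the free space in the VG has to cover all writes to the origin volume during that time.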

Note that LVM snapshots are atomic with respect to each volume, but not between volumes. For example, if your database files are spread across multiple volumes, you must stop the database server or pause writes before taking the snapshots or the snapshots of each volume will not be consistent with respect to each other.

We solve this potential hazard for MySQL like so:

echo '
        FLUSH TABLES WITH READ LOCK;
        SYSTEM lvcreate --quiet -l  33%FREE -n ls-vol1 -s vg-primary/lv-vol1;
        SYSTEM lvcreate --quiet -l  50%FREE -n ls-vol2 -s vg-primary/lv-vol2;
        SYSTEM lvcreate --quiet -l 100%FREE -n ls-vol3 -s vg-primary/lv-vol3;
        UNLOCK TABLES;
' | mysql || exit 1

This tells MySQL to flush all buffered writes and read-lock all tables in every database (allowing reads but blocking writes), creates the snapshots, then unlocks the tables. This guarantees that the MySQL data files on each LV are totally consistent.

Once we have the snapshots created and mounted, we run a second mysqld chrooted into the snapshot mountpoint. This allows us to take a SQL dump without exclusively locking all of our MyISAM tables. We could also just back up the data files, but a SQL dump has several advantages.

Wow that is interesting. It’s awesome to see the challenges that can occur when you have really big data sets and lots of traffic. And isn’t it cool how you can solve so much by using “standard” unix tools? Lovely :)

My interest is more around a normal desktop where / and /home might be different partitions, but consistency between them isn’t that important - one could have changes to /etc/passwd that aren’t mirrored in /home, but that seems unlikely. So just plain LVM snapshots should be fine, with dumping of all the various (sqlite) databases as an optional step after that. Trying to lock all of those before taking the snapshot sounds like overkill to me.

Okay. I back up a Mac and two Linux machines (elementary and Solus) using restic. I don’t have huge sqlite dbs in constant write mode but at least those used by my applications (like Firefox) have not been a problem so far even with the apps open and active during backup. And I have successfully migrated several systems to new hardware using restic restores already.

It’s not even necessary if the sqlite databases are opened with a journal (which they should be).