I’m looking into Restic to serve as the central backup system for my Raspberry Pis, with one Pi hosting a Restic REST server and all the others publishing their backups to it.
However, I can’t seem to find any information on how Restic behaves when backing up the data of a live application (specifically Conduit, the Matrix server). Does Restic have any mechanism in place to ensure a backup is atomic? For example, if Restic reads file A and then file B, but the application changed file A while Restic was reading file B, this could result in a corrupted backup if not handled. Because Conduit’s data mostly consists of a RocksDB database, I’d like to make sure first that this won’t be a problem.
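For context, the kind of per-client invocation I have in mind looks roughly like this (the hostname, repository path, and data directory are placeholders for my actual setup):

```shell
# Each Pi pushes its backup to the central restic REST server.
# "backup-pi", the repo name, and the data path are placeholders.
export RESTIC_PASSWORD_FILE=/etc/restic/password
restic -r rest:http://backup-pi:8000/conduit-pi backup /var/lib/matrix-conduit
```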
Any insight would be appreciated, thanks in advance!
Usually it’s not the files that are the hard part but the database. I’m not sure there’s a proven way to live-backup the files of a running DBMS like MySQL, but I think stop-dump-start is the way to go if you want to be sure.
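(Side note: for MySQL with InnoDB tables there is at least one well-known way to take a consistent dump without stopping the server, though a dump is not the same as backing up the raw files:)

```shell
# --single-transaction takes the whole dump inside one repeatable-read
# transaction, so the result is consistent for InnoDB tables without
# locking or stopping the server. Output path is a placeholder.
mysqldump --single-transaction --all-databases > /var/backups/mysql-all.sql
```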
That might be the case, but I don’t have any experience with this, and I’m guessing it depends on how your application is written. Imagine a set of transactions running on your DB while the snapshot takes place in the middle. Will the application be able to cope with that state after a restore?
In the case of a database, only if it is aware of filesystem-snapshot backups. At a high level, that means you can issue a database command to flush all data to disk and “freeze” its state for the time it takes to create the filesystem snapshot.
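For MySQL, for example, the flush-and-freeze step can be done inside a single client session, since the global read lock is released when the session ends. A rough, untested sketch, assuming the data directory lives on a btrfs subvolume (all paths are placeholders):

```shell
# FLUSH TABLES WITH READ LOCK holds a global read lock for the duration
# of the session; \! runs a shell command from inside the mysql client,
# so the snapshot is taken while the lock is held.
mysql -u root -p <<'SQL'
FLUSH TABLES WITH READ LOCK;
\! btrfs subvolume snapshot -r /var/lib/mysql /snapshots/mysql-frozen
UNLOCK TABLES;
SQL
```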
stop DB -> snapshot -> start DB -> use snapshot to run backup is a much easier way, and unless you run some mission-critical system, I would rather look at this option.
That is actually why I said that it depends on the application. Yet I’m still not sure: would, for example, btrfs let MySQL finish a transaction before freezing everything for a snapshot? And how will your application make sure all files are consistent with the DB transactions at the exact moment the snapshot is taken?
I guess if all that is answered, a DB dump might be unnecessary.
Nope. Those are two totally separate things. Filesystem snapshots only protect your backup from changes happening during the backup. You still have to make sure that at the moment you create the filesystem snapshot, your app’s state on disk is stable.
For a database or virtual machine, the easiest way is:
stop apps -> snapshot -> start apps -> use snapshot to run backup
As taking a snapshot is pretty much instantaneous, your app’s downtime can be measured in seconds.
If you want to do this without stopping the app, you need apps that support it, etc. There is no single universal way to do it in that case.
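The sequence above, sketched as a script. The service name, subvolume paths, and repository are placeholders, and it assumes the app’s data lives on a btrfs subvolume:

```shell
#!/bin/sh -e
# stop apps -> snapshot -> start apps -> use snapshot to run backup
systemctl stop conduit
btrfs subvolume snapshot -r /srv/conduit /srv/conduit-snap
systemctl start conduit   # downtime ends here, typically after a few seconds

# The backup can now take as long as it likes; the app is already up.
restic -r rest:http://backup-pi:8000/conduit backup /srv/conduit-snap
btrfs subvolume delete /srv/conduit-snap
```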
FWIW, the “stop, snapshot, start” approach is what I use for backing up a Synapse (another Matrix server) install. I have the job set to run in the small hours of the morning, when nobody will notice the very small downtime. The actual restic backup job happens sometime later, against the snapshot, which keeps the downtime window short.
That said, even if I did run the job during working hours, all the end user would see is a brief “lost connection to server” message. Once the server is back up, the Matrix client should send/receive any pending messages (based on my experience with Element, anyway; I haven’t really tested other Matrix clients).
Is there a reason you don’t back up your Synapse database by dumping its contents to an SQL file? This is the approach I’m planning to use, as I have heard before that simply copying the files from the database isn’t an optimal solution.
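For reference, the dump-based approach I’m considering, assuming Synapse’s usual PostgreSQL backend (user, database name, and paths are placeholders):

```shell
# Dump the database to a plain SQL file, then let restic back up the dump.
pg_dump -U synapse -d synapse > /var/backups/synapse.sql
restic -r rest:http://backup-pi:8000/synapse backup /var/backups/synapse.sql
```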
The DB runs in a container as part of a Docker swarm. It’s non-trivial to attach to the container and run dump commands (I can’t guarantee which swarm node will be running the container, what name it will have, etc.).
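For anyone who does want to attempt it anyway, a rough sketch of one way to locate the container and dump from inside it. The service name is a placeholder, and it is fragile for exactly the reasons above: it only works when run on the node currently hosting the task:

```shell
# Find the first container whose name matches the swarm service on this
# node, then run pg_dump inside it. "mystack_db" is a placeholder.
CID=$(docker ps --filter "name=mystack_db" --format '{{.ID}}' | head -n 1)
docker exec "$CID" pg_dump -U synapse synapse > /var/backups/synapse.sql
```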
I’m sure there are other (better?) approaches, but this has run smoothly for the last few years under light use, so I don’t have any reason to make things more complicated for myself than they already are.
For completeness, this is not the only container with application data that I need to back up, so having a simpler approach that works uniformly across multiple different applications is preferable to me.
Your situation is likely completely different, and dumping the DB (and backing that dump file up) is the approach I’ve seen recommended.
I don’t want to drag this out any further, but just a short question regarding your comment: does only backing up run smoothly, or also restoring? The interesting part about having a backup is a working restore.
A very valid point that many people do not think about :) They only discover what is missing when the shit hits the fan :)
IMO, it does not matter how careful and methodical the planning is; the only ultimate validation is testing.
For a simple backup of a directory with files, it is enough to test that the stuff can be restored and is readable.
For more complex situations, when you back up live applications/databases/VMs, the only way to be sure is to try a full system recovery from backup. I have never seen a situation where everything was 100% right the first time during recovery testing. There are always eurekas about what else has to go into the backup, in what order the restore has to be executed, missing dependencies, etc.
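For the simple case, a minimal restore test with restic itself might look like this (repository and target directory are placeholders):

```shell
# Verify repository integrity, then restore the latest snapshot into a
# scratch directory and inspect the result by hand.
restic -r rest:http://backup-pi:8000/conduit check
restic -r rest:http://backup-pi:8000/conduit restore latest --target /tmp/restore-test
```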
@kapitainsky Absolutely. As a matter of fact, when I install a system, I write the documentation first and then install from the documentation. The restore procedure then gets added to the documentation and, as you say, has to be tested once in a while. In most cases this is super easy, as we’re usually talking about VMs/CTs anyway.