I’m looking into Restic to serve as the central backup system for my Raspberry Pis, with one Pi hosting a Restic REST server and all the others publishing their backups to it.
However, I can’t seem to find any information on how Restic behaves when backing up the data of a live application (specifically Conduit, the Matrix server). Does Restic have any mechanism in place to ensure a backup is atomic? For example, if Restic reads file A and then file B, but the application changed file A while Restic was reading file B, this could result in a corrupted backup if not handled. Because Conduit’s data mostly consists of a RocksDB database, I’d like to make sure first that this won’t be a problem.
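For context, the kind of per-client invocation I have in mind looks roughly like this (the hostname, repository path, and data directory are placeholders for my actual setup):

```shell
# Each Pi pushes its backup to the central restic REST server.
# "backup-pi", the repo name, and the data path are placeholders.
export RESTIC_PASSWORD_FILE=/etc/restic/password
restic -r rest:http://backup-pi:8000/conduit-pi backup /var/lib/matrix-conduit
```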
Any insight would be appreciated, thanks in advance!
Usually it’s not the files that are the hard part but the database. I’m not sure there’s a proven way to live-backup the files of a running DBMS like MySQL, but I think stop-dump-start is the way to go if you want to be sure.
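(Side note: for MySQL with InnoDB tables there is at least one well-known way to take a consistent dump without stopping the server, though a dump is not the same as backing up the raw files:)

```shell
# --single-transaction takes the whole dump inside one repeatable-read
# transaction, so the result is consistent for InnoDB tables without
# locking or stopping the server. Output path is a placeholder.
mysqldump --single-transaction --all-databases > /var/backups/mysql-all.sql
```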
That might be the case, but I don’t have any experience with this, and I’m guessing it depends on how your application is written. Imagine a set of transactions running on your DB while the snapshot takes place in the middle. Will the application be able to cope with that state after a restore?
In the case of a database, only if it is aware of filesystem-snapshot backups. At a high level, that means you can issue a database command to flush all data to disk and “freeze” its state for the time it takes to create the filesystem snapshot.
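For MySQL, for example, the flush-and-freeze step can be done inside a single client session, since the global read lock is released when the session ends. A rough, untested sketch, assuming the data directory lives on a btrfs subvolume (all paths are placeholders):

```shell
# FLUSH TABLES WITH READ LOCK holds a global read lock for the duration
# of the session; \! runs a shell command from inside the mysql client,
# so the snapshot is taken while the lock is held.
mysql -u root -p <<'SQL'
FLUSH TABLES WITH READ LOCK;
\! btrfs subvolume snapshot -r /var/lib/mysql /snapshots/mysql-frozen
UNLOCK TABLES;
SQL
```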
stop DB -> snapshot -> start DB -> use snapshot to run backup is a much easier way, and unless you run some mission-critical system, I would rather look at this option.
That is actually why I said that it depends on the application. Yet I’m still not sure: would, for example, btrfs let MySQL finish a transaction before freezing everything for a snapshot? And how will your application make sure all files are consistent with the DB transactions at the exact moment the snapshot is taken?
I guess if all that is answered, a DB dump might be unnecessary.
Nope. Those are two totally separate things. Filesystem snapshots only protect your backup from changes happening during the backup. You still have to make sure that at the moment you create the filesystem snapshot, your app’s state on disk is stable.
For a database or virtual machine, the easiest way is:
stop apps -> snapshot -> start apps -> use snapshot to run backup
As taking a snapshot is pretty much instantaneous, your app’s downtime can be measured in seconds.
If you want to do this without stopping the app, you need apps that support it, etc. There is no single universal way to do it in that case.
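The sequence above, sketched as a script. The service name, subvolume paths, and repository are placeholders, and it assumes the app’s data lives on a btrfs subvolume:

```shell
#!/bin/sh -e
# stop apps -> snapshot -> start apps -> use snapshot to run backup
systemctl stop conduit
btrfs subvolume snapshot -r /srv/conduit /srv/conduit-snap
systemctl start conduit   # downtime ends here, typically after a few seconds

# The backup can now take as long as it likes; the app is already up.
restic -r rest:http://backup-pi:8000/conduit backup /srv/conduit-snap
btrfs subvolume delete /srv/conduit-snap
```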
FWIW, the “stop, snapshot, start” approach is what I use for backing up a Synapse (another Matrix server) install. I have the job set to run in the small hours of the morning, when nobody will notice the very small downtime. The actual restic backup job happens sometime later, against the snapshot, which keeps the downtime window short.
That said, even if I did run the job during working hours, all the end user would see is a brief “lost connection to server” message. Once the server is back up, the Matrix client should send/receive any pending messages (based on my experience with Element, anyway; I haven’t really tested other Matrix clients).
Is there a reason you don’t back up your Synapse database by dumping its contents to an SQL file? This is the approach I’m planning to use, as I have heard before that simply copying the files from the database isn’t an optimal solution.
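For reference, the dump-based approach I’m considering, assuming Synapse’s usual PostgreSQL backend (user, database name, and paths are placeholders):

```shell
# Dump the database to a plain SQL file, then let restic back up the dump.
pg_dump -U synapse -d synapse > /var/backups/synapse.sql
restic -r rest:http://backup-pi:8000/synapse backup /var/backups/synapse.sql
```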
The DB runs in a container as part of a Docker swarm. It’s non-trivial to attach to the container and run dump commands (I can’t guarantee which swarm node will be running the container, what name it will have, etc.).
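For anyone who does want to attempt it anyway, a rough sketch of one way to locate the container and dump from inside it. The service name is a placeholder, and it is fragile for exactly the reasons above: it only works when run on the node currently hosting the task:

```shell
# Find the first container whose name matches the swarm service on this
# node, then run pg_dump inside it. "mystack_db" is a placeholder.
CID=$(docker ps --filter "name=mystack_db" --format '{{.ID}}' | head -n 1)
docker exec "$CID" pg_dump -U synapse synapse > /var/backups/synapse.sql
```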
I’m sure there are other (better?) approaches, but this has run smoothly for the last few years under light use, so I don’t have any reason to make things more complicated for myself than they already are.
For completeness, this is not the only container with application data that I need to back up, so having a simpler approach that works uniformly across multiple different applications is preferable to me.
Your situation is likely completely different, and dumping the DB (and backing that dump file up) is the approach I’ve seen recommended.
I don’t want to drag this out any further, but just a short question regarding your comment: does only backing up run smoothly, or also restoring? The interesting part about having a backup is a working restore.
A very valid point that many people do not think about :) They only discover what is missing when the shit hits the fan :)
IMO, it does not matter how careful and methodical the planning is; the only ultimate validation is testing.
For a simple backup of a directory with files, it is enough to test that the stuff can be restored and is readable.
For more complex situations, when you back up live applications/databases/VMs, the only way to be sure is to try a full system recovery from backup. I have never seen a situation where everything was 100% right the first time during recovery testing. There are always eurekas about what else has to go into the backup, in what order the restore has to be executed, missing dependencies, etc.
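For the simple case, a minimal restore test with restic itself might look like this (repository and target directory are placeholders):

```shell
# Verify repository integrity, then restore the latest snapshot into a
# scratch directory and inspect the result by hand.
restic -r rest:http://backup-pi:8000/conduit check
restic -r rest:http://backup-pi:8000/conduit restore latest --target /tmp/restore-test
```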
@kapitainsky Absolutely. As a matter of fact, when I install a system, I write the documentation first and then install from the documentation. The restore procedure then gets added to the documentation and, as you say, has to be tested once in a while. In most cases this is super easy, as we’re usually talking about VMs/CTs anyway.