Managing Postgres backups alongside other files

skyesoss · November 4, 2024, 3:06pm

I’m using restic to create backups for a personal server, and some of the services keep data in the file system as well as a database such as Postgres or SQLite.
From looking at others’ recommendations, one can backup a database via the --stdin-from-command option:

restic backup \
    --stdin-filename postgres.sql \
    --stdin-from-command -- pg_dumpall --clean --if-exists

And additionally backup file paths:

restic backup /var/lib/immich/{library,upload,profile}

However this process creates two separate snapshots, when I want these to be bundled together as the contents of the database reference the file paths that exist at the time of the backup.
Is there a way to use both of these backup methods for the same snapshot?

Additionally, is there a way to integrate restic with Postgres’s PITR (pg_basebackup)?

rawtaz · November 4, 2024, 10:43pm

No, that’s not possible.

Integrate in what sense? There are multiple interpretations of that question.

You could use the settings mentioned in PostgreSQL: Documentation: 17: 25.3. Continuous Archiving and Point-in-Time Recovery (PITR) to have Postgres run restic when the WAL has changed, but if your server generates WAL segments at a high rate you’d probably want to throttle this so that restic isn’t run too often (which you could e.g. do through a script that wraps restic instead of setting archive_command to run restic directly).

Otherwise you can just schedule restic to back up the WAL at regular intervals. If you are thinking about something else or I misunderstood your question, you probably need to be more specific. I never used pg_basebackupso perhaps I missed the mark entirely.

skyesoss · November 4, 2024, 11:18pm

Thanks for the information.

For the first part (assuming I’m backing up databases via pg_dump(all)), is the standard recommendation to just perform both restic backup commands separately? Or will this mess things up when it comes time to prune snapshots.
The other option I was looking into is a dedicated “Postgres Backup Tool” like pgBackRest which could complement my restic-based file backups.

As for the second question, I’m not looking to backup my Postgres database at the same speed as the WAL.
The primary benefit that the pg_basebackup option gives is speed of creating a backup: it takes much less time to run pg_basebackup than running pg_dumpall.

rawtaz · November 4, 2024, 11:24pm

That’s what you can do, unless you use your filesystem to create a snapshot which you then back up in one single restic run.

How do you mean? How would prune mess it up or be affected by how many snapshots you take of your data or what data is in the different snapshots?

I think you know more about PostgreSQL backup stuff than I do, I haven’t digged into it that much really, I just dump my databases and that’s all I need to back up.

cdhowie · November 5, 2024, 7:17am

We use btrfs on our servers so we can leverage btrfs snapshots in cases like this. Once you have a snapshot, you can back up the database data directory itself along with the files, or you can start a second database service instance against the snapshot and do a dump using the database tools.

You can do the same thing with any atomic filesystem- or block-level snapshot (such as LVM).

In more scalable systems, such as when you are using a replicated database technology along with object storage to store your data, there is no way to create an atomic snapshot of both systems. Instead, you have to design your application in such a way that it can tolerate out-of-sync data.