I’m still new to setting up backups, but I’m really liking Restic, with autorestic as a wrapper, which makes my scripting much easier.
I currently have an unRAID server that backs up once a day to a restic REST server backend on another unRAID server, and also to Backblaze B2 as a second location.
The question I have is: is it best practice to shut down your Docker containers before backing up their data folders? The reason I ask is that in unRAID we have a plugin that backs up the Docker container folders, but it shuts down the containers before doing so.
Hi @Imitate8184 and welcome to the restic community!
Depending on what kind of data the volume holds, it could make sense to stop the container first.
For example, if the volume holds a database, it would make sense to shut down the container beforehand to ensure that there aren’t any unwritten changes.
But since I don’t have enough experience with Docker and containers in general, I can’t give you more help than this.
If you have the ability to do filesystem snapshots, this might also be something to look into.
Let’s see what others can add to this thread.
The topic can be simplified a bit by more or less removing the Docker aspect of it, and concluding that the processes that you run in your containers are simply processes just like those that you don’t run using Docker. The difference is that Docker wraps them in namespaces and isolations of various sorts - but they’re still processes.
That aside, how to back their data up boils down to what processes they are, or more specifically what they do and how they store and process the data you want to back up. If you have a process (container) that just reads and writes some files and where there’s no problem with just backing those files up as they are, then you can just go ahead and do that. No point shutting down the container.
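For that simple case, a one-liner is all it takes. A minimal sketch, assuming a REST server repo like the one you described (the repo URL and appdata path are placeholders, and it assumes `RESTIC_PASSWORD` is exported):

```bash
# Plain file-based app: just back the folder up as-is,
# no need to stop the container.
restic -r rest:http://backup-server:8000/myrepo backup /mnt/user/appdata/myapp
```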
Another example is a database container in which you run a database such as MariaDB. It’s recommended to dump the database to a file on disk and back that up, instead of backing up the data store of the database process. But again, it’s not very different because it runs using Docker, so you can just start by looking into recommended practices for backing up e.g. MariaDB.
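A rough sketch of that dump-then-backup approach, assuming the container is named `mariadb` and uses the usual `MYSQL_ROOT_PASSWORD` environment variable (both names and paths are placeholders):

```bash
# Dump from the running container; --single-transaction gives a
# consistent dump of InnoDB tables without stopping the database.
docker exec mariadb sh -c \
  'mysqldump --single-transaction -u root -p"$MYSQL_ROOT_PASSWORD" --all-databases' \
  > /mnt/user/backups/all-databases.sql
# Then back up the dump file itself.
restic -r rest:http://backup-server:8000/myrepo backup /mnt/user/backups/all-databases.sql
```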
If it’s an SQLite database, it’s just one file and like the previous example you can just back that up straight away (but then you probably wouldn’t have it in a separate container).
So it all boils down to what you are backing up and how the process using/managing that data does its work. But the bottom line is that it has less to do with Docker and containers than you might think. Focus more on the processes than on Docker.
So if I can distill your comment into a few key takeaways, I feel this is what I gathered:
1. When deciding how to back something up, focus on the processes the application is running.
2. Applications and Docker are the same, so it’s not useful to think of them as different things.
3. If your application has an external DB, try to do a dump of the DB prior to the backup and let restic back that dump up.
   a. Question: here I would imagine that doing a dump of your DB means you don’t have to turn off your container?
   b. Question: should the DB dump lock the DB first to prevent writes, you think?
   c. Photoprism, Nextcloud, Ghost, or FreshRSS seem like good examples here maybe?
4. If your application has an SQLite database, you don’t have to turn off your container; just make sure you are backing up that SQLite file as part of the application folder (i.e. the Docker bind mount).
   a. NGINXProxyManager seems like a good example here maybe?
5. If your application isn’t really doing a lot of reads and writes, then maybe you don’t have to turn it off, nor worry about a DB.
   a. Nodered, RSSBridge, Code-server, etc.
Just trying to create some actionable takeaways from your wisdom here. lmk your thoughts! Thanks again!
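For context, I’d probably wire this into autorestic with a small pre-backup script, something like the following (the dump script path is hypothetical, and I believe `autorestic backup -a` backs up all configured locations):

```bash
#!/bin/sh
set -e
/boot/scripts/dump-databases.sh   # hypothetical: the DB dump step from takeaway 3
autorestic backup -a              # then back up all configured locations
```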
Do you know how filesystem snapshots would help here? Maybe you have a doc or a webpage link you can send my way so I can learn about this. With unRAID I am unsure if they do snapshots, but they will be adding ZFS support soon-ish (?) and I may want to move to ZFS if it is worth moving to.
No, Docker is software that basically orchestrates security and isolation features in e.g. a Linux system such that you get the concept of containers. Sure, Docker is an application, but applications do not by definition have the same features or traits as Docker just because both of them are applications.
It boils down to what database we’re talking about. Some databases can be backed up straight away, some you should dump or similar before backing them up.
Correct, because you can dump the database while the database server is running.
Depends on what database it is and what contents it has, but generally speaking this should be handled as part of the dump, yes. See the man page for the software you use to dump the database before backing up.
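To illustrate with two common databases (just a sketch; check the man page for the one you actually run, and the database names are placeholders):

```bash
# MariaDB/MySQL: --single-transaction dumps InnoDB tables consistently
# without write locks; MyISAM tables would need --lock-tables instead.
mysqldump --single-transaction mydb > mydb.sql

# PostgreSQL: pg_dump runs inside a transaction snapshot, so it is
# consistent without blocking normal reads and writes.
pg_dump mydb > mydb.sql
```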
Examples of what? None of these are databases as far as I know.
Your SQLite database is nothing but a file in the filesystem (be it on the host filesystem and bind-mounted into a container, or in a volume, or whatever). Yes you should be able to back that file up straight away.
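And if you want an extra safety margin while the app is running, SQLite’s own online-backup command produces a consistent copy that you can back up instead (paths are placeholders):

```bash
# Copy the live database via SQLite's online backup, then back up the copy.
sqlite3 /mnt/user/appdata/myapp/data.sqlite ".backup '/mnt/user/backups/data.sqlite'"
restic -r rest:http://backup-server:8000/myrepo backup /mnt/user/backups/data.sqlite
```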
Indeed, if you deem that you have such low activity, e.g. at certain times, then you might get away with just backing it all up straight off the bat. But then again, there’s no reason not to do it the safer way, just in case.
Snapshots can be used to keep a filesystem consistent for e.g. backing up the snapshot. But this does not mean that the data in your database and/or files was consistent at the time that you took the snapshot, so it’s not really a replacement for looking at the applications you have running and architecting your backup procedures to take these into account.
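As a concrete sketch of how that looks with ZFS (the dataset name and mountpoint are hypothetical, and you’d still want to dump databases first for the reasons above):

```bash
# A snapshot is instant and read-only; the live filesystem keeps changing,
# the snapshot does not, so restic sees a frozen view of the files.
SNAP="backup-$(date +%F)"
zfs snapshot tank/appdata@"$SNAP"
restic -r rest:http://backup-server:8000/myrepo backup \
  "/mnt/tank/appdata/.zfs/snapshot/$SNAP"
zfs destroy tank/appdata@"$SNAP"
```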
You can send a ZFS filesystem to the cloud. But you do need something that talks ZFS on the remote end if you are intending to use the zfs send feature of ZFS.
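For example (hosts and dataset names are placeholders):

```bash
# Replicate a snapshot to a remote machine that also runs ZFS:
zfs send tank/appdata@backup-2024-01-01 | ssh user@remote zfs receive backuppool/appdata
```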
Isn’t that great
ZFS does have deduplication, as you say, so please don’t say that it doesn’t.
Yes, but that’s literally the only “cloud” provider supporting ZFS (together with ZFS.rent, which is a one-man shop). The minimum is something like $50/month (or $60).
ZFS deduplication is practically useless. Virtually no one in the ZFS community recommends it: the deduplication table in ZFS requires an impractically large amount of RAM (and CPU).
There is no cloud storage that receives ZFS replication streams sent with zfs send. There is rsync.net, which isn’t exactly cloud storage, but regardless it is too expensive, costing a minimum of $50-$60/month.
$18/year? For non-ZFS, perhaps. The ZFS account at rsync.net is a separate account with a minimum of 1-2 TB, which comes out to be expensive, as of around 6 months ago. The CEO justifies this stiff fee by the fact that, for ZFS, the company has to provide almost a full VM (CPU and RAM).
OK, but the most important thing is that we solve the OP’s problem or answer the question they asked.
So, @Imitate8184 is there still something which you would need help with, or have the posts here been sufficient to get you going?