Hey all, I’m new to restic, have been trying to do my research upfront, and would like to verify an assumption before I proceed too far along a particular path.
My impression is that it is safe for multiple instances of restic to operate on a single repo (as can happen with scheduled jobs for snapshotting, pruning, and read-data checks).
In my case, I’d like to have a computer’s repo reside on a local server that is running restic-server. I’d also like the server to be running Backrest to ease admin of the same repo. Restic-server would handle receiving and restoring snapshots, while Backrest would handle the scheduled prune and integrity checks. Potentially another job would handle replication to external (and/or off-site) storage.
Is this safe?
I’m focusing on a single client case here, but in reality there are multiple users in my house across multiple devices. I’m planning on giving each user their own repo, as I’m expecting a fair amount of duplication within a user’s data, but not across users. Also, one “user” will be on the server itself, which will be backing up my hosted application data and config files.
A key goal of mine is to minimize network traffic and client compute burden, while also ensuring that data-safety checks and off-site replication happen at regular intervals - regardless of whether a client has submitted a snapshot recently.
It is absolutely safe, but operations that need an exclusive lock on the repository (mainly operations that remove or alter things) can only run one at a time. For example, you can have multiple backups running simultaneously to a single repository, but only a single prune operation. Other operations that try to start while the prune operation is running will fail with errors stating they could not lock the repository.
For your proposed use case, so long as you can schedule the jobs so they don’t overlap with one another (or the scheduling is robust enough to “try again later” in the event of the repository being locked) you should not have any issues. Regardless, even if the jobs do overlap, it is “safe” in that they will fail safely, and not introduce data corruption.
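To make "schedule the jobs so they don't overlap" concrete, here is a hypothetical server-side cron fragment - the repo path, password file, and times are placeholders I made up, not anything from this thread. (Newer restic releases also have a `--retry-lock <duration>` flag, so a job can wait for the lock instead of failing immediately.)

```shell
# Hypothetical /etc/cron.d/restic-maintenance (paths and times are examples only).
# Backups are assumed to finish overnight, before maintenance begins.

# 03:30 daily - prune needs an exclusive lock, so nothing else is scheduled here.
30 3 * * *  root  restic -r /srv/restic/repo --password-file /etc/restic/pass prune

# 05:00 Sundays - full read-data check, after prune has long finished.
0  5 * * 0  root  restic -r /srv/restic/repo --password-file /etc/restic/pass check --read-data
```

Staggering the exclusive-lock operations like this means the "could not lock repository" errors described above should only ever happen if a job badly overruns its window.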
Thanks, this was the reassurance that I was hoping for!
Operations failing because they require a lock sounds totally fine to me, and something I’m sure I can schedule around, or at worst can live with happening occasionally. Actually, more robust observability and notification for my hosted applications is the next thing on my list after getting backups sorted. I’m learning all of this from scratch, and trying to build a good foundation before going off and playing with more “for fun” applications.
I really appreciate the link to the post discussing the types of locks per operation. I’m going to bookmark that, as it’ll be very helpful while I work on scheduling.
One remark about the locks: operations that depend on locks for safety can only be as safe as the consistency guarantees those locks provide. Note that restic’s locking works by checking the existence (and content) of lock files in the repository. So these guarantees depend entirely on the consistency guarantees of the storage backend you are using for the repo - and some remote storages only provide eventual consistency.
To give an example:
1. You run a prune on a remote repo (e.g. on cloud storage) from one PC. This sends a request to create a lock file on the backend.
2. Shortly after, you run a backup on the same repo from another PC, maybe far away from the first one. This sends a request to list lock files. Due to eventual consistency, the answer can be “there are no lock files” → the backup doesn’t abort.
3. The prune and backup run in parallel, and there is a good chance the snapshot generated by the backup references data that the prune run is removing at that very moment.
TL;DR: Always check the consistency guarantees of the storage before relying on lock files for synchronization!
Okay, that’s definitely not something I was considering, and I really appreciate you pointing it out. I’m picturing it basically as a race condition.
If I’m understanding correctly, I think in my case I should be pretty okay, because all of the locking happens on only one machine (the server), where the storage is a local drive:
The prune and check read-data maintenance commands are both jobs that are scheduled and run directly on the server.
The backup and forget commands are issued by clients to restic-server, which again runs on the same server (with the repos on local storage), so it’s effectively local command vs. local command
Remote storage may come into play, but that would be with something like the copy command out to off-site or external USB storage. Off-site storage would be append only.
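For what it’s worth, a minimal sketch of that division of labor might look like the following - the host name, repo paths, user name, and retention policy are all hypothetical, and the `copy` syntax shown (`--from-repo`/`--from-password-file`) is the form used by recent restic versions:

```shell
# Client side: back up over the network to rest-server.
restic -r rest:http://server.local:8000/alice \
    --password-file ~/.config/restic/pass backup ~/Documents

# Server side (scheduled jobs hitting the same repo via local storage):
restic -r /srv/restic/alice --password-file /etc/restic/alice.pass \
    forget --keep-daily 7 --keep-weekly 4 --keep-monthly 12 --prune
restic -r /srv/restic/alice --password-file /etc/restic/alice.pass \
    check --read-data

# Replication out to external USB (or off-site) storage via copy:
restic -r /mnt/usb/alice --password-file /etc/restic/alice.pass \
    copy --from-repo /srv/restic/alice --from-password-file /etc/restic/alice.pass
```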
So I think I’m good for avoiding this particular problem? But I’d love a confirmation or a clarification if not. Again, much appreciated!
I think you’ve got the right conclusion, but for the wrong reasons.
You’re okay because both the “local” and the “rest” (restic-server) storage backends offer strong consistency, not eventual consistency. So long as the backend you’re using to access the repository offers that, it doesn’t matter which machine is trying to access the repository, or where the repository is physically located.
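To make that concrete: the two invocations below address the same repository through two different strongly consistent backends, so restic’s lock files behave the same either way (the path and host name are made-up examples):

```shell
# On the server itself: the "local" backend.
restic -r /srv/restic/alice snapshots

# From a client: the "rest" backend, served by rest-server on that same machine.
restic -r rest:http://server.local:8000/alice snapshots
```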