Can repositories be merged?

HeavyThumper · August 21, 2018, 3:18pm

As the title says: having completed backups to multiple independent repositories (which happen to be on the same remote storage), is it possible to merge them? In my case, they were all created with the same password, if that makes a difference.

fd0 · August 22, 2018, 6:41am

No, that’s not possible right now. Although you used the same password, the underlying encryption/signing keys will be completely different. Just copying the files into the same directory structure will create a broken repository which may or may not work with restic.
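To illustrate why the same password does not give the same keys, here is a minimal Python sketch. It assumes, in line with restic's design documentation, that each repository generates its own random master key at init time and that the password only protects that key. The function names, the XOR "wrapping" and the HMAC "signing" are simplified stand-ins chosen for illustration, not restic's actual format or ciphers.

```python
import hashlib
import hmac
import os


def init_repo(password: str) -> dict:
    """Toy 'repository init': a fresh random master key, wrapped with a
    key derived from the password (hypothetical structure)."""
    master_key = os.urandom(32)  # generated anew for every repository
    salt = os.urandom(16)        # random per key file
    kek = hashlib.scrypt(password.encode(), salt=salt, n=2**12, r=8, p=1, dklen=32)
    # XOR stands in for real authenticated encryption; illustration only.
    wrapped = bytes(a ^ b for a, b in zip(master_key, kek))
    return {"salt": salt, "wrapped": wrapped}


def unlock(repo: dict, password: str) -> bytes:
    """Recover the master key of a toy repository from its password."""
    kek = hashlib.scrypt(password.encode(), salt=repo["salt"], n=2**12, r=8, p=1, dklen=32)
    return bytes(a ^ b for a, b in zip(repo["wrapped"], kek))


def sign(master_key: bytes, data: bytes) -> bytes:
    """Authenticate data with the master key (HMAC as a stand-in)."""
    return hmac.new(master_key, data, hashlib.sha256).digest()


def verify(master_key: bytes, data: bytes, tag: bytes) -> bool:
    """Check that data was signed with this master key."""
    return hmac.compare_digest(sign(master_key, data), tag)


repo_a = init_repo("same password")
repo_b = init_repo("same password")
key_a = unlock(repo_a, "same password")
key_b = unlock(repo_b, "same password")

pack = b"contents of a pack file written to repo A"
tag = sign(key_a, pack)

print("master keys equal:", key_a == key_b)
print("pack verifies in repo A:", verify(key_a, pack, tag))
print("pack verifies in repo B:", verify(key_b, pack, tag))
```

In this model a file copied from repo A into repo B fails verification under B's key, which is the sense in which simply copying files produces a broken repository.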

Dj0k3 · September 8, 2018, 10:22pm

I’m curious about this. I know you @fd0 said that this is not possible right now… but by “right now” do you mean this could be a possibility in the future? It would be really nice to have this feature. In my case, I made different repositories for different machines because I noticed that when you run a backup as root (to back up your entire system), it conflicts with other users/keys that are not root. I have one small server and three other machines. I created a test repo and tried to back up all machines to it, but since the server requires root access to back up, once I do a server backup the other machines can’t access the repo and the output displays a “Permission denied” error.

Another reason I kept my machines in different repos is that I don’t really know how Restic behaves if I set up a cron job to run backups for all machines and one machine starts a backup while another has not finished yet. I have a few questions about this:

How does Restic handle this situation? I ran a test backing up two devices at the same time and it went fine from what I saw, but is this something you can do on a daily basis? For example, if I run backups on every machine every 4 hours, could Restic handle that? Maybe this is a noob question, but I searched the forum a little and haven’t found anything. Maybe my search wasn’t specific enough, I don’t know.
How does Restic handle forget rules with multiple hosts in the same repo? I read in the restic manual that “When forget is run with a policy, restic loads the list of all snapshots, then groups these by host name and list of directories”. So, based on this last sentence, when you run forget policies in a repo with multiple machines, will it apply those policies per host name automatically? So if I set it to keep 1 per year and I took just 1 snapshot for 1 machine within a year, will that snapshot be preserved while snapshots of the other hosts are forgotten according to their policies? Can I set up different policies for different hosts, or does a forget run apply to all hosts?
Is there a way to back up multiple machines even when one of them uses a root account (for a full system backup) while the others don’t?

matt · September 8, 2018, 10:42pm

I’m not sure I follow here. Root where, on which machines? Can you be more specific? Where is the repo?

Dj0k3 · September 9, 2018, 12:07am

Ok, so this is my scenario right now (all Linux machines, and everything is done on my local network):

The server holds an HDD which contains a repo for every machine.
Restic takes snapshots of Machines 1, 2 and 3 as regular users (I’m backing up just the home directories on every machine) and sends them via sftp to the HDD connected to the server (right now every machine has its own repo).
The same server takes system snapshots (just of the server) to its own repo located on the same HDD.

What I was trying to do was create just one repo for all machines. I created the repo and tried to back up all machines to it. Everything went fine until I took a snapshot of the server. I assume that, since the server uses the root account to take a system snapshot, the root account took ownership of the repo at some level, so when I tried to take a new snapshot of the other machines it failed with the “Permission denied” error.

I tried a second time, this time creating a key for every machine. The same thing happened. I also tried this locally on one machine: I created a repo and ran a backup. Once I switched to the root user and created another backup, the repo became inaccessible to the regular user. That’s why I’m assuming the problem is about permissions. Maybe there’s a workaround or something that I’m not aware of. I’m using restic 0.9.2 on all machines.

matt · September 9, 2018, 1:36am

It makes sense that if the root user is writing to a critical file in the repo, such as locks or a config file, then later the ssh user you’re logging in as won’t be able to access it.

At what point in the backup does the permission denied error get raised? What are the logs/output?

(In any case, I don’t think this has anything to do with not being able to merge repositories. It’s totally fine for multiple computers to back up to the same repo, and yes, it’s technically feasible. Your permissions issue is definitely totally separate.)
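One way to check the ownership theory is to list which entries in the repository directory belong to a different user than the one the other machines log in as. The following is a minimal Python sketch for a Unix system; the function name `foreign_owned` and the temporary directory standing in for a repo are hypothetical, and it only inspects ownership, not restic itself.

```python
import os
import tempfile


def foreign_owned(repo_path: str, uid: int) -> list:
    """Return paths under repo_path whose owner is not the given uid."""
    found = []
    for root, dirs, files in os.walk(repo_path):
        for name in dirs + files:
            path = os.path.join(root, name)
            # lstat inspects the entry itself and does not follow symlinks.
            if os.lstat(path).st_uid != uid:
                found.append(path)
    return sorted(found)


# Demo on a throwaway directory that stands in for a repository.
with tempfile.TemporaryDirectory() as repo:
    os.mkdir(os.path.join(repo, "locks"))
    with open(os.path.join(repo, "config"), "w") as fh:
        fh.write("placeholder")
    # Everything was just created by the current user, so nothing is listed.
    print(foreign_owned(repo, os.getuid()))
```

Run against the real repository path with the uid of the regular sftp user, any path it prints (for example files created during the root backup) would be a candidate for the “Permission denied” error.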

Dj0k3 · September 9, 2018, 4:29am

Yeah, I’m aware. I can open a new topic for this if that’s better. I’ll run the test again tomorrow and post the output. Also, I’ll create a new user with root access to see if it makes any difference.

fd0 · September 9, 2018, 8:29am

Yes, indeed, this could become possible, for example when we eventually implement a copy command which copies data from repo A to some other repo B. That would cover not only data but also snapshots, which means “merging” the data from A into B, in the sense that all data contained in A can also be restored from B. There’s an issue to track this: Add command to copy all data to another repository · Issue #323 · restic/restic · GitHub

As long as it’s only restic backup it’ll just work. This command will only add data, not remove it, so it’s safe to do that concurrently. Other operations like prune and forget will lock the repository exclusively for the time the command is running, so no other instances of restic can access the repo during that time.
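The locking behaviour described above can be pictured with a toy model: any number of backups may hold a shared lock at once, while prune needs an exclusive lock on an otherwise idle repo. This Python sketch is only an illustration of that rule; the class `RepoLocks` and its methods are hypothetical and do not reflect how restic stores or checks its lock files.

```python
class RepoLocks:
    """Toy model: many shared locks OR exactly one exclusive lock."""

    def __init__(self):
        self.shared = set()
        self.exclusive = None

    def acquire(self, holder: str, exclusive: bool = False) -> bool:
        """Try to take a lock; return False if it conflicts."""
        if self.exclusive is not None:
            return False  # an exclusive lock blocks everyone else
        if exclusive:
            if self.shared:
                return False  # exclusive access needs an idle repo
            self.exclusive = holder
        else:
            self.shared.add(holder)
        return True

    def release(self, holder: str) -> None:
        """Drop whatever lock this holder has."""
        self.shared.discard(holder)
        if self.exclusive == holder:
            self.exclusive = None


locks = RepoLocks()
print(locks.acquire("machine1 backup"))               # True
print(locks.acquire("machine2 backup"))               # True, backups can overlap
print(locks.acquire("server prune", exclusive=True))  # False, backups are running
locks.release("machine1 backup")
locks.release("machine2 backup")
print(locks.acquire("server prune", exclusive=True))  # True, repo is idle
print(locks.acquire("machine1 backup"))               # False, prune is running
```

For a cron setup this suggests that overlapping backups from several machines are fine, and that prune is best scheduled for a window in which no backups are expected.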

It just works with the default settings. Running e.g. forget --keep-yearly 1 will keep the last snapshot in each year for each machine. You can configure this with the --group-by option. In general, the default settings for forget do the right thing (at least we tried to implement it that way).
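To make the grouping concrete, here is a small Python sketch of a "keep yearly" policy applied per group. It is a simplified model, not restic's implementation: the snapshot dictionaries, the function `keep_yearly` and its `group_by` parameter are made up for illustration, with the default grouping by host and paths taken from the manual quote earlier in the thread.

```python
from collections import defaultdict
from datetime import datetime


def keep_yearly(snapshots, n=1, group_by=("host", "paths")):
    """Return the ids kept: the newest snapshot of each of the n most
    recent years that have snapshots, evaluated separately per group."""
    groups = defaultdict(list)
    for snap in snapshots:
        key = tuple(snap[field] for field in group_by)
        groups[key].append(snap)

    kept = set()
    for members in groups.values():
        seen_years = []
        for snap in sorted(members, key=lambda s: s["time"], reverse=True):
            year = snap["time"].year
            if year in seen_years:
                continue  # a newer snapshot of this year is already kept
            if len(seen_years) == n:
                break     # policy satisfied for this group
            seen_years.append(year)
            kept.add(snap["id"])
    return kept


snapshots = [
    {"id": "a1", "host": "server", "paths": ("/",), "time": datetime(2018, 3, 1)},
    {"id": "a2", "host": "server", "paths": ("/",), "time": datetime(2018, 9, 8)},
    {"id": "b1", "host": "machine1", "paths": ("/home",), "time": datetime(2018, 5, 20)},
    {"id": "c1", "host": "machine2", "paths": ("/home",), "time": datetime(2017, 12, 31)},
    {"id": "c2", "host": "machine2", "paths": ("/home",), "time": datetime(2018, 1, 2)},
]

print(sorted(keep_yearly(snapshots)))               # ['a2', 'b1', 'c2']
print(sorted(keep_yearly(snapshots, group_by=())))  # ['a2']
```

With grouping, machine1's single snapshot survives even though the server has newer ones; without grouping, only the newest snapshot in the whole repo would be kept. One policy is applied to every group in a single run, so different policies per host would need separate runs that each select one host.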

I hope the explanations help a bit. If you have ideas on how to improve the manual, please let us know!

Dj0k3 · September 9, 2018, 9:59pm

It helped a lot. Thank you for taking the time to answer my questions. The manual helps a lot; maybe a couple more examples overall wouldn’t hurt, but on the other hand it’s a manual, not a how-to blog.

@matt I created a new topic for this.

HeavyThumper · October 30, 2018, 6:27am

You potentially left an opening so I’m going for it…

Given multiple separate repos, which happen to use the same password, what would be the result of copying only the /data folder from each of the “child” repos into the “master” - and then having the backup jobs for each child now reference the master repo?

Or, TL;DR version, given multiple multi-TB backups stored on OpenDrive, having created these repos separately for various folder trees or hosts - will there be a benefit (in initial backup time) to merging the /data folders into one repo? Then after a (hopefully) successful backup to the new consolidated repo location a new valid snapshot will be created (for each sub-repo) and if appropriate a “prune” can be performed. Or would this simply corrupt the repo?

I think that was understandable…

fd0 · October 30, 2018, 7:24am

That won’t work: The underlying master encryption keys are different, despite the repos having the same password. Did you discover the design documentation yet?

No. If they are stored in different repos, just copying the files will create a repo with a lot of broken files, which prune will remove, and backup will re-upload the data anyway.