Make restic more attractive for obnam users

fd0 · August 14, 2017, 9:07am

Yesterday, the author of the obnam backup program announced its retirement. Over time we had a few people asking for specific features that restic does not yet support (mainly support for CACHEDIR.TAG, #317, #1159). I’d like to make restic more attractive to users switching from obnam.

Apart from the cache dir tag, is there anything else that makes obnam users not want to switch to restic?

erik-brangs · August 16, 2017, 7:57am

I’m currently using Obnam and I’m currently considering switching to restic.

I’d prefer a configuration file for backup settings over environment variables. What’s your view on this?

raegis · August 16, 2017, 8:41am

Hi! My wishlist item: I’d like to see the password requirement made optional, with added support for PGP keys. Obnam has the “--encrypt-with” option which allows you to use a PGP key for encryption. There are no passwords in Obnam; PGP keys are required for encryption.

Very nice program you have here! I use Debian stable, but am able to install the latest version of restic from Sid because the only dependency is libc!

Elfmeterversemmler · August 16, 2017, 12:25pm

I am also using obnam at the moment and like Erik I am considering switching to restic.
I also would prefer a configuration file over environment variables or command line parameters.
I just heard your lecture from early 2016 and from this a few questions evolved.

For a better understanding I’d like to describe my current setup with obnam.

I have a home network with some servers running Linux and a server in the data center of a provider.
To back up the internal servers I do it directly with the sftp option in obnam and PKI (most of the servers are Raspberry ones) to a single directory on the backup server using the ‘--client-name’ option. So it would be nice to have a similar option in restic as well, unless there are serious concerns about doing it that way.
To back up the external server I also use the sftp option with obnam and back up to the same directory. Since the backup server is located on my internal network, the directory is mounted on the gateway, and the external server backs up into the mounted directory on the gateway.

So, basically my requirements at the moment are:

‘--client-name’ option to store data of all backups in a single repository
config file for several options
log file, log level and maybe log size/rotation (I have the opportunity to set the logfile size and numbers of iterations in obnam and obnam is taking care of the rotation of the logfiles)
setting excludes (like cache files/directories)
override the above with includes
retention policy similar to the keep option in obnam
something similar to ‘obnam forget’ (probably already there with the optimizer and stuff?)
unlock in case of crashes similar to ‘obnam force-lock’ (I saw this already in your lecture)
ability to run on ARM architecture (Raspberry) (have to check if I can compile it there or cross compile it for ARM)
probably some option to disable/enable output on the command line (quiet/verbose mode)

Maybe there is more, but at the moment I am happy to discuss the above points and hear pros and cons.

fd0 · August 16, 2017, 7:57pm

I’m trying to answer all the questions in a single post, here we go:

We’ll add a configuration file (I want that too) but it hasn’t been done yet.

The answer for that is: “Maybe later”, see here: Unencrypted backups · Issue #1018 · restic/restic · GitHub

That’s very likely not going to happen: Interacting with GPG as a program is hard, it’d add a dependency that is hard to manage and requires users to use the suboptimal (to say the least) user interface of GPG. There’s also no portable way to do that. So, no, we’ll not use GPG (with high probability).

For a backup program I think it’s important to have almost no dependencies: Each dependency can fail and prevent a successful restore when you need it most. So we’re trying to reduce dependencies as much as possible.

As long as the servers have different hostnames (or you set a unique hostname with restic backup --hostname foo) you can just back up to the same repo; it’ll work just fine because restic records the hostname and does the right thing. Usually you shouldn’t need a separate option. You can then list the snapshots and it will show the hostnames. When you mount the repo, restic will serve the snapshots in a per-hostname directory.

Should also just work out of the box without any special parameters or so.

For the other things you listed:

You don’t need an option to back up multiple machines into the same repo
A config file has not been implemented yet (but it’s planned, just not done yet)
There’s no way to write a log file, restic is (at the moment) mainly targeted at interactive/scripted use, without a log file
Excluding files from a backup can be done with restic backup --exclude ~/.cache, the patterns can also be read from a file with restic backup --exclude-file ~/.config/restic-excludes.
Retention policy is similar to obnam, see here: https://restic.readthedocs.io/en/latest/manual.html#removing-old-snapshots and Redirecting…
restic unlock will remove stale lock files, but you shouldn’t need it. Backups can be made concurrently (it’s just not so efficient since restic may then write duplicated data, but that will be cleaned up when running restic prune the next time)
restic is a Go program, which can easily be cross-compiled for all supported architectures, ARM is among them. We even have pre-compiled binaries for linux/arm. Or just run go run build.go --goos linux --goarch arm from a Linux, Windows or OS X machine
restic backup --quiet will not print status reports etc.
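
Several of the points above combine naturally in a small wrapper script. The following is a minimal sketch, not an official recipe: the function names, the EXCLUDE_FILE variable and the retention numbers are made up for illustration, while the restic flags are the ones named in this post plus the --keep-* options of restic forget from the linked manual section.

```shell
# Hypothetical helpers; only the restic subcommands and flags are real.

# Back up the given paths into a shared repository under an explicit host name.
# usage: backup_host REPO HOSTNAME PATH...
backup_host() {
    repo=$1
    host=$2
    shift 2
    # EXCLUDE_FILE is an assumed variable; it defaults to the path used in the post.
    restic -r "$repo" backup --quiet \
        --hostname "$host" \
        --exclude-file "${EXCLUDE_FILE:-$HOME/.config/restic-excludes}" \
        "$@"
}

# Apply a retention policy, then remove data that is no longer referenced.
# The keep counts are placeholders, not recommendations.
# usage: expire_snapshots REPO
expire_snapshots() {
    repo=$1
    restic -r "$repo" forget --keep-daily 7 --keep-weekly 4 --keep-monthly 12 &&
        restic -r "$repo" prune
}
```

Called from cron, redirecting the output of backup_host or expire_snapshots with plain shell redirection (>> some-file 2>&1) would be one way to get a log file in the meantime.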

I’d suggest you give it a try

raegis · August 17, 2017, 2:54am

Thanks for your reply! I think my language concerning passwords was not clear. I agree that all backups should be encrypted, but I prefer not to use passwords for anything. I was requesting either PGP support or password support, not neither.

Anyway, I’m writing to clarify, not to change your mind–I’m fine with your decision. I am very pleased with minimal dependencies, and currently I use a plain text RSA private key (openssl genrsa) as the password file.

mbiebl · August 18, 2017, 6:35pm

What would be super-awesome, is a tool to convert an existing obnam repo into a restic repo.
Atm, I use a cobbled-together script which mounts the obnam snapshots via fuse, fakes the original date (via date) and then runs restic backup on those obnam fuse mounts. That is super slow though.
My obnam backups go back to 2013 and at the current rate it looks like I’ll have to run that script for several days.
One snapshot is about 100G and takes about 3 hours to process. I blame obnam for its slowness here.
I know this is a special case and I’m not sure if other users care about their existing backups (and plan to convert them). But given that Lars plans to remove obnam from the Debian archive, I don’t want to rely on obnam in the future to access my old backups and I do want to keep them accessible.
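
A conversion step of the kind described above could look roughly like this. It is a sketch under several assumptions: the obnam repository is already mounted via FUSE with one directory per generation, the caller knows each generation's original timestamp, and the installed restic has the --time option of restic backup (newer releases do; the setup described in this post changed the system clock instead). The function name and its arguments are hypothetical.

```shell
# Import one mounted obnam generation as a restic snapshot with its original date.
# usage: import_generation REPO GENERATION_DIR "YYYY-MM-DD HH:MM:SS"
import_generation() {
    repo=$1
    gen_dir=$2
    when=$3
    # --time sets the timestamp recorded in the snapshot instead of "now".
    restic -r "$repo" backup --time "$when" "$gen_dir"
}
```

Looping over the generations from oldest to newest and calling import_generation once per directory would reproduce the history one snapshot at a time; it does not make reading from the obnam mount any faster.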

Elfmeterversemmler · August 21, 2017, 9:47am

Thank you for your comprehensive answer. Looks like restic has a lot in common with obnam and I only have to check which parameters/options I should exchange.
I will definitely give it a try

epadepat · August 21, 2017, 5:07pm

As an Obnam user looking for alternatives I find pull backups to be the missing piece.

I back up a few local devices and a couple of web servers. Content is duplicated across devices and servers and I’d like to keep it all in one repo. Having disks locally means it’s easy to rotate them off.

Some way of pulling backups from the external servers into backup on the local network would be great. Ideally whilst being frugal with space on the external server. Barring the ideal space saving option another one would be to somehow pre digest backups on the server and only pull and merge the changes into the local repo. This is doable as the binary is so self contained and there’s really no hassle moving it around.

rawtaz · August 21, 2017, 5:32pm

I’m not sure I see why you’re trying to do it the way you do. Why don’t you just have two backups, one to the remote and one to the local? Restic is so fast that running two backups is hardly a problem.

If that’s not an option for you, you can simply rsync the repository from one place to another.

If still not suitable, there’s a user working on being able to copy whole or parts of a repository from one place to the other, perhaps when that’s done it is something you could use.

epadepat · August 21, 2017, 6:29pm

Two backup repos would defeat the duplication. Many of the devices share some subset of the same data but it’s located in different places. Rsync or any other means where the data doesn’t end up in the restic repo suffers the same issue.

If when copying the repo, or parts of it, around you can select diff and merge repos that would be similar to my last “suggestion” above. It would be cool if you could then run skinny backups which generate and store the diffed data in a local backup which is then pulled to the backup server repo when ready.

rawtaz · August 21, 2017, 6:47pm

I’m still not really getting your point, but I guess it’s just me.

I thought you said that your various systems that you want to back up have redundant data. Then if you back all of those systems up into one and the same repository, that redundant data will to a large extent be deduplicated, right?

Then whether you do two separate backups (one offsite and one onsite) or just one which you then rsync/copy from onsite to offsite (or vice-versa) is another matter and up to you really.

epadepat · August 21, 2017, 7:22pm

%-(
Haha we really seem to talk past each other here! Are you suggesting keeping a full rsync clone of every non local server on the backup server and then backing up these local copies? Not very elegant imho.

The machines have practically no redundant data in themselves. A pool of machines would benefit hugely from deduplication. That and manageability is why I want a single repo. Local data big remote data smaller.

rawtaz · August 21, 2017, 7:37pm

Hehe, sorry if I’m not being clear. Yeah, perhaps we’re just not talking about the same thing. I’ll stop suggesting stuff because I fail to make out what you’re trying to say. To answer your question; No, I don’t think that’s what I suggested.

fawick · August 22, 2017, 10:48am

@epadepat, let me see whether I grasped your use case:

1. You have a couple of hosts H1, H2, H3, … which you want to take backups from. Some of these are in your local network, some are remote but available via the Internet.
2. You want to back up all of these into the same repo, profiting from the deduplication of data.
3. You want to trigger these backups from the machine S1 in your local network, on which the backup storage lies (what restic calls the repository).

Restic supports 1 and 2 directly. For 2, it doesn’t matter whether shared data lies in different directories on the hosts, it will deduplicate the contents of H1:/foo and H2:/another/place/for/foo all the same, thanks to the content defined chunking (cf. https://restic.readthedocs.io/en/latest/design.html, “Backups and Deduplication”)

Item 3 is not supported directly at the moment, but it’s easy to work around that. I assume you can reach H1 etc. via SSH. In that case, run github.com/restic/rest-server on S1, create an SSH tunnel for the HTTP access (let’s assume ssh h2 -R 8000:localhost:8000) and use restic -r rest:http://127.0.0.1:8000 backup /folder/to/foo on the remote host.
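
The tunnel setup just described could be wrapped like this on S1. This is a minimal sketch: it assumes rest-server is already running on S1 and listening on port 8000, that restic is installed on the remote host, and that the repository password is available there (for example via RESTIC_PASSWORD). The function name is made up.

```shell
# Trigger a backup on a remote host that writes into the repository on S1.
# usage: pull_backup HOST PATH...
pull_backup() {
    host=$1
    shift
    # -R forwards port 8000 on the remote host back to port 8000 on S1,
    # so the remote restic reaches rest-server at 127.0.0.1:8000.
    # Note: ssh joins the remaining words into one remote command line,
    # so paths containing spaces would need extra quoting.
    ssh -R 8000:localhost:8000 "$host" \
        restic -r rest:http://127.0.0.1:8000 backup "$@"
}
```

For example, pull_backup h2 /folder/to/foo would run the backup on h2 while the data ends up in the repository served from S1.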

In case you are not worried about HTTP security or a firewall is preventing you from doing so, you could also use an HTTP URL that points to S1 directly. Also, of course, SSH is just one of many ways to tunnel HTTP.

Last, but not least, in case H1 et al. can reach S1 via SFTP, you could also use that. Although SFTP is a very slow protocol.

If you really need to have dedicated local repositories on H1, H2 …, you need to make sure all of them use the same key, otherwise the encryption will differ between all hosts and deduplication will never be possible between the hosts.
You could then sync the individual data/ subfolders of the local repositories to S1 (e.g. with rsync), although I’m not sure whether this will deduplicate all of the data packs perfectly. You might want to sync the snapshots/ as well, and definitely call restic rebuild-index and restic check on S1.
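
If the local-repository route is taken anyway, the sync step could be sketched as follows. This mirrors the untested idea above rather than a verified procedure: it assumes all repositories were created with the same key, that both paths are plain directories reachable by rsync, and that copying pack and snapshot files is sufficient, which the post itself is unsure about. The function name is hypothetical.

```shell
# Copy packs and snapshots from one repository directory into another,
# then let restic rebuild the index and verify the result.
# usage: merge_repo SOURCE_REPO_DIR TARGET_REPO_DIR
merge_repo() {
    src=$1
    dst=$2
    # Trailing slashes make rsync copy directory contents, not the directory itself.
    # The copied packs are unknown to the target's index, hence rebuild-index,
    # followed by check to verify the repository structure.
    rsync -a "$src/data/" "$dst/data/" &&
        rsync -a "$src/snapshots/" "$dst/snapshots/" &&
        restic -r "$dst" rebuild-index &&
        restic -r "$dst" check
}
```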

leak · August 22, 2017, 4:16pm

Also using obnam and looking for a replacement.

The no. 1 missing feature for me is definitely config files.

I like the behavior of obnam which has default search path for config files, so you don’t have to pass the configuration path every time you just want to use a command. Basically you can just run obnam generations and it will pick up the config file from e.g. /etc/obnam.conf and obtain the required parameter to execute the command.
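
Until restic has a config file of its own, a default search path can be approximated with a small wrapper that sources a shell fragment exporting restic's environment variables. This is only a sketch: the two file locations, the RESTIC_USER_ENV and RESTIC_SYSTEM_ENV overrides and the function name are invented for illustration, while RESTIC_REPOSITORY and RESTIC_PASSWORD_FILE are environment variables that restic itself reads.

```shell
# Run restic with settings taken from the first readable env file.
# usage: restic_with_config RESTIC_ARGS...
restic_with_config() {
    # Hypothetical locations, checked in order; the first readable file wins.
    for env_file in "${RESTIC_USER_ENV:-$HOME/.config/restic/env}" \
                    "${RESTIC_SYSTEM_ENV:-/etc/restic/env}"; do
        if [ -r "$env_file" ]; then
            # The file is expected to contain lines such as
            #   export RESTIC_REPOSITORY=/srv/restic-repo
            #   export RESTIC_PASSWORD_FILE=/etc/restic/password
            . "$env_file"
            break
        fi
    done
    restic "$@"
}
```

With such a file in place, restic_with_config snapshots would behave much like obnam generations does with /etc/obnam.conf.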

epadepat · August 22, 2017, 10:06pm

Thanks fawick. Seems to work well. Testing using command=/home/user/restbkp.sh KEY in .ssh/authorized_keys on H1 (this runs restic -r rest:http://127.0.0.1:8000 backup dir1 dir2 dir3) and then a script on S1 that starts the rest server and opens the tunnel using a backup-only ssh key. This is coupled with the following stanza in .ssh/config on S1, which passes the repository password from S1 to H1.

Host bkp-H1
        Hostname H1
        Port 22
        IdentityFile ~/.ssh/bkp-id_ecdsa
        SendEnv RESTIC_PASSWORD

All this to be able to run in a cron job of course. Just posting my results for anyone interested.
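
For anyone wanting to reproduce this, the S1 side could be sketched as below. It assumes rest-server is installed on S1, that the bkp-H1 host alias from the stanza above exists, and that sshd on H1 permits the variable with AcceptEnv RESTIC_PASSWORD, since SendEnv alone is not enough. The function name, the REST_SERVER override and the one-second wait are illustrative choices, not part of the setup described in this post.

```shell
# Start rest-server, run the remote backup through a reverse tunnel, stop the server.
# usage: RESTIC_PASSWORD=... run_pull_backup REPO_DIR
run_pull_backup() {
    repo_dir=$1
    # SendEnv only forwards variables that are exported in the environment.
    export RESTIC_PASSWORD

    # Bind rest-server to loopback so that only the tunnel can reach it.
    "${REST_SERVER:-rest-server}" --path "$repo_dir" --listen 127.0.0.1:8000 &
    server_pid=$!
    sleep 1   # crude wait for the server to start listening

    # The forced command in authorized_keys on H1 runs the actual restic backup.
    status=0
    ssh -R 8000:localhost:8000 bkp-H1 || status=$?

    kill "$server_pid" 2>/dev/null || true
    wait "$server_pid" 2>/dev/null || true
    return "$status"
}
```

The function returns the exit status of the ssh call, so a cron wrapper can tell a failed backup from a successful one.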

fd0 · September 9, 2017, 8:58am

Good news everyone: I’ve just merged Pull Request #1170 which adds --exclude-caches (to exclude caches just like obnam did) and an option --exclude-if-present foo (which excludes a directory if a file named foo exists).
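
With the merged change, the two options could be used as sketched below. The wrapper function names and any marker file name passed in are placeholders; the flags are the ones announced above. The functions are kept separate because, as noted further down in the thread, the options could not be combined at the time.

```shell
# Skip directories tagged as caches with a valid CACHEDIR.TAG.
# usage: backup_without_caches REPO PATH...
backup_without_caches() {
    repo=$1
    shift
    restic -r "$repo" backup --exclude-caches "$@"
}

# Skip directories that contain a file named MARKER_FILE.
# usage: backup_without_marked REPO MARKER_FILE PATH...
backup_without_marked() {
    repo=$1
    marker=$2
    shift 2
    restic -r "$repo" backup --exclude-if-present "$marker" "$@"
}
```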

rawtaz · September 9, 2017, 12:27pm

Did you add --exclude-if-present filename[:signature] or just the one you wrote?

dionorgua · September 9, 2017, 1:16pm

As far as I can see it’s --exclude-if-present filename[:signature]. Also, --exclude-caches is just an alias for excluding on CACHEDIR.TAG (with the proper header).

This option can’t be used multiple times (including --exclude-caches --exclude-if-present something.else).

CACHEDIR.TAG itself will be included in backup!

In any case it already covers 99% of my needs! So I will be able to drop my own --exclude generator.
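
For reference, the kind of tag check that --exclude-caches relies on can be reproduced in a few lines of shell. This sketch follows the Cache Directory Tagging convention, in which CACHEDIR.TAG starts with a fixed signature line; the function names are made up, and this is not restic's actual implementation.

```shell
# Signature defined by the Cache Directory Tagging convention.
CACHEDIR_SIGNATURE='Signature: 8a477f597d28d172789f06886806bc55'

# Mark a directory as a cache directory by writing a valid tag file.
# usage: mark_cache_dir DIR
mark_cache_dir() {
    printf '%s\n' "$CACHEDIR_SIGNATURE" > "$1/CACHEDIR.TAG"
}

# Succeed if DIR contains a CACHEDIR.TAG that begins with the signature.
# usage: is_cache_dir DIR
is_cache_dir() {
    tag=$1/CACHEDIR.TAG
    [ -f "$tag" ] || return 1
    first_line=
    # read fails on a file without a trailing newline but still fills the variable.
    IFS= read -r first_line < "$tag" || true
    case $first_line in
        "$CACHEDIR_SIGNATURE"*) return 0 ;;
        *) return 1 ;;
    esac
}
```

A file that is merely named CACHEDIR.TAG but lacks the signature does not count, which is what "with proper header" refers to above.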