Restore tests - best practices - lesson learned

As a result of a recent failure I’m overhauling my backup restore process. I thought others might find it interesting or useful, and might be able to help improve it.

The Problem
Until recently I thought I had quite a good backup and restore test system. My PC and servers are backed up using restic (and sometimes another method) to multiple destinations - a local hard drive, an external hard drive, an offsite external hard drive, and AWS S3. I have scripts on my computer that I run to test restoring a few files from those repos.

I had my Raspberry Pi 4 home automation server fail recently - the six-month-old M.2 SSD failed, which became obvious when I restarted it. This server is backed up nightly to S3 using restic, and I can see the snapshots appearing in S3 if I look. “No problems!” I thought. “I’ll get a new disk and restore from my backups”.

When I went to restore my files I ran into a problem… I didn’t have a copy of the restic repo password anywhere. I have passwords for many other repos, but not this one. Fortunately, after a few tries the server booted and stayed up for about a minute before it crashed, and I was able to grab the repo password. If I hadn’t managed this I’d have lost months of work - Docker Compose files, Docker app configuration (Home Assistant, PostgreSQL, Pi-hole, nginx, etc.), AppDaemon apps, DHCP allocations, and so on.

Looking at my backups I also discovered that my web server backups had stopped working. I had edited the script that cron calls and introduced a syntax error, and because there was no error reporting I never noticed.
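The fix I’m planning for that is a small wrapper around the backup commands that reports failures instead of dying silently. This is only a sketch - the paths and the ntfy.sh topic are placeholders, not my real setup:

    #!/usr/bin/env bash
    # backup-wrapper.sh - called from cron; fail loudly instead of silently.
    # Assumes RESTIC_REPOSITORY and RESTIC_PASSWORD_COMMAND are set elsewhere.
    set -euo pipefail

    notify_failure() {
      # any notification channel works; the ntfy.sh topic is a placeholder
      curl -s -d "Backup FAILED on $(hostname) at $(date)" https://ntfy.sh/my-backup-alerts
    }
    trap notify_failure ERR

    restic backup /etc /home/me/docker

A syntax error in the wrapper itself still wouldn’t trigger the trap, so setting MAILTO in the crontab (assuming local mail delivery works) or checking the script with bash -n before deploying covers that case too.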

Approach
For this new approach I will assume all my computers have been stolen or broken, I’ve lost my phone, and I need to restore my PC, my home servers, and my main web server. I won’t use existing scripts or anything else. I’ll do this annually.

Initial Process
To start with I’m going to go through all my computers and servers and make sure the restic repo locations and passwords are all documented in my password store.

I run my password store on my web server, so if I lose my phone, my PC, and my server I may lose access to my passwords. To mitigate this I will keep the repo list and passwords in a KeePass vault stored on an external hard drive I keep at a friend’s house.

Regular Check Process
I have a recurring reminder every three months to check each destination and make sure backups are appearing there as they should.
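In practice that check is just listing the newest snapshot in each destination and eyeballing the dates; something along these lines, where the repo locations are examples:

    # newest snapshot per destination (repo paths and bucket name are examples)
    restic -r /mnt/backup/restic snapshots --latest 1
    restic -r /mnt/offsite/restic snapshots --latest 1
    restic -r s3:s3.amazonaws.com/my-backup-bucket/pc snapshots --latest 1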

Restore Process
Every year, on a new PC / VM, I will set up everything required to test my restores. I’ll install the AWS CLI and restic, then go through the list of repos and restore a couple of recent files from each.
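The test itself will just be a targeted restore of a few known files from the latest snapshot into a scratch directory, roughly like this (the bucket and paths are examples):

    # restore a couple of known files from the latest snapshot into a scratch dir
    restic -r s3:s3.amazonaws.com/my-backup-bucket/homeserver restore latest \
      --target /tmp/restore-test \
      --include /home/me/docker/docker-compose.yml \
      --include /etc/dnsmasq.d
    ls -lR /tmp/restore-test   # then open the files and check they look sane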

More Copies
I’m also going to save a copy of the important files from my server to my PC occasionally, extracted from restic and zipped. That way if I manage to break things in a weird way I should at least have an older copy of the files.
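That extraction is essentially a restic dump of the relevant directory piped into a compressed archive, along these lines (the repo and path are examples):

    # dump a directory from the latest snapshot as a tar stream and compress it
    restic -r s3:s3.amazonaws.com/my-backup-bucket/homeserver \
      dump latest /home/me/docker | gzip > homeserver-docker-$(date +%F).tar.gz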

I don’t trust all my data to any one tool. As well as restic providing a backup, I also upload my most important files to S3 in a versioned bucket once a week. That way I can also access them remotely if I need to.
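The weekly upload is a plain aws s3 sync to a bucket with versioning enabled; the bucket name and paths below are placeholders:

    # one-off: turn on versioning for the bucket
    aws s3api put-bucket-versioning --bucket my-important-files \
      --versioning-configuration Status=Enabled

    # weekly: sync the important files (the bucket keeps old versions)
    aws s3 sync /home/me/important s3://my-important-files/important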

Opinions
Does anyone have any suggestions to improve this process?


In my password database I also keep a copy of each restic repo’s “config” file and “keys” folder, just on the off-chance either of these gets corrupted. I also keep cross-platform copies of the restic binary and the password software installer on my break-glass backups.


Interesting idea, thanks. Those files are tiny; I doubt I’d ever need them, but there’s no harm in having them either.

My restic password is handled basically by setting RESTIC_PASSWORD_COMMAND="secret-tool lookup Path myresbkp" (I use keepassxc to store all passwords, and keepassxc serves as the system “keyring”).

That KDBX file is continuously synchronised to all my phones and other laptops.

I would suggest that something like this is almost necessary to sanely handle restic passwords.
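For anyone who wants to try it, the whole setup is just that one environment variable; the secret-tool attribute/value pair is whatever you used when saving the entry, so treat the names below as examples:

    # in ~/.profile or the backup script
    export RESTIC_REPOSITORY=/mnt/backup/restic
    export RESTIC_PASSWORD_COMMAND="secret-tool lookup Path myresbkp"

    # restic runs the command and reads the password from its stdout
    restic backup ~/Documents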

I also do a restore test using restic dump and compare the tar file to the hard disk; I’ve found restic dump latest / | tar -df - . to be the most painless way to compare the latest snapshot without actually doing a restore, but with the confidence that a restore would work just as well. At the moment this is manual, about once a week, but I could easily automate it and have the results sent to me via ntfy.sh or email or whatever if there are too many errors.

(This method also has the advantage that it won’t check files you excluded from the backup. A restic mount followed by diff -qr will give spurious errors there.)
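If I do get around to automating it, I’d expect it to look something like this - the ntfy.sh topic is a placeholder, and RESTIC_REPOSITORY / RESTIC_PASSWORD_COMMAND are assumed to be set already:

    #!/usr/bin/env bash
    # weekly-verify.sh - compare the latest snapshot against the disk, report differences
    set -u
    cd / || exit 1

    # tar -d prints one line per file that differs; count them rather than failing fast
    diffs=$(restic dump latest / | tar -df - . 2>&1 | wc -l)

    if [ "$diffs" -gt 0 ]; then
      curl -s -d "restic verify: $diffs differences on $(hostname)" https://ntfy.sh/my-backup-alerts
    fi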


Interesting idea, thanks. I only have three or four restic passwords, so I don’t think I want to go to the trouble of automating it, but for anyone with more passwords it could be handy.

I didn’t know about the dump command; that could be handy for automation.

Oh I have only one restic password :slight_smile:

I don’t think of what I do as automating it; after all, I could have used “RESTIC_PASSWORD” directly in the script if automation were the objective.

I think of it as making sure the password is treated the way all passwords need to be treated – properly cared for in their own secure ecosystem :slight_smile:

The original problem you encountered – forgetting the password – can happen to any of us (we’re all human!) so this mitigates it. (And the keepassxc master passphrase gets used often enough that I won’t forget that!)


Thanks for the thoughts :slight_smile:

I use different passwords for each repo; that way, if one password is compromised an attacker can’t easily get into my other repos. Of course no-one is interested in my data, but it’s a consideration in my day job so I carry it over to my personal data.

My passwords are in my password safe, which I open regularly. The difference is I have to enter the password manually. Not quite as good, but easier for me.

Another approach I use here:

  1. Have a backup server that does nothing else, sits behind a firewall/NAT, and can only be accessed by SSH key; it runs rest-server with authentication in append-only mode.
  2. Put the repo password in the backup server’s .bashrc (export RES…)
  3. From the backup server, using cron, open a reverse SSH tunnel to the machine to be backed up and remotely execute restic there, using the local tunnel port as the backup target and passing the restic key via the command line (like so: ssh -R 1337:127.0.0.1:8000 user@backupee "restic -r rest:http://user:pass@localhost:1337/restic-repo-path backup /path/to/be/backed/up --no-scan --password-command='echo $RESTIC_PASSWORD'")

This way the repo password is never stored on the machine being backed up, and the backup server is not even directly accessible. And if someone hacks the backup server you have a bigger problem anyway. And yes, HTTP by itself is not safe, but it runs inside an encrypted SSH tunnel.
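For completeness, on the backup server this ends up as a small wrapper script called from cron; the hostnames, port, and paths below are examples rather than my exact setup:

    #!/usr/bin/env bash
    # pull-backup.sh - runs on the backup server from cron.
    # cron does not read .bashrc, so load the repo password explicitly.
    source /home/backup/.restic-env   # contains: export RESTIC_PASSWORD=...

    # Open a reverse tunnel to the client and run restic there. The password is
    # expanded here on the backup server before the command goes over SSH, so it
    # is never stored on the client.
    ssh -R 1337:127.0.0.1:8000 user@backupee \
      "restic -r rest:http://user:pass@localhost:1337/restic-repo-path \
         backup /path/to/be/backed/up --no-scan \
         --password-command='echo $RESTIC_PASSWORD'"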

Until restic implements backups triggered from the server, this appears to be a way to simulate this. What do you think?


Another solution I use for my regular backups:

  1. Dedicated cloud backup server running rest-server
  2. Every host (and each repo on that host) uses its own restic repository password and backs its data up to the server
  3. After adding a new repo, I manually add a second, common key/password to it on the server (see the sketch below)
  4. Scheduled forget, check, and report tasks run for every repo on the server using the common repo password
  5. A scheduled rsync copies all repos from the backup server to my home NAS
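In sketch form, the second key and the maintenance tasks are just standard restic commands; the repo path, retention values, and NAS target below are examples:

    # add a second, common key to a newly created repo (the repo's existing
    # password is still needed to unlock it, e.g. via RESTIC_PASSWORD_COMMAND)
    restic -r /srv/restic/host1 key add --new-password-file /root/common-repo-password

    # scheduled maintenance per repo, using the common password
    export RESTIC_PASSWORD_FILE=/root/common-repo-password
    restic -r /srv/restic/host1 forget --keep-daily 7 --keep-weekly 5 --keep-monthly 12 --prune
    restic -r /srv/restic/host1 check

    # mirror all repos to the home NAS
    rsync -a --delete /srv/restic/ nas:/volume1/restic-mirror/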

Some interesting approaches there, thanks nicnab and Ilya :slight_smile: Having a dedicated server would be a bit much for me, and not really necessary given I keep my data in various locations including S3.

Something else I do on top of the restic backups is to have a git repository on a local server (not accessible from outside).

In this git repo I store all my various scripts, configurations, and ansible playbooks that I use on all my servers.
Passwords and tokens are encrypted for extra security.

That gives me a central point of reference. On top of that, git gives me a history of all the changes, if I ever need to check an older script or configuration :+1:
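For anyone wanting to do the same, one straightforward option is ansible-vault for the files that hold secrets (the file names here are just examples):

    # encrypt a vars file containing passwords/tokens before committing it
    ansible-vault encrypt group_vars/all/secrets.yml

    # edit it later without leaving plaintext lying around
    ansible-vault edit group_vars/all/secrets.yml

    # playbooks that reference the encrypted vars are then run with
    ansible-playbook site.yml --ask-vault-pass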


I think I may have a git repo for some of it; I’ll have to check. I’ll put the whole config into git and make sure my PC has a copy - it won’t take long. Good suggestions, thanks :slight_smile:

A small nit-pick on the SSH reverse tunnel approach:
Store the password on the client, like you would for “normal” clients, so the server never knows the client’s encryption password while it stores the encrypted backup. The client, on the other hand, can’t use the encryption password for anything unless the backup server connects to it via SSH.

Attack-surface-wise, on the client it doesn’t make much difference whether the encryption password is given to the process as an environment variable or as a file.
Edit: that is, an environment variable injected into the client’s restic process through the SSH tunnel from the backup server. If an attacker can dump the password from the environment of that process, my guess would be they could also read it from a plain file on the client.

Overall, this is a really great approach; I use it successfully across several machines. It can also be used for other commands, like restic prune and restic stats.


Thanks for your feedback! Nice to know I’m not the only one doing it this way :smile:

Regarding the password: I’m not sure which machine you mean by client and which by server. To me, the server is the machine where the repo is stored, and that is the one that should not be compromised under any circumstances. That is why I store the passwords only there and pass them via the command at runtime only.

I identified the problem of access to the repo password long ago.

I have a folder on my laptop containing the bunch of scripts I use to automate my backups, which include the environment variable files with the passwords.

What I do is keep a copy of the entire folder (it’s really small, about 10 kB) on several (encrypted) media - a flash drive that’s permanently in my travel bag, an SD card that’s in my safe, and a microSD that’s with a friend. The password to these physical media is something I will not easily forget.

If I lose all my hardware, I have not only the password but the scripts I need to bootstrap myself on new hardware.


I’ve seen some configurations using a hardware-TPM-based password through systemd (Credentials).

I haven’t tried it myself (yet), but it’s kind of interesting. If an attacker has full control of the client it’s not going to change anything, but if they somehow manage to steal the password they wouldn’t be able to use it from another machine. I’m not sure it’s very useful, but… why not.
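From what I’ve read, the setup would look roughly like this, with systemd-creds sealing the password against the TPM; the unit excerpt and file names are examples, not a tested config:

    # 1) seal the repo password against this machine's TPM (run once, as root)
    echo -n "the-repo-password" | systemd-creds encrypt --with-key=tpm2 \
        --name=restic-password - /etc/credstore.encrypted/restic-password.cred

    # 2) backup.service (excerpt) - systemd decrypts the credential only for this unit
    [Service]
    Environment=RESTIC_REPOSITORY=/srv/restic-repo
    LoadCredentialEncrypted=restic-password:/etc/credstore.encrypted/restic-password.cred
    ExecStart=/usr/bin/restic backup /home --password-file=%d/restic-password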

That depends on the level of access to the system. Once an attacker has enough privileges to modify systemd units, it’s also trivial to intercept the TPM-backed password when it gets passed to restic.