Using restic to back up encrypted containers?

tholin · January 23, 2018, 4:19pm

I’ve been using the same backup strategy for over a decade now. I loopback-mount an encrypted container, rsync some folders to it, and finally rsync the container to an offsite host. The host I use is quite expensive, so I’ve started looking for other hosts. Unfortunately, S3, B2 and similar cloud storage services don’t support rsync, so I need a different setup.

Restic is one option that came up while googling. It’s part of the “holy grail of backups” family. The problem I have with all of them is that they roll their own encryption. I realize that restic’s encryption is better than dm-crypt’s AES-XTS, assuming the implementation is sound, but I’m hesitant to entrust my data to a new tool. One idea I had is to keep using my encrypted container and use restic to back up the container itself. That should be as safe as my old setup but lets me use regular cloud storage.

What are the downsides to this setup?

It uses more local space because I need to store the regular data + the container + some restic cache. The container is currently 50 GB, so it’s not that much more.
The backup takes more time because restic must read the entire container on each backup.
Restic only sees the encrypted data, so compression doesn’t work… but restic doesn’t support compression anyway.
It uses more host storage. If I delete files in the container, the container itself stays the same size and restic doesn’t know which parts have been deleted.

Any other downsides?

One thing I could do is run fstrim on the container’s filesystem before unmounting it. That will send trim commands down the stack, and eventually the trimmed areas will be made sparse in the container file. Restic will then see them as ranges of zeros and could compress them… but restic doesn’t support compression. Could it deduplicate them instead, if they are large enough? The average blob size is 1 MiB and most of the files are a lot smaller, so I guess that won’t work so well?
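The fstrim idea only pays off if the trimmed ranges really read back as zeros. A minimal standard-library sketch (it does not involve restic at all) that reports which fixed-size blocks of a container file read back as all zeros; the helper name `zero_blocks` and the 1 MiB block size, chosen only to echo the average blob size mentioned above, are assumptions for illustration:

```python
import os
import tempfile

BLOCK = 1 << 20  # 1 MiB granularity (an arbitrary choice for this sketch)


def zero_blocks(path, block=BLOCK):
    """Return (offset, length) for every block of `path` that reads back as zeros.

    Ranges discarded via fstrim (if the loop device turns discards into hole
    punches) read back as zeros, which is all any reader of the container
    file, restic included, would see.
    """
    ranges = []
    offset = 0
    with open(path, "rb") as f:
        while True:
            buf = f.read(block)
            if not buf:
                break
            if buf == bytes(len(buf)):  # bytes(n) is n zero bytes
                ranges.append((offset, len(buf)))
            offset += len(buf)
    return ranges


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(b"x" * BLOCK)  # block 0: data
        f.seek(2 * BLOCK)      # skip block 1, leaving a hole
        f.write(b"y" * BLOCK)  # block 2: data
        name = f.name
    print(zero_blocks(name))  # [(1048576, 1048576)]
    os.remove(name)
```

Whether fstrim actually produces holes depends on the loop driver and the host filesystem; the sketch only shows what a reader would observe once it does.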

What filesystem would be appropriate inside the container? Ext4 can be used without the journal, so I don’t have to waste space backing up journal changes each time. Using rsync --inplace on ext4 should really overwrite files in place, resulting in smaller diffs between snapshots. Is there any other filesystem with a metadata layout particularly well suited to this?

What happens if data gets damaged on the host? Damage to the keys and config file means instant loss of all data, right? What about the data and index files? If a backed-up file is partially corrupt, can restic restore the good parts of it and leave the broken parts unwritten? That’s important since I only plan to back up one file.

Is the backed-up data always consistent, even with power failures, OOM kills, network loss and similar problems?

Which storage provider is “the best”, meaning the cheapest option that fulfills my needs? I live in Europe and have a 10 Mbit/s upload. How sensitive is restic to round-trip delays? Backblaze B2 is cheap, but it seems some operations on B2 are so slow that it’s easier to sync to a local filesystem and then use rclone (see the forum thread “Restic 0.8 Prune still very slow”).
That would waste another 50 GB of local space. If that is the case, I could just as well use borg+rclone, since borg supports compression. What other unexpected gotchas should I be aware of?

To get compression I would have to add it inside the container. One possibility is to use btrfs with compression instead of ext4, but I don’t want to do that because I don’t think btrfs is stable or reliable enough to trust with my data. But if I don’t trust btrfs, should I trust restic? The FAQ in the docs talks about a bug that is so common it needs its own FAQ entry, but it also says “The cause of this bug is not yet known”. The bug might be benign, but it still looks bad. How reliable is restic? Yes, I know trying to quantify reliability is difficult.

fd0 · January 23, 2018, 6:34pm

Hey, welcome to the forum!

You can indeed use restic to back up a container; that’ll work. You won’t benefit much from restic’s built-in deduplication, but the sparse areas in the container will deduplicate just fine: for a sequence of zeroes, restic uses a block size of 512 KiB. Another downside of your approach is that you can’t just use the restic mount command to browse around in the backups, which is a really nice feature!
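A rough model of why those zero runs cost almost nothing: identical chunks are stored only once, keyed by their hash. This sketch assumes fixed 512 KiB chunks and a plain dict as the "repository"; restic really uses content-defined chunking and only ends up with 512 KiB pieces on long runs of zeros, so `store` is a hypothetical simplification, not restic’s code:

```python
import hashlib

CHUNK = 512 * 1024  # 512 KiB, the size mentioned above for runs of zeros


def store(data, repo):
    """Split `data` into fixed 512 KiB chunks and keep each distinct chunk once.

    Simplification: fixed-size chunks only match restic's behaviour on long
    runs of zeros. Returns the ordered chunk IDs needed to rebuild `data`.
    """
    ids = []
    for off in range(0, len(data), CHUNK):
        piece = data[off:off + CHUNK]
        cid = hashlib.sha256(piece).hexdigest()
        repo.setdefault(cid, piece)  # an identical chunk is stored only once
        ids.append(cid)
    return ids


if __name__ == "__main__":
    repo = {}
    ids = store(bytes(8 * 1024 * 1024), repo)  # 8 MiB of zeros
    print(len(ids), len(repo))  # 16 1
```

Sixteen references point at a single stored 512 KiB chunk, which is why large trimmed areas add almost nothing to the repository.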

I’m not sure about the file system recommendations in this setup, sorry.

Excellent question! You’re partly right: damage to the key files is fatal. The config file is not so important; we can live without it (although in that case you’d need to patch restic, but it’s possible). The index files are just an optimization; they can be rebuilt from scratch from the files in data/ with restic rebuild-index. When a file in data/ is damaged, it depends on what it contains. If it holds metadata, none of the files it references can be restored, although in theory their data is still there. If it holds file data, parts of files won’t be restorable.
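To illustrate why the index is only an optimization: if every pack carries its own header listing the blobs it contains, the index can be recomputed just by reading those headers. The layout below (blobs, then a JSON header, then a 4-byte little-endian header length) is a deliberately simplified, unencrypted stand-in loosely modelled on the pack format in restic’s design document, where an encrypted binary header and its length sit at the end of each pack. `make_pack` and `rebuild_index` are hypothetical helpers:

```python
import hashlib
import json
import struct


def make_pack(blobs):
    """Build a simplified, UNENCRYPTED pack: blobs, JSON header, header length.

    Assumption: JSON replaces restic's encrypted binary header for readability.
    """
    body = b"".join(blobs)
    header = json.dumps(
        [{"id": hashlib.sha256(b).hexdigest(), "length": len(b)} for b in blobs]
    ).encode()
    return body + header + struct.pack("<I", len(header))


def rebuild_index(packs):
    """Recreate a blob ID -> (pack name, offset, length) map from the packs alone."""
    index = {}
    for name, pack in packs.items():
        (hlen,) = struct.unpack("<I", pack[-4:])  # trailing header length
        header = json.loads(pack[-4 - hlen:-4])
        offset = 0
        for entry in header:
            index[entry["id"]] = (name, offset, entry["length"])
            offset += entry["length"]
    return index


if __name__ == "__main__":
    packs = {"pack1": make_pack([b"hello", b"world!"])}
    idx = rebuild_index(packs)
    print(idx[hashlib.sha256(b"world!").hexdigest()])  # ('pack1', 5, 6)
```

Losing the index therefore costs time (every pack header has to be read again), not data.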

Not quite: restic uses standard crypto (AES-256 for encryption and Poly1305-AES for the MAC), although the combination is not so common. The implementation is the one from the Go standard library, which is used by many other projects.

At least one cryptographer looked at the crypto in restic (and decided to use it for his personal backups):

You should be skeptical; I can understand that very well. And you’re also right: the crypto in restic is much better than dm-crypt. With block devices, you’re given a 512-byte block-based interface, and you need to provide this to the higher layers in the kernel. There’s no way to store additional data, like an IV or a MAC. So basically, when you’re using block-device-based crypto, you give up on authenticating the data you’re decrypting. The only way around that is to implement crypto at the file system layer or higher, like zfs or restic do.

For restic, all files in the repo (except the key files) are encrypted and have a MAC, so we can check the data is authentic before attempting to decrypt anything. This eliminates a whole class of vulnerabilities (like Padding Oracles).
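The ordering described above (check the MAC, and only then decrypt) can be sketched with the Python standard library. This is a toy: restic uses AES-256 in counter mode with Poly1305-AES, whereas here a SHA-256-derived keystream and HMAC-SHA256 stand in for both, because the standard library has neither AES nor Poly1305. The names `seal` and `open_sealed` and the nonce/ciphertext/tag layout are assumptions of this sketch; do not use it for real data:

```python
import hashlib
import hmac
import os

# TOY CONSTRUCTION, illustration only: SHA-256 keystream instead of AES-CTR,
# HMAC-SHA256 instead of Poly1305-AES.


def _keystream(key, nonce, length):
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def seal(enc_key, mac_key, plaintext):
    """Encrypt, then MAC the nonce and ciphertext."""
    nonce = os.urandom(16)
    ks = _keystream(enc_key, nonce, len(plaintext))
    ct = bytes(a ^ b for a, b in zip(plaintext, ks))
    tag = hmac.new(mac_key, nonce + ct, hashlib.sha256).digest()
    return nonce + ct + tag


def open_sealed(enc_key, mac_key, blob):
    """Verify the MAC first; decrypt only if the data is authentic."""
    nonce, ct, tag = blob[:16], blob[16:-32], blob[-32:]
    expected = hmac.new(mac_key, nonce + ct, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):  # constant-time comparison
        raise ValueError("MAC mismatch, refusing to decrypt")
    ks = _keystream(enc_key, nonce, len(ct))
    return bytes(a ^ b for a, b in zip(ct, ks))


if __name__ == "__main__":
    ek, mk = os.urandom(32), os.urandom(32)
    blob = seal(ek, mk, b"backup data")
    print(open_sealed(ek, mk, blob))  # b'backup data'
    tampered = blob[:20] + bytes([blob[20] ^ 1]) + blob[21:]
    try:
        open_sealed(ek, mk, tampered)
    except ValueError as err:
        print(err)  # MAC mismatch, refusing to decrypt
```

Because a flipped bit is rejected before decryption starts, an attacker never gets to observe how the decryption code reacts to manipulated ciphertext, which is what padding-oracle style attacks rely on.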

If it helps anything, I’m working as a penetration tester, so I get to break systems and crypto for a living. Which does not imply that I’m overly qualified in implementing secure systems, but I have a very good idea what the most relevant vulnerabilities for such systems are and how to prevent them.

We cannot guarantee that; it’s very hard to do (depending on the backend). We’re trying very hard, though. The order in which files are written to the repo helps, as does the fact that each file in the repo is written exactly once and then either deleted or replaced by a file with a different name.
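One common way to get write-once behaviour like this on a local filesystem is to write each file under a temporary name, fsync it, and atomically rename it to the hash of its contents. A minimal sketch of that pattern; the helper `write_once` is hypothetical and is not restic’s backend code (remote backends such as B2 or S3 work differently):

```python
import hashlib
import os
import tempfile


def write_once(repo_dir, data):
    """Store `data` under the hex SHA-256 of its contents and return that name.

    The data is written to a temporary file, fsynced, then atomically renamed,
    so a crash leaves either no file or a complete one under the final name
    (possibly plus a stray temporary file). Existing files are never modified.
    """
    name = hashlib.sha256(data).hexdigest()
    final = os.path.join(repo_dir, name)
    if os.path.exists(final):
        return name  # identical content is already stored
    fd, tmp = tempfile.mkstemp(dir=repo_dir, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # data is on disk before the name appears
        os.replace(tmp, final)  # atomic rename within one filesystem
    except BaseException:
        if os.path.exists(tmp):
            os.remove(tmp)
        raise
    return name


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        n = write_once(d, b"pack contents")
        print(os.listdir(d) == [n])  # True
```

A fully durable version would also fsync the directory after the rename; that step is left out to keep the sketch short.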

I don’t know if you’ve discovered the design document yet, but it’s here: References — restic 0.16.3 documentation The complete repository is defined there, and it’s rather simple.

That’s very hard to say. From Europe at least, B2 is very slow due to high latency: it takes at least 800 ms until the HTTP response headers arrive for each API call. DigitalOcean Spaces is faster, but several users have reported that the service does not seem to be stable.
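Some back-of-the-envelope numbers for the 10 Mbit/s uplink and the 800 ms figure. The sketch assumes a fully saturated uplink, 50 GB = 50 * 10**9 bytes, and a purely hypothetical count of 10,000 API calls spread evenly over parallel connections; real call counts depend on the number of pack files and on the operation:

```python
def upload_seconds(num_bytes, uplink_bits_per_s):
    """Best-case transfer time, assuming the uplink is fully saturated."""
    return num_bytes * 8 / uplink_bits_per_s


def latency_seconds(api_calls, latency_s, concurrency=1):
    """Time spent only waiting for responses, if calls are spread evenly
    over `concurrency` parallel connections (a simplification)."""
    return api_calls * latency_s / concurrency


if __name__ == "__main__":
    # 50 GB container over a 10 Mbit/s uplink
    t = upload_seconds(50 * 10**9, 10 * 10**6)
    print(t, round(t / 3600, 1))  # 40000.0 11.1
    # Hypothetical 10,000 API calls at 0.8 s each, sequential vs. 8 in parallel
    print(round(latency_seconds(10_000, 0.8)),
          round(latency_seconds(10_000, 0.8, concurrency=8)))  # 8000 1000
```

The initial upload is bandwidth-bound at roughly 11 hours either way; per-call latency mainly hurts operations that issue many small sequential requests.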

Do any of the other users in this forum have recommendations?

In terms of file systems: I’ve heard very good things about zfs, did you try that yet?

rawtaz · January 23, 2018, 7:36pm

The best is the one that suits your needs and requirements the most. Personally I prefer my own hosting, and on ZFS.

I’ve had good experiences with Tilaa for VPS, but I’m guessing you mean some object storage provider rather than block storage.

tholin · January 24, 2018, 3:07pm

There shouldn’t be much duplicated data in the container anyway except for unallocated space. I’m more interested in deduplication between snapshots.

That should work. I’ve tested it on a local repository already: I can loopback-mount the container from the FUSE filesystem. I’ve done the same thing with my current setup over sshfs. It’s very slow but useful in a pinch.

I thought the chunker_polynomial was required, but now that I think about it, it’s only needed to back up new data, not to read old data.
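That matches how content-defined chunking works: the chunking parameter decides where boundaries fall when data is split, but reading just concatenates the stored chunks in order. A small sketch using a Rabin-Karp rolling hash modulo a prime, where `base` plays the role of the chunker polynomial; restic actually uses Rabin fingerprints over GF(2) with a random irreducible polynomial and far larger chunk sizes, so the window, sizes and mask here are illustrative assumptions:

```python
import random

WINDOW = 16          # bytes in the rolling-hash window
MOD = (1 << 61) - 1  # Mersenne prime modulus


def chunk(data, base, min_size=64, max_size=1024, mask=0xFF):
    """Content-defined chunking with a Rabin-Karp rolling hash.

    A boundary is cut when the low bits of the window hash are zero (once the
    chunk has at least min_size bytes) or when max_size is reached.
    """
    out_weight = pow(base, WINDOW - 1, MOD)  # weight of the byte leaving the window
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        if i - start >= WINDOW:
            h = (h - data[i - WINDOW] * out_weight) % MOD
        h = (h * base + byte) % MOD
        size = i - start + 1
        if (size >= min_size and (h & mask) == 0) or size >= max_size:
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])
    return chunks


def restore(chunks):
    """Reading needs no chunking parameters: concatenate in order."""
    return b"".join(chunks)


if __name__ == "__main__":
    rng = random.Random(0)
    data = bytes(rng.randrange(256) for _ in range(20_000))
    a = chunk(data, base=257)
    b = chunk(data, base=65537)
    print(restore(a) == data, restore(b) == data)  # True True
```

Different `base` values generally put the boundaries in different places, which only matters for deduplicating future backups; either set of chunks restores the same bytes.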

I did some testing on a local repo with a single 1 GB file backed up. Looking at the contents of the snapshot, I got the corresponding tree blob ID. That tree blob has a list of all the data blobs for the file, in order. Using the index file, I figured out which pack held the first data blob of the backed-up file. I deleted that pack from data/ and tried to do a restore. Restic tried to re-read the missing file several times, which isn’t that useful on a local repo. Eventually it gave up without restoring anything. The same thing happened when I corrupted that pack file instead of deleting it.

Restic had access to the tree blob, so it knew which data blobs were needed. It also had the lengths of all data blobs from the index file, so it could figure out the right offsets for the data it still had, but it still gave up after the first error.
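The behaviour asked for here (skip unreadable blobs, keep writing the rest at their computed offsets) can be sketched as follows. `restore_partial` is a hypothetical helper, not something restic provides; it assumes the ordered blob IDs from the tree and each blob’s plaintext length are available, and it leaves unreadable ranges as holes, which read back as zeros, in a real output file:

```python
import tempfile


def restore_partial(blob_ids, lengths, load_blob, out):
    """Write every readable blob at its offset and skip the unreadable ones.

    blob_ids:  ordered data blob IDs from the file's tree entry
    lengths:   blob ID -> plaintext length (assumed recoverable from the index)
    load_blob: callable returning blob bytes, raising KeyError/ValueError on
               missing or corrupt data
    out:       a real binary file object opened for writing (truncate() is
               used to extend it)
    Returns the (offset, length) ranges that could not be restored.
    """
    missing = []
    offset = 0
    for bid in blob_ids:
        size = lengths[bid]
        try:
            data = load_blob(bid)
        except (KeyError, ValueError):
            missing.append((offset, size))  # leave this range unwritten
        else:
            out.seek(offset)
            out.write(data)
        offset += size
    out.truncate(offset)  # full length even if the last blobs are missing
    return missing


if __name__ == "__main__":
    blobs = {"a": b"AAAA", "c": b"CC"}  # blob "b" is lost or corrupt
    lengths = {"a": 4, "b": 3, "c": 2}
    with tempfile.TemporaryFile() as f:
        print(restore_partial(["a", "b", "c"], lengths, blobs.__getitem__, f))  # [(4, 3)]
        f.seek(0)
        print(f.read())  # b'AAAA\x00\x00\x00CC'
```

For a single large container file, this would turn one damaged pack into a few damaged megabytes inside the container instead of a total loss.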

Loss of the tree blob means loss of chunk order so that’s loss of the entire file.

I’m mostly worried about mistakes in the implementation, like that security bug in tarsnap. It used AES in counter mode, but after some refactoring of the code the counter nonce wasn’t incremented anymore. The developer of tarsnap is a genius, but he can still make mistakes. If there are few contributors, those mistakes might not be noticed right away.
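Why a stuck nonce is so damaging in counter mode: encrypting two messages with the same key and nonce reuses the keystream, so XORing the two ciphertexts yields the XOR of the two plaintexts without any key. The sketch uses a toy SHA-256-based counter-mode keystream (not AES) purely to show the effect; all names are hypothetical:

```python
import hashlib
import os

# Toy CTR-style stream cipher: keystream block i = SHA-256(key || nonce || i).
# Illustration only; tarsnap and restic use AES-CTR, not this construction.


def keystream(key, nonce, length):
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))


def encrypt(key, nonce, plaintext):
    return xor(plaintext, keystream(key, nonce, len(plaintext)))


if __name__ == "__main__":
    key = os.urandom(32)
    nonce = os.urandom(16)  # the bug: the same nonce for both messages
    p1, p2 = b"attack at dawn!!", b"retreat at dusk!"
    c1, c2 = encrypt(key, nonce, p1), encrypt(key, nonce, p2)
    # The keystream cancels out: an eavesdropper learns p1 XOR p2.
    print(xor(c1, c2) == xor(p1, p2))  # True
```

If one of the plaintexts is known or guessable (file headers, zero-filled regions), the other falls out directly, which is why this class of bug is so serious.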

Is that only a problem with prune, or with all operations? Adding new snapshots should only add new files. Those files might end up truncated, but on the next run restic could check that the old files are fine, so the new snapshot won’t reference the old corrupt data.
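A check along these lines is cheap when files are named after the SHA-256 of their contents, as restic’s design document describes for repository files: re-hash each file and compare. This is only a sketch with a hypothetical `find_damaged` helper; `restic check --read-data` performs a far more thorough verification of a real repository:

```python
import hashlib
import os
import tempfile


def find_damaged(repo_dir):
    """Return names of files whose SHA-256 no longer matches their name.

    Assumes every file is stored under the hex SHA-256 of its contents, so a
    truncated upload or bit rot shows up as a mismatch. Temporary files
    starting with "." are skipped.
    """
    bad = []
    for name in sorted(os.listdir(repo_dir)):
        if name.startswith("."):
            continue
        h = hashlib.sha256()
        with open(os.path.join(repo_dir, name), "rb") as f:
            for block in iter(lambda: f.read(1 << 20), b""):
                h.update(block)
        if h.hexdigest() != name:
            bad.append(name)
    return bad


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        good = b"snapshot data"
        with open(os.path.join(d, hashlib.sha256(good).hexdigest()), "wb") as f:
            f.write(good)
        bad_name = hashlib.sha256(b"pack contents").hexdigest()
        with open(os.path.join(d, bad_name), "wb") as f:
            f.write(b"pack cont")  # simulate a truncated upload
        print(find_damaged(d) == [bad_name])  # True
```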

Perhaps it’s better to ask which one you use (if any). The setup used by the developer is usually the best tested one.

ZFS itself might be good, but I’m using Linux, so that means ZFS on Linux. That wasn’t so great last time I checked: it’s an out-of-tree module, so you constantly have to rebuild it when the kernel changes, and it didn’t have trim support, which would be useful if I want to use it in the container.

tholin · January 24, 2018, 3:10pm

I’ve considered getting a VPS, but they are probably more expensive than dumb object storage, and I don’t want to manage a VPS install. I have too many boxes to manage already.