Is storing key in the backup location really safe?

capcoding · August 21, 2019, 3:29am

Newbie to restic here. My only concern is how secure it is to store a hashed key in the remote backup location along with all the data. Is it possible to keep the key local instead (e.g. use the key locally to encrypt the data, then send it to the backup location)?

While scrypt is fairly safe, is it possible for someone to get the hash and then crack it on a different, more powerful machine, totally bypassing the scrypt path? E.g. copy /etc/shadow to another machine and crack the password there (some bitcoin machines are good at that).

fd0 · August 21, 2019, 8:23am

Good question! I have answered this a few times already, but I keep forgetting where I wrote it down. So I’ll just answer it here again in a slightly more general and simplified form so I can point people to it.

The question is: Is restic’s practice of storing “key files” in the repository secure?

Background

Almost all data in a restic repository is encrypted with a master key. The master key is chosen randomly when the repository is initialized. The password entered at initialization time is used (together with the key derivation function scrypt) to derive a key for that password. The master key is then encrypted with the key derived from the password, and the encrypted master key (together with some other data needed for scrypt) is saved to a file in the keys/ subdir in the repo.

The construction is very similar to what other solutions use (e.g. the Linux Unified Key Setup (LUKS) used for disk encryption on Linux).

You can see a sample key file in the repository documentation:

{
    "hostname": "kasimir",
    "username": "fd0",
    "kdf": "scrypt",
    "N": 65536,
    "r": 8,
    "p": 1,
    "created": "2015-01-02T18:10:13.48307196+01:00",
    "data": "tGwYeKoM0C4j4/9DFrVEmMGAldvEn/+iKC3te/QE/6ox/V4qz58FUOgMa0Bb1cIJ6asrypCx/Ti/pRXCPHLDkIJbNYd2ybC+fLhFIJVLCvkMS+trdywsUkglUbTbi+7+Ldsul5jpAj9vTZ25ajDc+4FKtWEcCWL5ICAOoTAxnPgT+Lh8ByGQBH6KbdWabqamLzTRWxePFoYuxa7yXgmj9A==",
    "salt": "uW4fEI1+IOzj7ED9mVor+yTSJFd68DGlGOeLgJELYsTU5ikhG/83/+jGd4KKAaQdSrsfzrdOhAMftTSih5Ux6w=="
}

This JSON document is stored in the repository as it is. The field data contains the encrypted master key. The other fields are either metadata (like hostname; there’s an issue to remove them) or parameters needed to run scrypt (N, r, p and salt).

When a second password is added, the master key is decrypted with the existing password and then encrypted again with the key derived from the new password in a new key file.

When restic is run and the user supplies a password, restic downloads all the files in the keys subdir in the repository. For each file it then derives the password key by running scrypt with the password and the parameters from the file, and then tries to decrypt the data field. If that works, the password is correct and restic can decrypt all other content stored in the repository with the master key.
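The unlock procedure described above can be sketched in Python. This is a minimal sketch under stated assumptions, not restic's code: `derive_key`, `seal`, `open_sealed`, `new_key_file` and `find_master_key` are hypothetical names, the key-file dict only mirrors the JSON fields shown earlier, the 64-byte derived key is an assumption of this sketch, and the XOR-plus-HMAC wrapper is a toy stand-in for the real authenticated encryption. Only the `hashlib.scrypt` call is meant to reflect the KDF step, and the small default N just keeps the example fast.

```python
import hashlib
import hmac
import os


def derive_key(password: str, salt: bytes, n: int, r: int, p: int) -> bytes:
    # scrypt needs roughly 128 * n * r bytes; allow twice that as headroom.
    return hashlib.scrypt(password.encode(), salt=salt, n=n, r=r, p=p,
                          maxmem=2 * 128 * n * r, dklen=64)


def _keystream(key: bytes, length: int) -> bytes:
    # Toy keystream (HMAC in counter mode). A stand-in, NOT restic's cipher.
    out = b""
    counter = 0
    while len(out) < length:
        out += hmac.new(key, counter.to_bytes(8, "big"), hashlib.sha256).digest()
        counter += 1
    return out[:length]


def seal(key64: bytes, plaintext: bytes) -> bytes:
    # Encrypt-then-MAC with the two halves of the derived key.
    enc_key, mac_key = key64[:32], key64[32:]
    ct = bytes(a ^ b for a, b in zip(plaintext, _keystream(enc_key, len(plaintext))))
    return ct + hmac.new(mac_key, ct, hashlib.sha256).digest()


def open_sealed(key64: bytes, blob: bytes):
    # Returns the plaintext, or None if the tag does not verify (wrong password).
    enc_key, mac_key = key64[:32], key64[32:]
    ct, tag = blob[:-32], blob[-32:]
    expected = hmac.new(mac_key, ct, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        return None
    return bytes(a ^ b for a, b in zip(ct, _keystream(enc_key, len(ct))))


def new_key_file(password: str, master_key: bytes, n: int = 2**10, r: int = 8, p: int = 1) -> dict:
    # One key file per password; every file wraps the same master key.
    salt = os.urandom(64)
    key = derive_key(password, salt, n, r, p)
    return {"kdf": "scrypt", "N": n, "r": r, "p": p,
            "salt": salt, "data": seal(key, master_key)}


def find_master_key(password: str, key_files: list):
    # Try the password against every key file, as described in the text.
    for kf in key_files:
        key = derive_key(password, kf["salt"], kf["N"], kf["r"], kf["p"])
        master = open_sealed(key, kf["data"])
        if master is not None:
            return master
    return None
```

Adding a second password corresponds to calling `new_key_file` again with the already-decrypted master key, so both files unlock the same repository contents.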

Analysis

So, let’s analyse this construction from the perspective of attackers who gained access to the files in the repository (say, by compromising the server the restic repository is stored on via sftp), but they don’t have a valid password for the repo.

They can now read all files stored in the repo, most of which are encrypted. The only unencrypted files are the files in the keys/ directory. Let’s say this particular repo has two passwords, therefore there are two files stored in keys/.

The attackers can see the metadata fields (that’s why we plan to remove them), i.e. who created which password on what host. And they can read all the data needed to run scrypt. The KDF scrypt not only requires a lot of CPU for each run, it is also designed to need a lot of memory.

For restic, we’ve configured that each run needs at least 60 MiB of RAM and about 500ms time (see here). This means that if attackers have a machine that has the same CPU power as the host restic runs on, they can try two passwords per second per core.
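As a rough check of the memory figure, scrypt's dominant working buffer is about 128 * N * r bytes. The sketch below plugs in the N and r from the sample key file; `scrypt_memory_bytes` and `guesses_per_second` are hypothetical helper names, and the 0.5 s per run is simply the figure quoted above.

```python
def scrypt_memory_bytes(n: int, r: int) -> int:
    # Dominant term of scrypt's memory use: N blocks of 128 * r bytes each.
    return 128 * n * r


def guesses_per_second(seconds_per_run: float, cores: int = 1) -> float:
    # Each guess costs one full scrypt run; cores work independently.
    return cores / seconds_per_run


if __name__ == "__main__":
    mem = scrypt_memory_bytes(65536, 8)
    print(f"{mem / 2**20:.0f} MiB per scrypt run")
    print(f"{guesses_per_second(0.5):.0f} guesses per second per core")
```

With N = 65536 and r = 8 this comes out at 64 MiB, which is consistent with the "at least 60 MiB" stated above.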

If attackers use a machine with a GPU, they are still limited by the memory on the GPU. Let’s say the GPU has 16 GiB of memory, then they can still only test ~273 passwords in parallel (since each needs 60 MiB of memory). That number’s quite low.

I’ve tried to run hashcat (a program typically used to break password hashes on GPUs) with the scrypt parameters mentioned above on a machine with two NVidia GPUs, but it failed to even start computing hashes. I’m not sure what went wrong.

If you have a sufficiently long password (say, 16 characters), I think it’s not realistic to find the password even with a high-powered GPU-based cracking machine.
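To put a rough number on this, here is a back-of-the-envelope estimate. It is a sketch under loud assumptions that are not from the post: the password is 16 characters drawn uniformly at random from lowercase letters and digits (36 symbols), the attacker has a single GPU with 16 GiB of memory, and each guess on the GPU still takes 0.5 s. The helper names are hypothetical.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600


def parallel_guesses(gpu_mem_gib: int, mib_per_guess: int) -> int:
    # How many scrypt runs fit into GPU memory at the same time.
    return (gpu_mem_gib * 1024) // mib_per_guess


def expected_years(alphabet: int, length: int, guesses_per_second: float) -> float:
    # On average an attacker has to search half of the keyspace.
    keyspace = alphabet ** length
    return (keyspace / 2) / guesses_per_second / SECONDS_PER_YEAR


if __name__ == "__main__":
    slots = parallel_guesses(16, 60)      # 16 GiB GPU, 60 MiB per guess
    rate = slots / 0.5                    # assumed: 0.5 s per guess
    years = expected_years(36, 16, rate)  # assumed: random 16-char password
    print(f"{slots} parallel guesses, about {years:.1e} years on average")
```

A weak or guessable password changes the picture completely, since the attacker then only needs to walk a word list rather than the full keyspace.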

Trade-Offs

The design of the repository format and restic itself contains some trade-offs.

It would be possible to store the key file only on the local machine instead of in the repository, but we decided against that to improve robustness. The risk that attackers are able to find the password needs to be balanced against the risk of users losing access to their backups: if the key file is only kept locally and the SSD fails, everything is lost.

We also try to keep restic’s complexity under control, so it does not currently offer keeping the key file local as an option.

I hope this answers your question!

capcoding · August 21, 2019, 12:27pm

Thanks for the quick reply. It helps a lot.

Two remaining questions:

When I init the repo (or add more passwords later), do all the key generation and decryption/encryption happen on the local machine? Do they ever run on the remote machine (where others could steal keys from memory if they really wanted to)?
While the hash is derived with scrypt, say I steal those key files: do I have to use the scrypt algorithm to crack them? Maybe there is a faster way to crack them without scrypt, since they’re just static files to me now?

Keeping keys local feels like the truly secure way to me. There are multiple ways to secure a local password these days (wallet, USB key, etc.), and without a key in the repo I would never need to worry about its safety. A local key might make de-duplication complicated though; I don’t really know much about that.

fd0 · August 21, 2019, 12:53pm

Correct, all cryptography happens on the local machine, the unencrypted keys never leave the system restic runs on.

No, only locally.

There’s no “hash” here, you mean the encrypted master key in a key file?

After you steal a key file, you can try to use scrypt and e.g. a word list to find the correct password. There’s no way around it: you must use scrypt, as this is the only way to get the key which allows decrypting the master key.
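Since every candidate password costs one full scrypt run, the price of a word-list attack can be estimated by timing a single derivation. This is a rough sketch: `seconds_per_guess` and `wordlist_hours` are hypothetical helpers, the parameters default to the ones in the sample key file, the 64-byte output length is an assumption, and the measured time depends entirely on the machine it runs on.

```python
import hashlib
import os
import time


def seconds_per_guess(n: int = 65536, r: int = 8, p: int = 1) -> float:
    # Time one scrypt derivation with the key file's parameters.
    salt = os.urandom(64)
    start = time.perf_counter()
    hashlib.scrypt(b"candidate password", salt=salt, n=n, r=r, p=p,
                   maxmem=2 * 128 * n * r, dklen=64)
    return time.perf_counter() - start


def wordlist_hours(words: int, per_guess: float, cores: int = 1) -> float:
    # Worst case: every word in the list has to be tried.
    return words * per_guess / cores / 3600


if __name__ == "__main__":
    cost = seconds_per_guess()
    print(f"one guess: {cost:.2f} s")
    print(f"10 million words on 8 cores: {wordlist_hours(10_000_000, cost, 8):.0f} h")
```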

FYI: The situation that attackers have access to the files in the repository (e.g. because the files are saved on a shared server somewhere) is explicitly mentioned in the threat model, this is something restic is built to protect against.

I hope this answers your questions!

capcoding · August 21, 2019, 1:17pm

By the way, I don’t think any system can defend against file deletion unless it’s a distributed solution, where a single point of failure is not a concern. You can keep multiple restic repos at different locations to guard against someone deleting files, though.

Last question: if someone gets access to the repo and deletes a few files (contents, key files, whatever), is the only way to save the day to upload all local files again? Will restic detect that the remote snapshot is now in a bad state and automatically restart uploading all local files from scratch?

Will it give a warning first and let the user decide when to proceed?

Basically, if the repo is damaged, it should never impact my local files. Is the only fix to upload everything from scratch, or is the algorithm smart enough to figure out what is missing and just ‘rsync’ those missing parts?

sebastien.gross · August 21, 2019, 1:26pm

Hi,

Another source of information is https://www.tarsnap.com/scrypt/scrypt.pdf, where the cost of brute forcing is estimated not in time but in money.

If the password is wisely chosen, the cost (in $) to crack it within a year is so high that it is not worth it.

Let’s say you generate overkill passwords with pwgen -y 80: there is almost no way to crack them. Of course this assumes there is no flaw in the scrypt function.
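For a feel of how large such a password space is, the entropy of a password can be bounded from above as length * log2(alphabet size). The sketch below assumes every character is drawn uniformly and independently from the 94 printable ASCII symbols. That is an assumption for illustration, not a statement about how pwgen actually picks characters, so the result is an upper bound; `entropy_bits` is a hypothetical helper.

```python
import math


def entropy_bits(length: int, alphabet_size: int) -> float:
    # Upper bound: assumes uniform, independent characters.
    return length * math.log2(alphabet_size)


if __name__ == "__main__":
    print(f"80 chars from 94 symbols: about {entropy_bits(80, 94):.0f} bits")
    print(f"16 chars from 36 symbols: about {entropy_bits(16, 36):.0f} bits")
```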