Hello all,
I’ve been working to identify a good backup strategy for a while now and have settled on restic as the best bet for achieving it. Unfortunately, securing a repository against a compromised host isn’t something restic handles on its own. Using rest-server would help, but it only supports storing repositories on the local host.
Using rclone’s restic REST API implementation, I believe we can achieve off-site repositories that are secure against host compromise. Below is a design that uses rclone to implement this idea. As I’m sure I’ve missed something, please provide comments.
restic Secure Backup Strategy
To achieve a reasonably robust backup strategy that prevents a host compromise from also compromising that host’s backups, we’ll implement a multi-repository backup in which one repository is completely controlled by the host (local) and the other by a central host (off-site). The central host will be responsible for the maintenance of the off-site backups of all hosts.
The Problem
The purpose of backing up data is to protect it against loss. In this case, loss can be defined as the inaccessibility of the desired data, not the undesired exposure of it. Such inaccessibility can occur as a result of hardware failure, software failure, user error, or malicious intent.
restic covers the simple case of hardware failure, software failure, and user error well by maintaining a versioned repository of critical data in an off-host location. In the event that something catastrophic happens on or to the host, a user can simply restore the data from the off-host location. All the user needs is restic, access to the repository, and any of the passwords associated with it.
When we consider malicious intent, however, restic provides little protection. restic’s encryption sufficiently protects the repository from access by an unauthorized party. But when it comes to an attacker that wants to destroy data or hold it for ransom, restic may not help.
As attackers become more sophisticated, they are putting more effort into the thoroughness of their attacks. Even automated attacks have started to consider the presence of backups and attempt to disable or destroy them. If a host backs up to a repository, even a remote one, whose contents it is able to delete or overwrite, an attacker who compromises that host can do the same. restic alone has no ability to prevent this.
Some solutions presented to protect against this risk are to use rest-server in append-only mode or to restrict the S3 permissions of the backup user’s API access keys to prevent deleting files. With rest-server, the challenges are that the repository is stored locally and/or you need to maintain the backup storage. With the S3 API, while you can restrict the ability to delete files, you cannot restrict the ability to overwrite them. To deal with maliciously overwritten files in S3, enabling bucket versioning is often suggested. That may mitigate the risk, but it makes for an onerous restore process if an attacker does overwrite the repository files in S3.
Regardless, in both cases, an attacker with access to a single host (the host running rest-server or the host holding the access keys to the S3 bucket) can make restoration difficult or impossible by either destroying or corrupting the repository.
Local Backups
Local backups will be performed to either a local disk or a local network device. They will be performed by a scheduler on the local host that will also manage retention and cleanup. Because I/O to local storage carries no monetary cost (i.e., no per-API-call or data transfer billing) and restic deduplicates data, these backups can be performed as frequently as desired.
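As a rough sketch of what the scheduler on each host might run, the Python helper below builds the two restic invocations for a local backup followed by retention and cleanup. The function name, the repository path, and the retention counts are placeholders I made up for illustration; only the restic subcommands and flags (backup, forget, --keep-daily, --keep-weekly, --prune) are real.

```python
from typing import List


def local_backup_commands(
    repo: str,
    paths: List[str],
    keep_daily: int = 7,   # assumed retention, adjust to taste
    keep_weekly: int = 4,  # assumed retention, adjust to taste
) -> List[List[str]]:
    """Return the argv lists for a local backup run followed by cleanup."""
    # Snapshot the given paths into the local repository.
    backup = ["restic", "-r", repo, "backup", *paths]
    # Apply the retention policy and remove unreferenced data in one pass.
    forget = [
        "restic", "-r", repo, "forget",
        "--keep-daily", str(keep_daily),
        "--keep-weekly", str(keep_weekly),
        "--prune",
    ]
    return [backup, forget]
```

A cron job or systemd timer could pass each list to subprocess.run(cmd, check=True), with the repository password supplied through the RESTIC_PASSWORD_FILE environment variable.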
Off-site Backups
Off-site backups will be performed through an rclone instance serving the restic REST API. rclone will restrict clients to append-only, protecting the repository from host compromise. Each host will still be responsible for initiating its off-site backup (though this may change), but will not handle retention or cleanup. Retention and cleanup will be handled by a separate host that can access the off-site repositories directly.
A second password will be added to every off-site repository that can be used for repository maintenance and data recovery. This will allow each host to use a unique password without the central host needing to track them all.
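To make the moving parts concrete, here is a minimal sketch of the two command lines involved: one for the rclone instance and one for a host backing up through it. The remote name, server address, and repository name used in the test are hypothetical, and the sketch uses plain HTTP with no authentication purely to keep it short; a real deployment would presumably want TLS and per-host credentials.

```python
from typing import List


def rclone_serve_command(remote: str, addr: str) -> List[str]:
    """argv for an rclone instance exposing `remote` over the restic REST API."""
    # --append-only tells rclone to refuse deletion of repository data.
    return ["rclone", "serve", "restic", "--addr", addr, "--append-only", remote]


def offsite_backup_command(server: str, repo_name: str, paths: List[str]) -> List[str]:
    """argv for a host backing up through the REST endpoint."""
    # restic's REST backend is selected with the "rest:" prefix on the repository URL.
    repo_url = f"rest:http://{server}/{repo_name}/"
    return ["restic", "-r", repo_url, "backup", *paths]
```

The host only ever sees the REST endpoint, so it never holds credentials for the underlying storage. The central host would run forget and prune against the storage directly rather than through this endpoint.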
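Adding the second password could look roughly like the sketch below, which builds a restic key add invocation. The function name and file paths are hypothetical. Note that restic needs an existing password to unlock the repository before it can add a key, so the host’s own password has to be available wherever this runs, at least once.

```python
from typing import List


def add_management_key_command(
    repo: str,
    host_password_file: str,
    management_password_file: str,
) -> List[str]:
    """argv that adds the management password as an extra key on `repo`."""
    return [
        "restic", "-r", repo,
        # Existing password, used to unlock the repository.
        "--password-file", host_password_file,
        "key", "add",
        # New password to register, read from a file instead of a prompt.
        "--new-password-file", management_password_file,
    ]
```

After this runs, either password opens the repository, so the host’s password can be lost or destroyed without affecting recovery.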
Justification
If an individual host is compromised, the attacker will have all of the knowledge required to access both repositories. Though the attacker would be able to modify and delete data from the local repository, they could only read and add data to the off-site repository. While the attacker could destroy any local copy of the password, they cannot modify the passwords stored in the off-site repository. Without needing to keep a central copy of each host password, we can still restore data from the off-site backups using the management password.
In the event that the central host, password, or off-site storage is compromised, the attacker would not have access to the local repositories of each host. While the attacker could destroy the off-site repositories, all hosts could still back up to and restore from their local repositories.
In the event that there is a software issue or hardware issue that causes one of the repositories to become unavailable, the other should remain healthy.
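The argument above can be written down as a tiny model, mostly as a sanity check on the reasoning. It simply encodes the claims of this design (it verifies nothing about restic or rclone themselves), and the names are mine.

```python
from typing import Dict, Set

REPOSITORIES: Set[str] = {"local", "offsite"}

# Repositories each party could destroy if compromised, per the design above.
CAN_DESTROY: Dict[str, Set[str]] = {
    # A host controls its local repository but is append-only off-site.
    "host": {"local"},
    # The central host reaches off-site storage directly, but no local repository.
    "central": {"offsite"},
}


def surviving_repositories(compromised: str) -> Set[str]:
    """Repositories still usable for restore after a single compromise."""
    return REPOSITORIES - CAN_DESTROY[compromised]
```

For every single party in the model, at least one repository survives. The model says nothing about both parties being compromised at once, which this design does not defend against.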
Risks/Concerns
Data Duplication: Fortunately, local storage is fairly inexpensive, but we would be storing backups twice. This is in line with the 3-2-1 backup strategy, anyway.
Backup Run-Time and Resource Usage: As we’ll be running backup operations twice to independent repositories, the backup time and resource cost is effectively doubled. This is annoying.
Data Security: Each repository, both local and off-site, will have its own password and encryption key. However, if the central management is compromised, the attacker will gain the ability to read the backup data of all hosts (via the management password). This could be mitigated by segregating the central management of repositories by data classification and protecting each accordingly.
Application/Protocol Design and Implementation Issues: These happen and can range from data corruption to compromise. One possible scenario is an attacker acquiring the management password from a host by reading the key database of its off-site repository, then using a flaw in the encryption to reveal the management password. Since rclone does not provide the ability to restrict which host can access which repository, the attacker would then gain the ability to read data from any repository that uses the same management password. This could be mitigated by using separate rclone instances and directories/buckets for each host. While it would limit an attacker’s range, it requires additional management and overhead.
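For the per-host mitigation, one possible layout is a dedicated rclone instance per host, each bound to its own port and pointed at its own directory under a common remote. The sketch below generates those command lines. The port numbering, loopback binding, and directory naming are assumptions of mine, and how each host actually reaches its port (reverse proxy, firewall rules, tunnels) is left out.

```python
from typing import Dict, Iterable, List


def per_host_instances(
    hosts: Iterable[str],
    remote_root: str,
    base_port: int = 8080,  # assumed starting port
) -> Dict[str, List[str]]:
    """Map each host to the argv of its own append-only rclone instance."""
    instances: Dict[str, List[str]] = {}
    # Sort so that port assignment is stable between runs.
    for offset, host in enumerate(sorted(hosts)):
        instances[host] = [
            "rclone", "serve", "restic",
            "--addr", f"127.0.0.1:{base_port + offset}",
            "--append-only",
            # Each instance can only see its own host's directory.
            f"{remote_root}/{host}",
        ]
    return instances
```

This is where the additional management overhead shows up: every new host means another process, port, and directory to track.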