My own solution to this problem predates B2's object locks, and maybe there is a better way now?
If restic could detect, during snapshot creation, which files contain blobs used by the current snapshot and reset the lock on all of those files, it could help address this problem, but I'm not aware of any way to do that. As long as each file's lock is set to the latest of any snapshot needing content from that file, this would be safe.
It might be possible to pull this data into JSON and parse it, setting the locks yourself, but I'm not brave enough to try.
Also, this would make pruning far more complex. I think the best approach would be to repack still-needed blobs into new pack files when pruning, then find all the snapshots that use any blob in the newly repacked file and set that file's lock based on the latest of those snapshots' retention policies (noting that the original file still remains for the duration of its lock, but by creating a new file we can stop extending the lock on the old one). This would result in duplicate blobs existing in the archive, but as far as I know that would be harmless, if inefficient.
This also means that each snapshot would need a lock policy associated with it so that when repacking blobs we can figure out how long each blob is still needed.
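To make the rule concrete, here is a rough sketch of the calculation, assuming a hypothetical `packs.json` (producing that mapping from restic's metadata is exactly the part I have not attempted): each file's lock is simply the latest retention end among the snapshots that reference it.

```sh
# Hypothetical input (no existing restic command produces this): a map from
# each pack file to the retention-end timestamps of the snapshots that
# reference blobs in it, e.g.
#   { "data/ab/abcd1234": ["2025-01-01T00:00:00Z", "2025-03-01T00:00:00Z"] }
# ISO-8601 UTC timestamps compare correctly as strings, so jq's max works.
jq -r 'to_entries[] | "\(.key)\t\(.value | max)"' packs.json |
while IFS=$'\t' read -r pack lock_until; do
  # This is where you would extend the object lock on $pack to $lock_until
  # via the B2 API; echoed instead, since I have not dared to run this.
  echo "lock ${pack} until ${lock_until}"
done
```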
In any case, here is my solution, which does not rely on third-party enforced locks and therefore could work with any backend:
- The restic repository is located on my NAS, and only I have access to it. This is not actually essential, but since you need the ability to read an entire repository in order to write into it, it is more practical for me to start from a basis of "all data to be backed up must be placed on the NAS", or at least accessible to the machine running restic. I'll get this containerized soon enough.
- restic periodically forgets and prunes as needed.
- A physically separate machine has read-only access to the repository on the NAS, and write access to B2. It periodically does an `rclone sync --check-first --delete-after --immutable` from the local repository to B2.
The goal is to ensure rclone builds a complete set of files before it starts transferring, never modifies an existing file, and uploads all of the current files before any deletes, so that a repacked file is uploaded to B2 before the old file is deleted and a failure at any moment is completely recoverable (the full command is spelled out below).
Ideally the rclone machine will have as little in common with the restic machine and the NAS as possible. It absolutely cannot share any credentials or SSH keys, and it should probably be a different OS. You must not store the keys to this machine on your workstation, and it must use a unique password.
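Spelled out with placeholder paths and a placeholder remote name, the sync step looks something like this:

```sh
# --check-first:  complete all checks before any transfers start.
# --immutable:    never modify a file that already exists on the remote.
# --delete-after: upload everything first and delete last, so a repacked
#                 file lands on B2 before the file it replaces is hidden.
rclone sync --check-first --delete-after --immutable \
  /mnt/nas/restic-repo B2:my-restic-bucket/repo
```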
rclone is not using the `--b2-hard-delete` flag, so the pruned files are still technically on B2 until a B2 lifecycle setting finally removes them some calendar days later. Keep in mind that this is not technically a lock: I could log in and change the lifecycle policy, or rclone could log in and delete everything using `--b2-hard-delete`.
The goal here is to ensure that no single machine has write or delete access to both the NAS repository and B2: if the NAS is compromised, rclone will happily sync the changes to B2, but the lifecycle policy ensures the old files remain. If the rclone machine and/or B2 account is compromised, it can't harm the repository on the NAS.
For anyone not familiar with B2, the default/soft delete is really just a "hide" instruction, and the lifecycle policy says to delete the file "x" days after it is hidden. I don't think there is a way to configure a B2 API key to allow soft deletes but not hard deletes, although this would be ideal for the rclone machine.
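For reference, the lifecycle rule is a small piece of bucket configuration, something like the following (the bucket name and the 30-day window are placeholders, and the flag spelling matches the classic `b2 update-bucket` CLI, which has changed between versions):

```sh
# Keep hidden (soft-deleted) files for 30 days before B2 hard-deletes them.
# daysFromUploadingToHiding stays null so B2 never hides files on its own.
b2 update-bucket --lifecycleRules '[{
  "fileNamePrefix": "",
  "daysFromHidingToDeleting": 30,
  "daysFromUploadingToHiding": null
}]' my-restic-bucket allPrivate
```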
The underlying assumption here is that if there is a catastrophic loss or ransomware attack, I will notice and have time to disable B2's lifecycle rules.
To give myself a bit more time, I schedule my restic forget/prune to run infrequently, and the rclone machine only does a `copy --immutable` regularly, with the `sync` operation happening several days after the prune. This gives me an extra window in which an unintentional forget/prune operation can be discovered and repaired, effectively duplicating the lifecycle policy's grace period.
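As a sketch of the staggering in cron terms (times, paths, and remote names are illustrative, not my exact schedule):

```sh
# On the NAS: forget/prune runs rarely, e.g. at 03:00 on the 1st of the month.
0 3 1 * *  restic -r /srv/restic-repo forget --keep-daily 14 --keep-monthly 12 --prune
# On the rclone machine: copy new files hourly; --immutable refuses to touch
# anything that already exists on B2, and copy never deletes anything.
0 * * * *  rclone copy --immutable /mnt/nas/restic-repo B2:my-restic-bucket/repo
# sync (which also hides the files that prune removed) only runs on the 8th,
# leaving a week to notice and repair an unintentional forget/prune.
0 4 8 * *  rclone sync --check-first --delete-after --immutable /mnt/nas/restic-repo B2:my-restic-bucket/repo
```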
The remaining threat is that a ransomware attacker will read this post, figure out what I did, and kidnap me for the duration of the lifecycle window after installing ransomware. I am willing to accept this risk, as a $5 wrench attack is more practical anyway. If I were a large organization I would have separate individuals with different roles, but I'm just a datahoarding nerd.
I'm open to feedback if I have missed any threats.