My own solution to this problem predates B2's object locks, and maybe there is a better way now?
If restic could detect, during snapshot creation, which files contain blobs used by the current snapshot and reset the lock on all of those files, it could help address this problem, but I'm not aware of any way to do that. As long as each file's lock is set to the latest of any snapshot needing content from that file, this would be safe.
It might be possible to pull this data into JSON and parse it, setting the locks yourself, but I'm not brave enough to try.
Also, this would make pruning far more complex. I think the best approach would be to repack still-needed blobs into new pack files when pruning, then find all the snapshots that use any blob in the newly repacked file and set that file's lock based on the latest of those snapshots' retention policies (noting that the original file still remains for the duration of its lock, but by creating a new file we can stop extending the lock on the old one). This would result in duplicate blobs existing in the archive, but as far as I know that would be harmless, if inefficient.
This also means that each snapshot would need a lock policy associated with it so that when repacking blobs we can figure out how long each blob is still needed.
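To make the rule concrete, here is a rough sketch of the calculation, assuming a hypothetical `packs.json` (producing that mapping from restic's metadata is exactly the part I have not attempted): each file's lock is simply the latest retention end among the snapshots that reference it.

```sh
# Hypothetical input (no existing restic command produces this): a map from
# each pack file to the retention-end timestamps of the snapshots that
# reference blobs in it, e.g.
#   { "data/ab/abcd1234": ["2025-01-01T00:00:00Z", "2025-03-01T00:00:00Z"] }
# ISO-8601 UTC timestamps compare correctly as strings, so jq's max works.
jq -r 'to_entries[] | "\(.key)\t\(.value | max)"' packs.json |
while IFS=$'\t' read -r pack lock_until; do
  # This is where you would extend the object lock on $pack to $lock_until
  # via the B2 API; echoed instead, since I have not dared to run this.
  echo "lock ${pack} until ${lock_until}"
done
```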
In any case, here is my solution, which does not rely on third-party enforced locks and therefore could work with any backend:
- The restic repository is located on my NAS, and only I have access to it. This is not actually essential, but since you need the ability to read an entire repository in order to write into it, it is more practical for me to start from a basis of "all data to be backed up must be placed on the NAS", or at least accessible to the machine running restic. I'll get this containerized soon enough.
- restic periodically forgets and prunes as needed.
- A physically separate machine has read-only access to the repository on the NAS, and write access to B2. It periodically does an `rclone sync --check-first --delete-after --immutable` from the local repository to B2.
The goal is to ensure rclone builds a complete set of files before it starts transferring, never modifies an existing file, and uploads all of the current files before any deletes, so that a repacked file is uploaded to B2 before the old file is deleted and a failure at any moment is completely recoverable (the full command is spelled out below).
Ideally the rclone machine will have as little in common with the restic machine and the NAS as possible. It absolutely cannot share any credentials or SSH keys, and it should probably be a different OS. You must not store the keys to this machine on your workstation, and it must use a unique password.
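Spelled out with placeholder paths and a placeholder remote name, the sync step looks something like this:

```sh
# --check-first:  complete all checks before any transfers start.
# --immutable:    never modify a file that already exists on the remote.
# --delete-after: upload everything first and delete last, so a repacked
#                 file lands on B2 before the file it replaces is hidden.
rclone sync --check-first --delete-after --immutable \
  /mnt/nas/restic-repo B2:my-restic-bucket/repo
```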
rclone is not using the `--b2-hard-delete` flag, so the pruned files are still technically on B2 until a B2 lifecycle setting finally removes them some calendar days later. Keep in mind that this is not technically a lock: I could log in and change the lifecycle policy, or rclone could log in and delete everything using `--b2-hard-delete`.
The goal here is to ensure that no single machine has write or delete access to both the NAS repository and B2: if the NAS is compromised, rclone will happily sync the changes to B2, but the lifecycle policy ensures the old files remain. If the rclone machine and/or B2 account is compromised, it can't harm the repository on the NAS.
For anyone not familiar with B2, the default/soft delete is really just a "hide" instruction, and the lifecycle policy says to delete the file "x" days after it is hidden. I don't think there is a way to configure a B2 API key to allow soft deletes but not hard deletes, although this would be ideal for the rclone machine.
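For reference, the lifecycle rule is a small piece of bucket configuration, something like the following (the bucket name and the 30-day window are placeholders, and the flag spelling matches the classic `b2 update-bucket` CLI, which has changed between versions):

```sh
# Keep hidden (soft-deleted) files for 30 days before B2 hard-deletes them.
# daysFromUploadingToHiding stays null so B2 never hides files on its own.
b2 update-bucket --lifecycleRules '[{
  "fileNamePrefix": "",
  "daysFromHidingToDeleting": 30,
  "daysFromUploadingToHiding": null
}]' my-restic-bucket allPrivate
```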
The underlying assumption here is that if there is a catastrophic loss or ransomware attack, I will notice and have time to disable B2's lifecycle rules.
To give myself a bit more time, I schedule my restic forget/prune to run infrequently, and the rclone machine only does a `copy --immutable` regularly, with the `sync` operation happening several days after the prune. This gives me an extra window in which an unintentional forget/prune operation can be discovered and repaired, effectively duplicating the lifecycle policy's grace period.
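As a sketch of the staggering in cron terms (times, paths, and remote names are illustrative, not my exact schedule):

```sh
# On the NAS: forget/prune runs rarely, e.g. at 03:00 on the 1st of the month.
0 3 1 * *  restic -r /srv/restic-repo forget --keep-daily 14 --keep-monthly 12 --prune
# On the rclone machine: copy new files hourly; --immutable refuses to touch
# anything that already exists on B2, and copy never deletes anything.
0 * * * *  rclone copy --immutable /mnt/nas/restic-repo B2:my-restic-bucket/repo
# sync (which also hides the files that prune removed) only runs on the 8th,
# leaving a week to notice and repair an unintentional forget/prune.
0 4 8 * *  rclone sync --check-first --delete-after --immutable /mnt/nas/restic-repo B2:my-restic-bucket/repo
```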
The remaining threat is that a ransomware attacker will read this post, figure out what I did, and kidnap me for the duration of the lifecycle window after installing ransomware. I am willing to accept this risk, as a $5 wrench attack is more practical anyway. If I were a large organization I would have separate individuals with different roles, but I'm just a datahoarding nerd.
I'm open to feedback if I have missed any threats.