Ransomware protection

Could you explain these questions for me please?

  1. How exactly do we know which files are really in use, when (a) the index is modified, and (b) files are deleted by a threat actor?
  2. With retention, yes, older versions are kept. But how could you tell that things went wrong, if the source of truth is the repo itself, and that repo can be fully manipulated (read+write+delete)?

No, it’s not targeted at lock files only. However, the main reason for this rule is the lock files. (And yes, it should be possible to limit the rule to lock files only.)

Currently, when I run a forget/prune on my trusted machine, the files won’t be deleted immediately. Instead, a delete marker is created and the files are deleted 30 days later. So I also have a last-resort “undelete” option if I make a mistake.

You are right: if there are multiple versions, this is an indication of someone trying to tamper. But, interestingly, I have now had 2 cases where there were 2 versions of the same object and it was not some evil attacker. (I have been running this setup for 2–3 years now, and have >10 repositories with about 2 TiB of data.)

Both times, the two versions were binary-identical. My guess is that the following happened:

  • restic uploads the object data
  • the final “ACK” from the S3 server goes missing, maybe because of a longer interruption of the internet connection
  • restic is still running and retries the upload from the start
  • a second object with the same name and the same content is stored

I don’t know if the S3 server would also store the data if the upload was interrupted in between (I hope not). If so, it could even happen that there are 2 different versions of the same object. However, both of those objects should be pretty new.

If you get 2 versions of an object and one of them is older than your last check of the repo, then it really looks like you are in trouble (either because of an attacker or because of unreliable S3 storage).

This is the lifecycle rule I use:

{
    "Rules": [
        {
            "Expiration": {
                "ExpiredObjectDeleteMarker": true
            },
            "ID": "delete-after-30days",
            "Filter": {},
            "Status": "Enabled",
            "NoncurrentVersionExpiration": {
                "NoncurrentDays": 30
            }
        }
    ]
}

All files in your repo are “in use”: they belong to your repository and contain various bits of information needed to restore the original data.

One easy way to identify unwanted changes would be to maintain a file listing all files including their hashes, itself signed with a PGP key. But overall this is a totally different subject; probably a book could be written about it :) S3 object lock is the only mechanism which allows you to access the repository at different moments in time, back up to the duration of the defined lock. More details here. Its strength comes from a server-side built-in mechanism which does not allow any API to modify historical objects or truly delete anything. With S3 providers like AWS, only total account termination can delete such protected data.
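The hash-list idea could be sketched roughly as below, leaving the PGP part out (signing the resulting manifest, e.g. with `gpg --detach-sign`, and keeping it off the untrusted machine would be a separate step). The helper names are made up for illustration:

```python
import hashlib
import os


def build_manifest(root: str) -> dict:
    """Walk `root` and map each relative file path to its SHA-256 hex digest."""
    manifest = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            h = hashlib.sha256()
            with open(path, "rb") as f:
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    h.update(chunk)
            manifest[os.path.relpath(path, root)] = h.hexdigest()
    return manifest


def diff_manifests(trusted: dict, current: dict) -> dict:
    """Report files added, removed, or changed relative to the trusted manifest."""
    return {
        "added": sorted(set(current) - set(trusted)),
        "removed": sorted(set(trusted) - set(current)),
        "changed": sorted(p for p in trusted.keys() & current.keys()
                          if trusted[p] != current[p]),
    }
```

Running `build_manifest` periodically on a trusted machine and diffing against the signed copy would surface silent modifications, which object lock alone cannot do.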

Think about it as a sort of “time machine”. If the pocket holding your wallet is protected by such a mechanism, then when your wallet is lost or its precious contents are swapped with cut paper (here we have the same problem: how do you know?), you can travel back in time, say 3 days, and retrieve what was there. The same goes for a repo stored in a cloud supporting object lock.

For this to work, there are three prerequisites:

  1. New objects are created with the lock set.
  2. Existing objects’ lock duration has to be extended periodically. Otherwise, various repo objects created in the past would eventually lose protection. This means your repo is protected for the lock duration, counting from the last lock extension.
  3. To restore anything, you need a mechanism for your software to operate not on the current objects but on the objects’ state at a specific date and time.

Restic does not support any of this out of the box. Point (1) can be achieved through storage configuration: objects can be created with the lock enabled.

(2) and (3) require additional tools. For example, you can DIY it using aws s3api put-object-retention to extend an object’s lock duration and rclone --s3-version-at to expose the repository state from the past.
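The extension step from point (2) could also be scripted. A minimal Python sketch of just the Retention payload that `s3:PutObjectRetention` expects, assuming a fixed lock window; the surrounding boto3 listing loop (bucket name, credentials, pagination) is only hinted at in comments:

```python
from datetime import datetime, timedelta, timezone


def retention_payload(lock_days: int, now: datetime = None) -> dict:
    """Build the Retention argument for s3:PutObjectRetention, pushing the
    lock so that it always reaches `lock_days` into the future from `now`."""
    now = now or datetime.now(timezone.utc)
    return {
        # COMPLIANCE mode cannot be shortened or removed, even by the root account
        "Mode": "COMPLIANCE",
        "RetainUntilDate": now + timedelta(days=lock_days),
    }


# With boto3 (not shown here), the periodic extension would roughly be:
#   s3 = boto3.client("s3")
#   for page in s3.get_paginator("list_objects_v2").paginate(Bucket=bucket):
#       for obj in page.get("Contents", []):
#           s3.put_object_retention(Bucket=bucket, Key=obj["Key"],
#                                   Retention=retention_payload(30))
```

Run from a trusted machine on a schedule, this keeps every object’s lock a constant distance in the future, which is exactly the “sliding window” described above.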

The only FOSS backup software I am aware of that fully supports object locking is kopia. rustic has features making it easier to implement (and it is WIP, so it should get more in the future). But good old restic has nothing and requires creative DIY to get it working. It is not rocket science once you get your head around all the nitty-gritty details. It has been discussed extensively in the past on this forum, if you are interested.

@kapitainsky Hey, thanks for clearing up my confusion. I just want to make sure we’re on the same page.

Suppose a threat actor uses forget and prune to delete all snapshots. The repository would still be valid, and I assume the checksum would pass. Then, during the next “extending retention” procedure, all files would be considered properly deleted, and no extensions would take place.

If this is correct, the key issue becomes “how do you know?” If you can be alerted within the retention window, recovery is possible. Otherwise, it’s too late.

Am I getting it right?
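To illustrate the “how do you know?” part: one simple out-of-band check is to record snapshot metadata on a trusted machine (e.g. parsed from `restic snapshots --json`) and alert when the count drops or the newest snapshot goes stale. A hypothetical sketch; the function name and thresholds are made up for illustration:

```python
from datetime import datetime, timedelta, timezone


def snapshot_alerts(snapshot_times: list, previous_count: int,
                    now: datetime = None,
                    max_age: timedelta = timedelta(days=2)) -> list:
    """Heuristic tamper checks against snapshot metadata recorded out-of-band.

    A forget/prune run by an attacker shrinks the snapshot list; a stalled or
    sabotaged backup job lets the newest snapshot age beyond `max_age`.
    """
    now = now or datetime.now(timezone.utc)
    alerts = []
    if len(snapshot_times) < previous_count:
        alerts.append(f"snapshot count dropped from {previous_count} "
                      f"to {len(snapshot_times)}")
    if not snapshot_times or now - max(snapshot_times) > max_age:
        alerts.append("newest snapshot is older than expected")
    return alerts
```

The key property is that `previous_count` and the alerting live outside the repository, so an attacker with full repo access cannot silence them; firing within the retention window is what makes recovery possible.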

That’s definitely interesting! It makes me curious about how the “append-only” mode (where overwriting is disabled) handles cases of broken uploads.

Overall, while I like this solution, I still can’t feel completely confident about it. A truly ransomware-resistant backup solution needs more testing by a larger user base to prove its reliability. :confused:

I think it’s a matter of doing it right or not doing it at all. At this point, I’m not sure the solutions are well thought out enough (though I myself can’t think of any big issues) to be worth the trouble of implementing.

It is indeed important. A primitive attack deleting all files from the repo can be easily detected (the repo will stop working), and similarly encrypting all your data at the source (though even this assumes you keep an active eye on it). However, if you face a very determined and sophisticated attacker who slowly changes only your source files and does not corrupt the repo at all, then it is a very different game. As I said, noticing an “attack” has nothing to do with object lock, which only gives you a mechanism to recover a past repo state. If you face such an attacker, the basic advice would be to make sure you have multiple backups, some offline/offsite and never overwritten nor deleted. Use your data constantly to detect any anomalies, and periodically restore backups and validate the restored data against some trusted list (a PGP-signed list of hashes is the simplest solution that comes to my mind).

Think about how long it would take you to notice an attack in the worst-case scenario and set the object lock accordingly. Nothing stops you from setting it to 1 or 10 years. In some regulated industries (medical, legal, etc.) this is not uncommon.

Of course, you will pay for it :) So also be realistic about who your adversary is: a script kiddie, or Mossad doing Mossad things. Depending on the threat level, protection will have a very different cost, and in some situations you might rather need a magical amulet :)

After thinking about it further, I am quite sure that partial uploads should not happen. Restic sends the MD5 of the payload in the HTTP header. Therefore, the S3 server will know that the data is incomplete and must not store it.
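For reference, the MD5 integrity check described here is the standard `Content-MD5` HTTP header: the base64 encoding of the raw (not hex) MD5 digest of the body. A tiny sketch of how that value is computed:

```python
import base64
import hashlib


def content_md5(payload: bytes) -> str:
    """Compute the Content-MD5 header value: base64 of the raw MD5 digest."""
    return base64.b64encode(hashlib.md5(payload).digest()).decode("ascii")


# Example: content_md5(b"hello") -> "XUFAKrxLKna5cZ2REBfFkg=="
```

If the connection drops mid-upload, the bytes received will not hash to this value, so a conforming S3 server rejects the request rather than storing a truncated object.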

Well, you can be part of that user base. :slight_smile: Don’t hesitate!

For the really critical machines, I do one backup using Veeam to a SOBR using Object Lock and one backup using restic.

Also, I would be really happy if Object Lock support were implemented in restic, and I think I might switch quickly. However, as already discussed, this is a larger implementation task. And even then, it will need some monitoring and careful setup; for example, to make sure the locks are really being refreshed, you should do this from a trusted machine.