If I understand correctly, the status quo is that clients can insert arbitrarily backdated snapshots. That means a compromised client can delete backups from an append-only server by causing normal restic forget runs to delete real backups in favor of bogus ones. The currently recommended remedy is to use --keep-within to set a period within which snapshots are never deleted.
This is certainly better than not doing it, but still pretty far from achieving the safety goals of --append-only: snapshots older than the cutoff can still be deleted by clients, and snapshots newer than the cutoff are never pruned.
I’d guess that a proper solution could get complex, but one “dumb” idea stands out to me: why not just create a server option --max-backdate 1h? Meaning, only accept new snapshots timestamped within one hour of the current time. You would then temporarily disable the option when you do want history editing, and disable it or adjust the window to handle long backups, bad clocks, bad network, whatever.
I don’t get what you’re trying to say. If a client is compromised and creates snapshots with manipulated timestamps, then it’s also safe to assume that the snapshot content is unusable. Thus, there’s no point in keeping those snapshots. Snapshots from hosts that are not compromised won’t be affected, as forget --keep-within ensures that those snapshots can’t be replaced by a compromised client.
That’s not how the repository format works. The rest-server is unable to decrypt the snapshot content and hence unable to verify the timestamp.
We’re on the same page that the goal is to preserve good snapshots, i.e. those predating the compromise, not anything created by the compromised client.
Let me lay out my understanding, please correct me if any of this is wrong.
Let’s say you back up every hour on the hour, and your normal pruning run is forget --keep-daily 30. The attack vector here is that the compromised client inserts 30 new snapshots, with timestamps on each of those 30 days, all one minute later than the last good snapshot of that day. This causes your next forget run to delete not the bogus snapshots, but the good snapshots, and all of them, even those from long before the compromise.
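To make that scenario concrete, here is a small simulation. It is only a sketch: keep_daily below is my own simplified model of the policy (keep the newest snapshot on each of the n most recent days that have one), not restic's actual code, and the dates and the "good"/"bogus" labels are made up for illustration.

```python
from datetime import datetime, timedelta

def keep_daily(snapshots, n):
    """Simplified model of --keep-daily n (an assumption, not restic's code):
    keep the newest snapshot on each of the n most recent days that have any."""
    newest_per_day = {}
    for ts, label in snapshots:
        day = ts.date()
        if day not in newest_per_day or ts > newest_per_day[day][0]:
            newest_per_day[day] = (ts, label)
    days = sorted(newest_per_day, reverse=True)[:n]
    return [newest_per_day[d] for d in days]

# Good snapshots: every hour on the hour, 2024-01-01 00:00 through 2024-01-30 23:00.
good = [(datetime(2024, 1, 1) + timedelta(hours=h), "good") for h in range(30 * 24)]

# Bogus snapshots: one per day, backdated to 23:01, one minute after the last good one.
bogus = [(datetime(2024, 1, d, 23, 1), "bogus") for d in range(1, 31)]

before = {label for _, label in keep_daily(good, 30)}
after = {label for _, label in keep_daily(good + bogus, 30)}
print(before)  # {'good'}
print(after)   # {'bogus'}
```

In this model, every one of the 30 kept slots ends up holding a bogus snapshot, so all good snapshots become eligible for deletion.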
If you modify your pruning to forget --keep-daily 30 --keep-within 3d, running that no longer deletes good snapshots within the past three days no matter what the compromised client has done—but it still deletes all good snapshots between -30 and -4 days, and it also results in an extra 3 * 24 = 72 hourly snapshots sitting around at all times. You also have only three days to notice the breach before the compromised client is able to delete all your good snapshots. It’s much better than the situation we had before, but if we didn’t also care about a 30 day history or the number of snapshots then we wouldn’t be keeping historic data or pruning in the first place.
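The numbers above can be checked with a short simulation. Again this is a sketch under assumptions: kept is my own simplified model of the two rules (a snapshot survives if either rule keeps it, and the --keep-within window is measured back from the newest snapshot), and the dates and labels are invented for the example.

```python
from datetime import datetime, timedelta

def kept(snapshots, daily, within):
    """Simplified model of forget --keep-daily/--keep-within (an assumption,
    not restic's code): a snapshot is kept if either rule keeps it."""
    newest = max(ts for ts, _ in snapshots)
    # Rule 1: everything inside the window, measured back from the newest snapshot.
    keep = {s for s in snapshots if newest - s[0] < within}
    # Rule 2: the newest snapshot on each of the `daily` most recent days.
    per_day = {}
    for s in snapshots:
        day = s[0].date()
        if day not in per_day or s[0] > per_day[day][0]:
            per_day[day] = s
    for day in sorted(per_day, reverse=True)[:daily]:
        keep.add(per_day[day])
    return keep

# Hourly good snapshots for 30 days, plus one backdated bogus snapshot per day at 23:01.
good = [(datetime(2024, 1, 1) + timedelta(hours=h), "good") for h in range(30 * 24)]
bogus = [(datetime(2024, 1, d, 23, 1), "bogus") for d in range(1, 31)]

survivors = kept(good + bogus, daily=30, within=timedelta(days=3))
good_kept = sorted(ts for ts, label in survivors if label == "good")
print(len(good_kept))  # 72
print(good_kept[0])    # 2024-01-28 00:00:00
```

So in this model the only good snapshots left are the 72 hourly ones inside the three-day window; everything older is replaced by the bogus dailies.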
Well, “new guy doesn’t really understand the project” is hardly a surprising outcome here, is it? Maybe there’s a better approach?
Basically we’d need some trusted timestamping source that can attach a timestamp to a snapshot. Depending on the remote storage it might be possible to just use the file metadata for that. Otherwise, we’d need something that can attest that a snapshot was created at a certain time. The latter would definitely result in a massive amount of additional complexity and would probably only work for the rest backend. The former is much easier to implement, but might be prone to other problems. This has already been discussed somewhere in the restic / rest-server bug tracker (although I didn’t find the issue the last time I looked). Either way, I currently don’t have time to dive deeper into this.
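As a rough illustration of the file-metadata variant, the sketch below compares the timestamp a snapshot claims with the modification time the storage recorded for its file. Nothing here is existing restic or rest-server functionality: storage_time, is_backdated and the one-hour tolerance are hypothetical names and values, and the whole idea assumes the storage sets the modification time itself and the client cannot change it afterwards.

```python
import os
import tempfile
from datetime import datetime, timedelta, timezone

def storage_time(path):
    """Time the storage recorded for the file (hypothetical helper).
    Assumes the mtime is set by the storage and cannot be altered by the client."""
    return datetime.fromtimestamp(os.path.getmtime(path), tz=timezone.utc)

def is_backdated(claimed, stored, tolerance=timedelta(hours=1)):
    """True if the snapshot claims to be older than its file by more than `tolerance`."""
    return stored - claimed > tolerance

# Demo with a temporary file standing in for a snapshot file in the repository.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"stand-in for an encrypted snapshot")
    path = f.name

stored = storage_time(path)
honest = is_backdated(claimed=stored, stored=stored)
forged = is_backdated(claimed=stored - timedelta(days=10), stored=stored)
os.remove(path)
```

A forget run could then ignore or flag snapshots for which is_backdated is true, instead of trusting the claimed timestamp. Whether the metadata can actually be trusted depends on the backend.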