Pruning rewrite snapshots

I have a bunch of snapshots with the tag rewrite. Initially, I thought “ah, those snapshots are rewritten and have superseded some other one”.

However, I have now noticed that these are snapshots that were rewritten and superseded by some other one.
It seems to me that I am “carrying around things” that I don’t need.

However, how can I tell whether it is safe to remove them (and the underlying data, of course)?
A number of check / check --read-data runs have been made against them over the lifetime of those snapshots (probably about a year).

… However, I don’t know how rewrite interacts with check / check --read-data: do they ignore them completely (both the snapshots and the “data unique to them”), or is it possible that data pruned from the rewrite snapshots might still be referenced by non-rewrite snapshots?

Disclaimer: I sometimes confuse git with restic, so some of my “misunderstanding” might stem from that.

The snapshots with the rewrite tag are those that were newly created by the rewrite command. The original snapshots are not modified by the rewrite command (--forget deletes them, though).

rewrite just works on the snapshots you tell it to process, the rewrite tag has no special meaning there. check simply processes all snapshots in the repository; tags are irrelevant for it.

Oh. So, it’s actually the opposite? :sweat: I need to delete the non-rewrite snapshots to remove the files that rewrite “forgot”?

So, is there a possibility that pruning the old data will leave the rest with “missing references”?

Exactly. It’s also rather simple to verify experimentally: just create a small snapshot and use rewrite to drop some files. Then compare the old and the new snapshot using ls.

No. prune never deletes data that is still in use by any snapshot. Unlike traditional backup software, each snapshot contains references to all files. As restic deduplicates the stored data, this still yields rather compact snapshots.
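A toy model may make the reachability rule concrete. This is a minimal sketch, not restic's implementation: the `Snapshot` type and the `rewrite`/`prune` functions below are hypothetical, and snapshots are reduced to plain sets of blob IDs (real restic walks trees and repacks pack files). It shows that rewrite produces a new snapshot while leaving the original intact, and that a mark-and-sweep prune only drops blobs no remaining snapshot references.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical toy snapshot: an ID, its tags, and the blob IDs it references."""
    id: str
    tags: frozenset
    blobs: frozenset


def rewrite(orig: Snapshot, new_id: str, drop: set) -> Snapshot:
    """Create a NEW snapshot without the dropped blobs; the original is untouched."""
    return Snapshot(new_id, orig.tags | {"rewrite"}, orig.blobs - drop)


def prune(snapshots: list, repo_blobs: set) -> set:
    """Mark every blob referenced by any snapshot, then keep only those (sweep)."""
    in_use = set()
    for sn in snapshots:
        in_use |= sn.blobs  # mark phase: union of all references
    return repo_blobs & in_use  # sweep phase: unreferenced blobs disappear


orig = Snapshot("orig", frozenset(), frozenset({"a", "b", "big"}))
upd = rewrite(orig, "upd", {"big"})
repo = {"a", "b", "big"}

# While ORIG still exists, "big" stays because ORIG references it.
print(sorted(prune([orig, upd], repo)))  # ['a', 'b', 'big']
# Only after forgetting ORIG does "big" become unreferenced and get removed.
print(sorted(prune([upd], repo)))  # ['a', 'b']
```

Data shared between ORIG and UPD ("a" and "b" here) survives either way, which is why forgetting the original cannot leave the rewritten snapshot with missing references.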

yeah - maybe I’ll try to compare “some” snapshots to understand which one is “the latest”.

… since I thought everything in a snapshot is immutable, including attached tags (therefore, tags CANNOT be added to the old snapshots, since that would change their hash).

But thank you for your verbose explanation :upside_down_face:

You’re right: it is not possible to modify snapshots without changing their snapshot ID. restic tag creates new snapshots and deletes the old ones.
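Why a tag change forces a new ID can be sketched with content addressing. restic names repository files by a SHA-256 hash of their contents (SHA-256, not SHA-1). As a simplification, the sketch below hashes a canonical JSON encoding of a plaintext snapshot, whereas restic hashes the stored, encrypted file; the field values and the helper names `snapshot_id`/`add_tag` are hypothetical. The consequence is the same: any change, including a new tag, yields a different ID.

```python
import hashlib
import json


def snapshot_id(snapshot: dict) -> str:
    """Toy content address: SHA-256 over a canonical JSON encoding.

    Simplification: restic hashes the stored (encrypted) file instead.
    """
    data = json.dumps(snapshot, sort_keys=True, separators=(",", ":")).encode()
    return hashlib.sha256(data).hexdigest()


def add_tag(snapshot: dict, tag: str) -> dict:
    """Return a modified copy with the tag added; the input dict is not changed."""
    new = dict(snapshot)
    new["tags"] = sorted(set(snapshot.get("tags", [])) | {tag})
    return new


orig = {
    "time": "2024-01-01T00:00:00Z",
    "tree": "hypothetical-tree-id",
    "paths": ["/home"],
    "tags": [],
}
tagged = add_tag(orig, "keep")

print(snapshot_id(orig) == snapshot_id(tagged))  # False
```

Since the old content can no longer be addressed by a new ID, "modifying" a snapshot necessarily means writing a new one and (optionally) deleting the old.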

I am considering using rewrite to “unbackup” some data which was both pointless to back up and space-consuming (and is already compressed). Whilst I currently estimate there are only perhaps a dozen snapshots involved (in a 1.5TB repo containing over 100 snapshots), I am bothered that there is no clear mapping between the rewrite-tagged rewritten (UPD) snapshot and the original (ORIG) snapshot.

Please assume that I do not use rewrite --forget, and that more than one snapshot is rewritten (probably via an --exclude … pattern or similar).

So, going forward, perhaps the rewrite tag could carry the ID of ORIG as its value, i.e., rewrite=ORIG?

This would seem to allow easier scripted and/or manual checks of the situation, and a later scripted and/or manual forget of the presumably now-unwanted ORIG snapshot.

Yes, after a forget of ORIG there would still be a rewrite=ORIG tag on the rewritten snapshot UPD, but… so what? It seems Mostly Harmless, and again, it should be easy to adjust manually or with a script.
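The scripted check described above can be sketched without a new tag format, under one loud assumption: that your restic version records the source snapshot's ID in an `original` field of each rewritten snapshot, visible in the output of `restic snapshots --json`. Verify this on your own repository first; if the field is absent, this mapping is not available. The function name `map_rewrites` and the sample IDs are hypothetical.

```python
import json


def map_rewrites(snapshots_json: str) -> dict:
    """Map each rewrite-tagged snapshot ID to the ID of its original snapshot.

    ASSUMPTION: rewritten snapshots carry an "original" field, as in
    `restic snapshots --json` output. The value is None when the field
    is missing or the original is no longer in the repository.
    """
    snapshots = json.loads(snapshots_json)
    existing = {sn["id"] for sn in snapshots}
    mapping = {}
    for sn in snapshots:
        if "rewrite" in (sn.get("tags") or []):  # tags may be null/absent
            orig = sn.get("original")
            mapping[sn["id"]] = orig if orig in existing else None
    return mapping


# Hypothetical sample shaped like `restic snapshots --json` output.
sample = json.dumps([
    {"id": "aaa1", "tags": None, "paths": ["/data"]},
    {"id": "bbb2", "tags": ["rewrite"], "original": "aaa1", "paths": ["/data"]},
    {"id": "ccc3", "tags": ["rewrite"], "original": "ddd4", "paths": ["/data"]},
])

print(map_rewrites(sample))  # {'bbb2': 'aaa1', 'ccc3': None}
```

In practice, the non-None values are the ORIG snapshots that are candidates for a later forget, and None marks an UPD whose original is already gone.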
