Find snapshot whose removal would return the most disk space

wrohdewald · August 19, 2022, 6:32am

I need disk space and am willing to sacrifice some snapshots. So I want to know which old snapshots hold the largest number of unique references to blobs. Can I use restic from the command line to list the number of such unique references per snapshot?

alexweiss · August 19, 2022, 7:28am

The easiest way is to run

restic forget --prune --max-unused 0 -n <SNAPSHOT>

for each of your snapshots. It shows how many blobs, and how much space, would be freed by forgetting and pruning the given snapshot.
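To make the idea of "unique references" concrete, here is a toy model of what such a dry run measures. It assumes each snapshot is represented as the set of blob IDs it references and that each blob's size is known. The names `snapshots`, `blob_sizes` and `freeable_bytes` are hypothetical and not part of restic. restic computes this itself while pruning, and the real figure can differ because of pack files and compression.

```python
def freeable_bytes(snapshots: dict[str, set[str]], blob_sizes: dict[str, int], removed: set[str]) -> int:
    """Toy model: total size of blobs referenced ONLY by snapshots in `removed`."""
    kept_blobs: set[str] = set()
    for snap_id, blobs in snapshots.items():
        if snap_id not in removed:
            kept_blobs |= blobs  # still needed by a snapshot we keep
    removed_blobs: set[str] = set()
    for snap_id in removed:
        removed_blobs |= snapshots[snap_id]
    # Only blobs that no kept snapshot references can actually be deleted
    return sum(blob_sizes[b] for b in removed_blobs - kept_blobs)


# Hypothetical example data: snapshot ID -> referenced blob IDs, blob ID -> size in bytes
snapshots = {
    "snap1": {"a", "b", "c"},
    "snap2": {"a", "b", "d"},
    "snap3": {"a", "e"},
}
blob_sizes = {"a": 100, "b": 50, "c": 10, "d": 30, "e": 200}

# Rank snapshots by how much space removing each one alone would free
ranking = sorted(snapshots, key=lambda s: freeable_bytes(snapshots, blob_sizes, {s}), reverse=True)
for s in ranking:
    print(s, freeable_bytes(snapshots, blob_sizes, {s}))
```

This is essentially what running the dry run once per snapshot gives you, just without having to read prune's output.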

But note that you could be in a situation where most blobs are used by several snapshots. In that case removing any single snapshot does not help much, whereas deleting two, three or more snapshots together might reduce space usage considerably…
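The following toy model shows the effect described above. It uses the same hypothetical representation (snapshot ID mapped to a set of blob IDs, plus blob sizes) and made-up data. Each snapshot alone frees little or nothing, but a pair frees much more because it shares blobs with no other snapshot. With restic itself, the corresponding check would be to pass several snapshot IDs to the same dry-run `forget --prune` command.

```python
from itertools import combinations


def freeable_bytes(snapshots, blob_sizes, removed):
    """Toy model: size of blobs referenced only by snapshots in `removed`."""
    kept = set().union(*(blobs for s, blobs in snapshots.items() if s not in removed))
    gone = set().union(*(snapshots[s] for s in removed))
    return sum(blob_sizes[b] for b in gone - kept)


# Hypothetical data: "x" is shared by mon and tue only, so neither frees it alone
snapshots = {
    "mon": {"base", "x"},
    "tue": {"base", "x"},
    "wed": {"base", "y"},
}
blob_sizes = {"base": 1000, "x": 500, "y": 20}

for s in snapshots:
    print("remove", s, "alone:", freeable_bytes(snapshots, blob_sizes, {s}))

# Exhaustively try every pair of snapshots (quadratic in the number of snapshots)
best = max(combinations(snapshots, 2), key=lambda pair: freeable_bytes(snapshots, blob_sizes, set(pair)))
print("best pair:", best, freeable_bytes(snapshots, blob_sizes, set(best)))
```

Trying all subsets instead of pairs grows exponentially, so for real repositories this only works for small groups of candidate snapshots.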

alexweiss · August 19, 2022, 7:42am

Another remark (but I don’t know if it applies to your case):

With “traditional” backups (I mean classical full or incremental backups), a common strategy when running low on space was to look for snapshots one could spare and remove them. The idea was that the data was stored multiple times in the backup, so after removing a “not so important” snapshot you could always get the important data back by restoring from an earlier or later snapshot.

Now, with deduplicating backups like restic, this strategy does not work at all. The reason is, of course, that the data is already deduplicated. If a snapshot is “not so important” because its important data is also contained in other snapshots, then that data does not take up any extra space on account of this snapshot, and removing the snapshot therefore does not free that space!

Simply put: as the data is stored deduplicated, if you want to free space, the way to go is generally not to remove snapshots but to actually remove data from all snapshots! This is why people are asking for ways to rewrite snapshots (i.e. remove data from all snapshots while still keeping the modified snapshots).

MichaelEischer · August 19, 2022, 3:50pm

To clarify: -n is short for --dry-run. Make sure you don’t forget that parameter, as otherwise restic will actually delete the snapshot.

wrohdewald · August 20, 2022, 6:52pm

That does what I want, thanks! Support for --json would make it easier to use in scripts.

By “unique reference” I meant exactly what you both explained: if two snapshots reference a blob, the reference is not unique. Is there a better way to phrase this clearly in English?

On second thought, a snapshot might reference the same blob multiple times, so my question was wrong anyway.
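One way around this ambiguity is to count, for each blob, the number of distinct snapshots that reference it rather than the number of references. A blob referenced by exactly one snapshot could then be described as "exclusive to" that snapshot, however often it occurs inside it. The sketch below is a hypothetical illustration with made-up data. It assumes each snapshot is given as a list of blob references that may contain repeats, and it is not how restic represents snapshots internally.

```python
from collections import Counter

# Hypothetical data: every blob reference in each snapshot, repeats included
# (e.g. identical files stored twice within one snapshot)
references = {
    "snap1": ["a", "a", "b", "c"],
    "snap2": ["a", "b", "b", "d"],
}

# Count in how many *distinct* snapshots each blob occurs;
# set() drops repeated references within the same snapshot
snapshots_per_blob = Counter(blob for refs in references.values() for blob in set(refs))

# A blob is exclusive to a snapshot if exactly one snapshot references it
exclusive = {
    snap: sorted(b for b in set(refs) if snapshots_per_blob[b] == 1)
    for snap, refs in references.items()
}
print(exclusive)
```

Here blob "a" occurs three times in total but belongs to two snapshots, so it is exclusive to neither. Removing either snapshot alone would free only the blob listed for it.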