View list of files to be removed in restic forget --prune --dry-run

If I run this command:
restic forget SNAPSHOTID --dry-run --prune

it calculates which blocks are affected and the amount of space that will be saved.

Is it possible to get a list of the particular files which will be deleted?


You asked: “Is it possible to get a list of the particular files which will be deleted?”
The answer is “most likely yes”, but this feature does not exist yet. It would be a new option for restic forget, something like --show-removed-files.

Keep in mind that once a snapshot is deleted, a blob moves from used to unused status only if there is no other link to that particular blob.

So to find out whether such a transition is about to happen, all remaining nodes in the repository have to be searched to confirm that no other link to this blob exists. Only then will it be marked unused. This operation can be quite expensive, depending on the number of nodes in the repository. But it is possible, as long as there is still a link from the snapshot to the file. Once the link has been broken (forget), the blob might or might not be in use, but you no longer know which snapshot it once belonged to.
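To make the reachability argument concrete, here is a minimal Python sketch over a hypothetical, flattened repository model (snapshot ID → file path → ordered list of data blob IDs). It is not restic's data format or code: real snapshots point to nested trees, and tree blobs themselves also have to be counted. The mark step collects every blob still reachable from the kept snapshots; anything reachable only from the forgotten snapshots becomes unused, and a file counts as fully removed only if every one of its blobs does.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical, flattened model (NOT restic's on-disk format):
# snapshot ID -> {file path -> ordered list of data blob IDs}
Repo = Dict[str, Dict[str, List[str]]]


def referenced_blobs(repo: Repo, snapshot_ids: Set[str]) -> Set[str]:
    """Mark phase: collect every blob reachable from the given snapshots."""
    marked: Set[str] = set()
    for sid in snapshot_ids:
        for blobs in repo[sid].values():
            marked.update(blobs)
    return marked


def removal_report(repo: Repo, forget: Set[str]) -> Tuple[Set[str], List[str]]:
    """Return (blobs that become unused, files whose data would be fully removed)."""
    keep = set(repo) - forget
    still_used = referenced_blobs(repo, keep)      # must search ALL kept snapshots
    candidates = referenced_blobs(repo, forget)
    unused = candidates - still_used
    # A file disappears completely only if none of its blobs is linked elsewhere;
    # empty files (no blobs) are skipped.
    removed_files = sorted(
        f"{sid}:{path}"
        for sid in forget
        for path, blobs in repo[sid].items()
        if blobs and all(b in unused for b in blobs)
    )
    return unused, removed_files
```

Files that share only some blobs with kept snapshots (partially deduplicated) are deliberately not reported as removed, which matches the point that a blob only becomes unused when no other link to it remains.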

How would you like to see this list: JSON or text? How much detail do you want in addition to the pathname (size, modification date, …)? Bear in mind that a pathname need not be unique: multiple paths can point to the same set of data blobs (deduplication).

I hope that answers your question.


Thanks for this comprehensive response. At the very least it has told me that I have not missed some clever option!

In terms of what I would like to see: a simple list of files as (say) CSV, with the columns snapshot_ID, snapshot_date, file_hash (included so the user can easily see deduplicated files listed more than once), path_to_file, filesize_in_bytes, file_mod_date.
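That column layout can be produced with Python's standard csv module. This is a sketch under assumptions: the RemovedFile record and its fields are hypothetical, and populating them from the repository metadata is left out. restic tree nodes store a file's content as an ordered list of data blob IDs rather than a single whole-file hash, so file_hash here is a stand-in computed by hashing that list; within one repository, identical content should yield the same list and therefore the same fingerprint, which is what makes deduplicated files easy to spot.

```python
import csv
import hashlib
import io
from dataclasses import dataclass
from typing import Iterable, List

COLUMNS = ["snapshot_ID", "snapshot_date", "file_hash", "path_to_file",
           "filesize_in_bytes", "file_mod_date"]


@dataclass
class RemovedFile:
    # Hypothetical record; how it gets filled in is outside this sketch.
    snapshot_id: str
    snapshot_date: str
    path: str
    size: int
    mtime: str
    content_blobs: List[str]  # ordered data blob IDs of the file


def content_fingerprint(blob_ids: List[str]) -> str:
    """Stand-in for a file hash: SHA-256 over the ordered blob ID list."""
    h = hashlib.sha256()
    for blob_id in blob_ids:
        h.update(blob_id.encode("utf-8"))
        h.update(b"\n")  # separator so ["ab", "c"] and ["a", "bc"] differ
    return h.hexdigest()


def to_csv(files: Iterable[RemovedFile]) -> str:
    """Render the records as CSV text with the requested columns."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(COLUMNS)
    for f in files:
        writer.writerow([f.snapshot_id, f.snapshot_date,
                         content_fingerprint(f.content_blobs),
                         f.path, f.size, f.mtime])
    return buf.getvalue()
```

When saving the result, open the output file with newline="" so the csv module's line endings are written unchanged.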

Why was I looking for this? TBH I am relatively new to restic, still getting to grips with how it works, and wanting to be confident it is doing what I think it is doing.

If I back up to a server with rsync or similar, I can look at that server, report the number of files and their sizes, and verify for myself that the copy is faithful. Restic does not give that view, but the various “check” options provide the necessary assurance. (And I do appreciate that restic does shed-loads more than rsync does.)

When your backup mechanism starts deleting stuff, however, that is when you want to be sure-sure it is doing the right thing. If I use robocopy with its mirror option (flag /mir) in dry-run mode (flag /l = log only), the log file gives me the list of “Extra” files in the target that /mir would delete because they are no longer present in the source. Certainly when starting out with such a mechanism, it is reassuring to be able to see what is happening at the detail level and confirm it matches expectations (e.g. that the 15 GB of deletions are the obsolete ISO files recently removed from the primary server).
By contrast, restic forget SNAPSHOTID --prune --dry-run only tells me that xx.yyy GiB are to be deleted.

But I am certainly not complaining: I am really impressed that restic provides a backup solution with plenty of features (like deduplication) that until relatively recently were only available in high-end commercial offerings.

The blob thing does not answer my variant of this question.

I’m not sure what the OP meant, but I would be interested in files that no longer exist in any of the other snapshots, by file name, not content. This is because I expect the content to change from time to time; it’s deletion I worry about.

At the moment I have no simpler solution than to run ls and write a script to compare the listings. I doubt it will be easy to do, and to be clear, I’m not asking for someone to add it.
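The ls-and-compare approach can be sketched in Python, assuming each snapshot's listing is captured with restic ls --json SNAPSHOTID before anything is forgotten. The parser assumes the line-delimited JSON output in which node lines carry "path" and "type" fields while the leading snapshot line carries "paths" (plural); field names may differ between restic versions, so check them against your own output. The function names are hypothetical, and the comparison is by name only, as described above: a renamed or moved file shows up as gone even if its content survives elsewhere.

```python
import json
from typing import Dict, Iterable, List, Set


def paths_from_ls_json(lines: Iterable[str], only_files: bool = True) -> Set[str]:
    """Collect node paths from line-delimited `restic ls --json` output.

    Assumption: node lines have "path" and "type"; the snapshot header line
    has "paths" instead, so it is skipped by the "path" check.
    """
    paths: Set[str] = set()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        obj = json.loads(line)
        if "path" not in obj:
            continue  # snapshot header or other non-node message
        if only_files and obj.get("type") != "file":
            continue  # skip directories, symlinks, etc.
        paths.add(obj["path"])
    return paths


def names_gone_after_forget(listings: Dict[str, Set[str]],
                            forget: Set[str]) -> List[str]:
    """Paths present in a forgotten snapshot but in none of the kept ones."""
    kept: Set[str] = set()
    for sid, paths in listings.items():
        if sid not in forget:
            kept |= paths
    doomed: Set[str] = set()
    for sid in forget:
        doomed |= listings[sid]
    return sorted(doomed - kept)
```

For example, redirect each listing to a file such as SNAPSHOTID.jsonl, read each file's lines into paths_from_ls_json, build the listings dictionary keyed by snapshot ID, and pass the IDs you intend to forget.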

Just wanted to give a different perspective on the same question.

Hi @sc2maha,
Yes, it would be possible to find files by name in a repository where snapshots have been deleted, but only if restic prune has NOT been run in the meantime. Once restic prune has been run, all bets are off, since prune might have removed the old metadata.

Finding these files would be an expensive operation. The trees to which these files could have belonged would have to be recreated from the remaining metadata, which may or may not succeed. Even if trees still exist that point to a given basename, the answer can only be “file "pathname" may still exist in the repo”. This has to be done for every data blob that "pathname" used.

For some types of cloud storage this would not be possible at all, since all pack files, referenced or not, would have to be searched.

In other words a pretty complex operation.