Is it okay to keep all snapshots forever?

Hi, I discovered restic just about a week ago and I really have to say it seems to be an awesome piece of software. After watching the video from the C4 talk, reading the user documentation and doing some first tests, I have created a B2 cloud storage account and set up scripts and cron jobs to perform fully automated backups every night. I have also tested restoring parts of the backup on another (virtual) machine that runs a different distro. In the course of this I ran into some minor issues, all of which I was able to solve.
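For context, the nightly job is roughly the following (bucket name, paths and the password file below are placeholders, not my real configuration):

    #!/bin/sh
    # /usr/local/sbin/restic-backup.sh - called from cron every night, e.g.:
    #   30 2 * * * root /usr/local/sbin/restic-backup.sh
    export B2_ACCOUNT_ID="<b2-account-id>"
    export B2_ACCOUNT_KEY="<b2-account-key>"
    export RESTIC_REPOSITORY="b2:my-backup-bucket:restic"
    export RESTIC_PASSWORD_FILE="/root/.restic-password"

    restic backup /home /etc /var/www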

But there still is one topic that I have completely ignored until now: I haven't taken the time to read about and understand the concept of removing snapshots with forget and prune. What I am wondering in this context is: as I currently have less than 200 GB of data that needs to be backed up and only small amounts of data are added or changed daily, do I really need to bother with deleting old snapshots? Or is it a completely reasonable and advisable approach to just keep all snapshots forever?

Should I expect to run into (performance?) issues, problems or restrictions if my restic repository contains, say, a four-digit number of snapshots in about three years? And would it be sufficient to start thinking about forget and prune only when

a) the expenses for B2 cloud storage exceed an acceptable level, or

b) the regular restic operations seem to be affected in some way by the large number of snapshots?

Many thanks in advance for any replies.

If you are satisfied with your storage costs and with the speed of restic operations, there is of course no need to run forget and prune :wink:

Lots of snapshots and lots of index files (you will most likely end up with both if you never run forget and prune) will, however, decrease your performance. So think about pruning once you are no longer satisfied with that performance!
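If you do reach that point, you don't have to delete everything at once; a retention policy keeps recent history dense and old history sparse. The numbers below are only an illustration, adjust them to your needs:

    # keep 7 daily, 8 weekly, 12 monthly and 3 yearly snapshots,
    # then remove the data that is no longer referenced
    restic forget --keep-daily 7 --keep-weekly 8 --keep-monthly 12 --keep-yearly 3 --prune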

A side note: There are some improvements for prune currently under development which will make prune work very well in your scenario. So it might be a good strategy to wait another 6-12 months until these have made it into a restic release.
There are also speed improvements in the unreleased master branch or still under development which may keep you satisfied without pruning for longer, so in any case it's worth taking a look at new releases :wink:


@alexweiss

Good to know!
Do you know whether, from the perspective of AWS S3 and its data transfer costs, those improvements will make “prune” use less data transfer?

As per post: Huge amount of data read from S3 backend

prune downloads every pack header to create a temporary index, crawls all snapshots (which means downloading every tree object that can be reached from any snapshot), downloads any blobs that are still used and exist in the same pack as an object to be deleted, re-uploads these blobs, deletes the old packs, then reindexes again (downloading every pack header a second time).

If you do this frequently, the traffic adds up pretty quickly.
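One way to keep that traffic down (a workaround, not a fix) is to decouple the two steps: forget on its own only deletes snapshot files and causes almost no traffic, while prune is the expensive part, so you can run it much less often:

    # nightly: drop old snapshots according to the policy, no repacking
    restic forget --keep-daily 7 --keep-weekly 8 --keep-monthly 12

    # e.g. once a month: actually rewrite packs and free the space
    restic prune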

Thank you very much!

The re-implementation I propose in

will drastically reduce the data transfer. Especially if you only have one system to back up from and choose prune options such that only a few packs need to be repacked, it will mainly use the cache and won't download or re-upload much.

In fact, I'm already using it (with a few other patches) with a repository on cold storage. The prune command then only saves a few new files (mainly index files) and removes files from the repository without accessing them.
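To give a rough idea of what I mean by prune options: my current branch has knobs along the lines of the following (flag names may still change before this lands in a release):

    # tolerate some unused space instead of repacking aggressively, and
    # only repack packs that end up in the local cache anyway
    restic prune --max-unused 10% --repack-cacheable-only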


Oh man… thanks for this!
This needs to be released as a Critical Emergency! :smiley:

Really appreciate that.
I subscribed to the discussion to stay updated.