`restic backup` is taking longer and longer to run, best way to run `restic prune`?

Nicely done on gathering that information! That CSV you put together probably took quite a bit of work, but it sure helps identify what’s going on.

My response below is a little orthogonal to your original question; if that’s totally unsatisfying, please let me know.

In short: The slowdown trend you are seeing (and which is illustrated in your original graph) is not caused by the growing repository size, but by the amount of data that changes each day.

For example, your fastest backups were 1, 2, 3, 4, and 5. If you examine the “Added to the repo” number in your stats messages, you will see that these backups averaged 1.14 GiB “Added to repo” per backup (including runs of 1.65, 0.78, 1.24, and 1.17 GiB).

Your slowest backups were 6, 7, 8, and 9. These backups averaged about 7.4 GiB “Added to repo” per backup ((10.0 + 8.2 + 3.7 + 7.6) / 4). This is why they ran so much longer: they wrote roughly six times more data to the repository.

Some further interesting tidbits:

  • This might not allay all your concerns… The data set doesn’t really provide the opportunity to measure what happens to backup speeds as the repository grows, but I think we can conclude that the effect is not nearly as significant as you first suspected. That’s good news.
  • However, you might still be in danger of exceeding your backup window on days when lots of data changes.
  • If you change the Y axis in your graph to “Bytes added per second” (“Bytes added to repo” / “total seconds for backup to complete”), you will observe that restic is working substantially faster (more bytes added per second) in the later backups. (A quick way to compute this column is sketched after this list.)
  • If you change the X axis to represent “Bytes added to repo” and keep the Y axis as “Bytes added per second”, the plot dramatically shows that restic works more efficiently (more bytes added per second) when it has more data to back up.
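If your CSV already has the bytes-added and duration columns, a quick awk one-liner can compute the throughput column for you. The layout (date, bytes added, duration in seconds, header on line 1) is my guess at your file’s format, so adjust the field numbers to match:

```sh
# Assumed layout: date,bytes_added,duration_seconds (header on line 1).
# Prints each backup's throughput in MiB/s.
awk -F, 'NR > 1 { printf "%s  %.2f MiB/s\n", $1, ($2 / $3) / 2^20 }' stats.csv
```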

If I were in your shoes:

  • I would definitely still seek an answer to your original questions of “How can I perform occasional forget/prunes on this repository” and “what is the most efficient way to do so”. It’s just common sense that you will want to run those operations occasionally (the first sketch after this list shows the usual pattern).
  • I’d recognize that repo size management is important, but not exceedingly time-critical. I don’t think you’re in danger of exceeding your backup window due to repository size anytime soon.
  • I would immediately try to get some good metrics around “bytes changed per day”, as this is the most likely thing to actually push your backup duration past its window of availability (the second sketch below shows one way to capture this automatically).
  • I might explore whether you could achieve substantial performance gains by executing this backup to local storage, and then synchronizing that storage to the cloud. (This is what I do! The last sketch below shows the shape of it.)
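On the forget/prune question itself, the usual pattern is a retention policy via `restic forget` with `--prune` tacked on, or a separate `restic prune` run during a quiet window. The repository path and retention numbers below are just examples; pick ones that match your needs:

```sh
# Example retention policy: keep 7 daily, 4 weekly, and 6 monthly
# snapshots, then prune the data they no longer reference.
restic -r /mnt/backup/restic-repo forget \
    --keep-daily 7 --keep-weekly 4 --keep-monthly 6 --prune

# Or run prune on its own; on recent restic versions, --max-unused
# lets prune tolerate some unused data to cut down on repacking.
restic -r /mnt/backup/restic-repo prune --max-unused 10%
```

Since prune is the expensive step, running it less often (and letting `--max-unused` skip marginal repacks) is usually the efficient approach.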
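For capturing “bytes changed per day” automatically, restic’s `--json` output includes a summary message at the end of each backup. A minimal sketch, assuming the `data_added` and `total_duration` fields in that summary (worth verifying against your restic version):

```sh
# Append one CSV row per backup: timestamp, bytes added, seconds taken.
# This feeds the same date,bytes_added,duration_seconds layout used above.
restic -r /mnt/backup/restic-repo backup /data --json \
  | jq -r 'select(.message_type == "summary")
           | "\(now | todate),\(.data_added),\(.total_duration)"' \
  >> backup-stats.csv
```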
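And the local-first approach is simply two steps; the remote name `cloudremote` here is a placeholder for whatever rclone remote you would configure:

```sh
# Step 1: back up to fast local storage.
restic -r /mnt/backup/restic-repo backup /data

# Step 2: mirror the local repository to cloud storage. rclone sync
# makes the destination match the source, which also propagates
# deletions from prune runs.
rclone sync /mnt/backup/restic-repo cloudremote:restic-repo
```

The backup itself then runs at local-disk speed, and the cloud sync can happen outside your backup window.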

Hope this helps!
