How to speed up tiny incremental checks?

BenWiederhake · January 30, 2023, 2:33am

tl;dr: check always reads all trees from all snapshots, which can take surprisingly long.

I use restic by invoking backup and then check --read-data-subset=.1% every 20 minutes. When I first set this up, a single such run took only a minute or so. That meant I always had fresh backups of my entire home directory, and I could be certain that any problem would become apparent very quickly. Woohoo!

However, restic check always reads all trees from all snapshots. By now this makes up around 8 minutes of a 10 minute run, which is not so nice. Note that rebuild-index does not change anything, as expected.

I can think of the following workarounds, none of which are nice:

I could continue as-is, and accept that my computer is executing restic-check forever.
I make calendar entries and run forget && prune every month, when I’m at home and know that it can run safely for a long time. However, my hope is to have a backup system that I only have to revisit once a year, if at all. Also, “delete your backups” feels like a silly solution for a backup system. (But yes, it does work, that’s what I’ve been doing a few times for various reasons.)
I could run it like check --with-cache, but that basically defeats the entire purpose of --read-data-subset.

Here are some ideas:

Option 1: Let’s add a flag --with-cache-but-only-for-trees-and-not-for-blobs (better names welcome), such that the phases load indexes, check all packs, and check snapshots, trees and blobs become much faster (by loading everything from the local cache), and the phase read 0.1% of data packs still reads from the actual repo, thus making sure that the repository is still intact with high probability.
Option 2: Let’s add a flag so that check --no-verify-trees skips the phase check snapshots, trees and blobs entirely. It could still load the index(es) from the indicated destination (either local cache or the actual repository), which should be enough to sample uniformly from the blobs.
or something else entirely, of course.

This sounds similar to “Can I Speed up restic check --read-data (locally)? - #10 by shd2h”, but I’m quite certain that in my case the bottleneck is the check snapshots, trees and blobs phase, specifically its random accesses to blobs (which is where the trees are stored, if I understand that correctly).

What do you think, how could this be solved better?

alexweiss · January 30, 2023, 8:10am

If you run a backup every 20 minutes, which means 72 backups per day, you shouldn’t be surprised that you have many snapshots and potentially many trees.

Why do you think you can be certain that any problem would become apparent very quickly? You are checking a very tiny, randomly chosen subset of your pack files 72 times a day. The probability that you find an error in one specific pack file within a day is about 7%; the probability that you find it within a month is around 88.5%, but still not 100%… If you want to be more certain, you should use the n/t syntax with check instead.
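The percentages above can be reproduced with a short calculation. This is only a sketch: it assumes every run independently reads each pack file with probability 0.1%, and it takes a month to be 30 days. The function name is made up for illustration.

```python
def detection_probability(subset_fraction: float, runs: int) -> float:
    """Chance that one specific pack file is read at least once in `runs` runs.

    Assumption: each run independently picks the pack with probability
    `subset_fraction` (0.001 for --read-data-subset=.1%).
    """
    # The pack is missed by a single run with probability (1 - fraction),
    # so it is missed by all runs with probability (1 - fraction) ** runs.
    return 1.0 - (1.0 - subset_fraction) ** runs


RUNS_PER_DAY = 72  # one check every 20 minutes

per_day = detection_probability(0.001, RUNS_PER_DAY)
per_month = detection_probability(0.001, RUNS_PER_DAY * 30)

print(f"within a day:   {per_day:.1%}")    # roughly 7%
print(f"within 30 days: {per_month:.1%}")  # roughly 88.5%
```

Because the subsets are drawn at random, the miss probability shrinks over time but never reaches zero.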

The option --with-cache already caches tree packs, but no data packs.

My suggestion would be to run check less often, but to check more data packs at once, using the --read-data-subset=n/t syntax. Feel free to use --with-cache if you trust your cache enough.
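One way to use the n/t syntax is to rotate through the groups, for example one group per day, so that every pack file has been read after t days. The sketch below only builds the command line; the helper names and the daily rotation scheme are assumptions for illustration, not something restic provides.

```python
import datetime


def subset_for_date(day: datetime.date, total_groups: int) -> int:
    """Return the 1-based group number n for n/t, advancing by one each day."""
    # toordinal() increases by exactly 1 per calendar day, so consecutive
    # days cycle through 1..total_groups without skipping a group.
    return day.toordinal() % total_groups + 1


def check_command(day: datetime.date, total_groups: int, with_cache: bool = False) -> list[str]:
    """Build the argument list for one daily check run (hypothetical helper)."""
    n = subset_for_date(day, total_groups)
    cmd = ["restic", "check", f"--read-data-subset={n}/{total_groups}"]
    if with_cache:
        cmd.append("--with-cache")
    return cmd


if __name__ == "__main__":
    # Example: split the repository into 7 groups and check one per day.
    print(" ".join(check_command(datetime.date.today(), 7)))
```

The resulting list could be passed to subprocess.run from a daily timer, assuming the repository location and password are already configured in the environment.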

alexweiss · January 30, 2023, 10:22am

Another stupid question: What is your problem with automatically running forget --prune regularly? For me, it is quite natural to “thin out” old backups…

BenWiederhake · January 30, 2023, 12:35pm

I’m not surprised that there are many snapshots and many trees. I’m surprised that it takes 10 minutes to check a few thousand things, and that I can’t disable something unwanted that takes up most of the running time.

Thanks for the hint that --with-cache already does what I want! I would have guessed that it means that only the local cache is used. I’ll make a PR to change the --help description of that flag.

GuitarBilly · February 3, 2023, 4:27pm

Interesting topic, and I agree with the suggestion to clarify the documentation.

restic check --help

on the Linux command line shows:

--with-cache use the cache

Documentation suggests everything is cached:
https://restic.readthedocs.io/en/stable/cache.html
“Snapshot, Data and Index files are cached in the sub-directories snapshots, data and index, as read from the repository.”

Then this chapter mentions only metadata, which implies that the actual data is not cached:
https://restic.readthedocs.io/en/stable/manual_rest.html#caching
“Restic keeps a cache with some files from the repository on the local machine. This allows faster operations, since meta data does not need to be loaded from a remote repository.”