Hi, I am trying to understand whether restore performance could vary based on the characteristics of a backup.
I.e., would the behavior be very different if we do N snapshots with few changes vs. N snapshots with lots of changes vs. N snapshots with a middle ground of changes?
And would the restore performance differ if we restore from the 1st snapshot vs. the last snapshot?
I would say: try it and report your results.
IMO it shouldn't make much difference (and maybe not even a measurable one).
The restore process works like this:
- determine the files to restore and the blobs that make up the contents of all those files (and create + pre-allocate all the files)
- determine which pack files are needed to obtain all those blobs
- iterate over all needed pack files, read the needed blobs from each, and write every blob to each location within the already allocated files where its data belongs
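The steps above can be sketched roughly as follows. This is only an illustrative model, not restic's actual implementation: the snapshot structure, blob IDs, and pack layout here are all made-up toy data, and bytearrays stand in for real pre-allocated files.

```python
# Hypothetical snapshot: per file, which blobs form its content,
# and at which offset each blob belongs (blob IDs are invented).
snapshot = {
    "a.txt": [("blob1", 0), ("blob2", 4)],  # (blob id, file offset)
    "b.txt": [("blob2", 0)],                # blobs can be shared (dedup)
}

# Hypothetical pack files grouping the blob contents.
packs = {
    "pack1": {"blob1": b"AAAA"},
    "pack2": {"blob2": b"BBBB"},
}

def restore(snapshot, packs):
    # Step 1+2: invert the file->blob mapping so that for each needed
    # blob we know every (file, offset) it has to be written to.
    targets = {}  # blob id -> list of (file, offset)
    for path, blobs in snapshot.items():
        for blob_id, offset in blobs:
            targets.setdefault(blob_id, []).append((path, offset))

    # "Allocate" the output files (bytearrays instead of real files).
    out = {path: bytearray() for path in snapshot}

    # Step 3: walk each pack file once and scatter each contained blob
    # to all the places where its data is needed.
    for pack in packs.values():
        for blob_id, data in pack.items():
            for path, offset in targets.get(blob_id, []):
                buf = out[path]
                end = offset + len(data)
                if len(buf) < end:                  # grow to final size
                    buf.extend(b"\0" * (end - len(buf)))
                buf[offset:end] = data
    return {path: bytes(buf) for path, buf in out.items()}
```

Note that each pack is read only once even when a blob inside it is referenced by several files; that is why the total amount of data to extract dominates the cost, not how the snapshots were created.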
So, overall, the restore duration mainly depends on the amount of data to be extracted, and that does not depend at all on how the data was added.
Of course, there may be differences depending on the exact location of the blobs within the pack files, which does depend on the order of the backups, etc. But then it becomes very complicated to predict which situation performs better. And, as already said, I strongly question whether that effect is measurable at all…
I think the effects that influence restore performance much more are variances in how your repo is accessed (network or disk drive performance) and how the data is written (again, disk drive or filesystem performance).
@alexweiss I have tried it. So far, the results are in my favor.