Change of parent snapshot selection behaviour

Hello everybody

While using restic in my everyday life, I’ve found that if I add a backup location to restic, it rereads the whole file system.
Just assume the following scenario:
(A) Normally I back up with the following command:

restic backup /home /media/disk1 /media/disk2 /media/disk3

(B) Now, once in a while, I change the command to:

restic backup /home /media/disk1 /media/disk2 /media/disk3 /media/disk4

This means that if I run (B), a rather old parent is used, or, if disk4 isn’t static (maybe disk4 is one of several USB sticks or so), the whole file system is reread.

I therefore suggest changing how the parent snapshot is chosen by using a norm.
This whole text is based on the assumption that the parent snapshot is currently chosen as the latest snapshot from the same machine with the same backup locations (as I once read in this forum).
How I imagine this working:
Option 1:
First, we define the vector space V, in which every dimension represents exactly one unique backup location, plus one dimension for the time.

Take the locations that have to be backed up and represent them as a vector with a one for each of these backup locations (and a zero for every other location), then append one more element: the time. This gives the vector K. Next, build the corresponding vectors X_i for all snapshots i in the repository, again with each backup location in its own dimension. Then compute Y_i = K - X_i and take its length. (The time component has to be scaled first for this to make sense, e.g. by multiplying it by a unit such as 1 snapshot/week.) Finally, choose as the parent the snapshot with the lowest norm.
Note also that Y_i vectors containing only plus or minus ones besides the time, i.e. snapshots that share no backup location with the new backup, should be thrown away.
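A minimal Python sketch of Option 1, assuming each snapshot is reduced to its set of backup locations and its age already scaled to weeks. The `Snapshot` type and `pick_parent` function are hypothetical illustrations, not part of restic:

```python
import math
from dataclasses import dataclass

@dataclass
class Snapshot:
    name: str          # hypothetical label, only for the example
    paths: frozenset   # backup locations stored in the snapshot
    age_weeks: float   # snapshot age, already scaled to weeks

def pick_parent(targets, snapshots):
    """Option 1: return the snapshot whose vector X_i is closest to K."""
    # One dimension per unique backup location seen anywhere.
    dims = sorted(set(targets).union(*(s.paths for s in snapshots)))
    best, best_norm = None, math.inf
    for s in snapshots:
        # Throw away snapshots that share no backup location with K.
        if not set(targets) & s.paths:
            continue
        # Location entries of Y_i = K - X_i: +1, -1 or 0.
        y = [int(d in targets) - int(d in s.paths) for d in dims]
        # Time entry: K is "now", so K - X_i is the snapshot's age.
        y.append(s.age_weeks)
        norm = math.sqrt(sum(v * v for v in y))
        if norm < best_norm:
            best, best_norm = s, norm
    return best

# Scenario from above: run (B) while a recent (A) and an old (B) snapshot exist.
A_PATHS = frozenset({"/home", "/media/disk1", "/media/disk2", "/media/disk3"})
B_PATHS = A_PATHS | {"/media/disk4"}
latest_a = Snapshot("A", A_PATHS, 0.1)
old_b = Snapshot("B-old", B_PATHS, 20.0)
print(pick_parent(B_PATHS, [latest_a, old_b]).name)  # prints A
```

The recent (A) snapshot only lacks disk4, so its norm is about 1.0, while the 20-week-old (B) snapshot scores 20; the sketch therefore prefers (A).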
Option 2:
Similar to Option 1, except that each location entry holds the number of files (or the size) in that backup location instead of just a one or a zero.
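A sketch of Option 2 in the same style, assuming file counts per location are known for both the planned backup and each snapshot. `FILES_PER_WEEK` is a hypothetical weight that makes the time entry comparable to file counts, and the file counts below are made up for illustration, using only three locations to keep it short:

```python
import math

# Hypothetical weight: how many files one week of snapshot age is "worth".
FILES_PER_WEEK = 1000

def pick_parent_weighted(targets, snapshots):
    """Option 2: like Option 1, but location entries hold file counts.

    targets:   dict mapping location -> number of files to back up now
    snapshots: list of (name, dict location -> file count, age in weeks)
    """
    dims = sorted(
        set(targets).union(*(counts.keys() for _, counts, _ in snapshots))
    )
    best, best_norm = None, math.inf
    for name, counts, age_weeks in snapshots:
        # Throw away snapshots that share no backup location with K.
        if not targets.keys() & counts.keys():
            continue
        # Location entries of Y_i = K - X_i: difference in file counts.
        y = [targets.get(d, 0) - counts.get(d, 0) for d in dims]
        # Time entry, scaled so it is comparable to file counts.
        y.append(age_weeks * FILES_PER_WEEK)
        norm = math.sqrt(sum(v * v for v in y))
        if norm < best_norm:
            best, best_norm = name, norm
    return best

# Made-up file counts: disk4 is a small USB stick.
now = {"/home": 50_000, "/media/disk1": 100_000, "/media/disk4": 200}
latest_a = ("A", {"/home": 50_000, "/media/disk1": 100_000}, 0.1)
old_b = ("B-old", {"/home": 45_000, "/media/disk1": 100_000, "/media/disk4": 200}, 20.0)
print(pick_parent_weighted(now, [latest_a, old_b]))  # prints A
```

If disk4 instead held a million files, the missing disk4 would dominate the norm of the recent (A) snapshot and the old (B) snapshot would be chosen, which is where weighting by file count could save the most rereading.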

Conclusion
This means that a run of (B) may get the latest (A) snapshot as its parent instead of the old (B) snapshot. Option 2 in particular could reduce the backup time a lot in that use case.

This is only a theoretical concept so far.

Kind regards,

Please take a look at:


Thanks for linking those two. I hadn’t seen them, and they discuss the same problem. I believe the first one in particular would also solve the problem, and it is already being implemented. :+1:

PR 3121 does two things:

  • it allows multiple parent snapshots. Every parent snapshot is used to find matching files and dirs that are already contained in the repository.
  • it defines a partial order on snapshots, parametrized by the paths specified for the backup (basically based on backup time plus the sub-path relationship)

The algorithm to find the parents is then simply: filter out non-fitting snapshots and then find all maximal elements with respect to the partial order :wink:
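A small Python model of the filter-then-maximal-elements idea. This is not PR 3121's actual Go code: the order used here (t is above s when t is at least as recent and covers every requested path that s covers, and differs in at least one of these) is an assumed reading of "backup time + sub-path relationship", and the `Snap` type, `covered` helper, host filter and host name "pc" are hypothetical simplifications:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Snap:
    name: str         # hypothetical label, only for the example
    host: str
    paths: frozenset  # paths stored in the snapshot
    time: int         # e.g. a Unix timestamp

def covers(snap_path, wanted):
    """True if wanted is snap_path itself or lies below it."""
    return wanted == snap_path or wanted.startswith(snap_path.rstrip("/") + "/")

def covered(snap, targets):
    """The requested paths for which this snapshot holds data."""
    return frozenset(t for t in targets if any(covers(p, t) for p in snap.paths))

def below(s, t, targets):
    """Assumed strict partial order: t is at least as new and covers at least as much."""
    cs, ct = covered(s, targets), covered(t, targets)
    return s.time <= t.time and cs <= ct and (s.time < t.time or cs != ct)

def find_parents(snaps, host, targets):
    # Filter out non-fitting snapshots: other hosts, or nothing in common.
    fitting = [s for s in snaps if s.host == host and covered(s, targets)]
    # Keep the maximal elements: snapshots no other fitting snapshot is above.
    return [s for s in fitting if not any(below(s, t, targets) for t in fitting)]

A = frozenset({"/home", "/media/disk1", "/media/disk2", "/media/disk3"})
B = A | {"/media/disk4"}
snaps = [Snap("A-old", "pc", A, 1), Snap("B", "pc", B, 5), Snap("A-new", "pc", A, 10)]
print([s.name for s in find_parents(snaps, "pc", B)])  # prints ['B', 'A-new']
print([s.name for s in find_parents(snaps, "pc", A)])  # prints ['A-new']
```

For a (B) backup, the newer (A) snapshot and the (B) snapshot are incomparable (one is newer, the other covers more), so both survive; for an (A) backup, disk4 is irrelevant and only time decides.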

In your case, if you back up (B), it would use the latest (A) snapshot plus the latest (B) snapshot as parents, as both are maximal elements. If you back up (A), it would use either the latest (A) snapshot or the latest (B) snapshot, depending on which one is more recent.

But please feel free to test it and report if it works as expected!