Cal
December 12, 2022, 5:19am
1
If I perform these two backups, the 2nd backup, which covers just the parent folder, doesn’t seem to recognize that the sub-folder has already been backed up in the 1st backup.
restic backup "/data/sub1/"
no parent snapshot found, will read all files
Files: 100 new, 0 changed, 0 unmodified
Dirs: 10 new, 0 changed, 0 unmodified
restic backup "/data/"
no parent snapshot found, will read all files
Files: 300 new, 0 changed, 0 unmodified
Dirs: 30 new, 0 changed, 0 unmodified
I would expect the 2nd backup to say:
Files: 200 new, 0 changed, 100 unmodified
Dirs: 20 new, 0 changed, 10 unmodified
Why is that?
The parent selection algorithm only selects snapshots with exactly identical paths.
You can manually specify the parent using --parent.
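To illustrate the rule above, here is a minimal Python sketch of exact-path parent selection. It is not restic's actual code: the `Snapshot` class, the `find_parent` function, the snapshot ID and the hostname are all made up for the example, and I'm assuming the hostname has to match as well.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical stand-in for a snapshot record (not restic's real type)."""
    id: str
    hostname: str
    paths: tuple
    time: int


def find_parent(snapshots, hostname, paths):
    """Return the latest snapshot whose path list is exactly the same, else None."""
    wanted = tuple(sorted(paths))
    candidates = [
        s for s in snapshots
        if s.hostname == hostname and tuple(sorted(s.paths)) == wanted
    ]
    # max() with default=None covers the "no parent snapshot found" case.
    return max(candidates, key=lambda s: s.time, default=None)


# One earlier snapshot of the sub-folder only (ID and hostname are invented).
snapshots = [Snapshot("aaa111", "host1", ("/data/sub1",), 1)]

parent_for_data = find_parent(snapshots, "host1", ["/data"])
parent_for_sub1 = find_parent(snapshots, "host1", ["/data/sub1"])

print(parent_for_data)     # None: "/data" is not identical to "/data/sub1"
print(parent_for_sub1.id)  # aaa111
```

In this model the backup of /data finds no parent because the path list ("/data") differs from ("/data/sub1"), even though one directory contains the other. Passing --parent skips this lookup and names the parent snapshot directly.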
See also
restic:master
← aawsome:multiple-parents
opened 02:06PM - 23 Nov 20 UTC
What does this PR change? What problem does it solve?
-----------------------------------------------------
Support multiple parents in the `backup` command and implements an algorithm to automatically find those.
Some examples:
With this PR, if you previously backed up `/foo/bar1` and `/foo/bar2` and now want to back up `/foo`, both previous snapshots are taken as parents. A side effect of the new parent selection algorithm is: if you previously backed up `/foo` and now back up `/foo/bar1`, the existing snapshot will be taken as parent.
Note that this PR modifies the `Snapshot` data structure used in the snapshots files by adding a `Parents` field.
The current `Parent` field is only set but never used by restic.
If there is only one (or zero) parent, the `Parent` field is still used; if there is more than one parent, the `Parents` field is now used.
Was the change discussed in an issue or in the forum before?
------------------------------------------------------------
closes #3118
Checklist
---------
- [x] I have read the [Contribution Guidelines](https://github.com/restic/restic/blob/master/CONTRIBUTING.md#providing-patches)
- [x] I have enabled [maintainer edits for this PR](https://help.github.com/en/github/collaborating-with-issues-and-pull-requests/allowing-changes-to-a-pull-request-branch-created-from-a-fork)
- [x] I have added tests for all changes in this PR
- [x] I have added documentation for the changes (in the manual)
- [x] There's a new file in `changelog/unreleased/` that describes the changes for our users (template [here](https://github.com/restic/restic/blob/master/changelog/TEMPLATE))
- [x] I have run `gofmt` on the code in all commits
- [x] All commit messages are formatted in the same style as [the other commits in the repo](https://github.com/restic/restic/blob/master/CONTRIBUTING.md#git-commits)
- [x] I'm done, this Pull Request is ready for review
or (with still just one possible parent snapshot):
restic:master
← MichaelEischer:backup-group-by
opened 03:06PM - 10 Dec 22 UTC
What does this PR change? What problem does it solve?
-----------------------------------------------------
The backup command by default selected the parent snapshot based on the hostname
and the backup targets. When the backup path list changed, the backup command
was unable to determine a suitable parent snapshot and had to read all
files again. A similar scenario applies when using different sets of excludes while keeping the hostname and paths unchanged.
The new `--group-by` option for the backup command allows filtering snapshots
for the parent selection by `host`, `paths` and `tags`. It defaults to
`host,paths` which selects the latest snapshot with hostname and paths matching
those of the backup run.
I'm not entirely sure about the option name. The current name is `--group-by` as the mechanism works just like the snapshot grouping in the `forget` or `snapshots` command. But something like `--parent-group-by` or `--parent-filter-by` might be more intuitive.
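As a rough illustration of the grouping described in this PR text, the following Python sketch filters candidate parents by a configurable set of keys and picks the latest match. It is only a model of the description, not the PR's implementation: `group_key`, `select_parent`, the dict layout and all values are hypothetical.

```python
def group_key(snapshot, group_by):
    """Build a comparison key from the selected fields (host, paths, tags)."""
    fields = {
        "host": snapshot["host"],
        "paths": tuple(sorted(snapshot["paths"])),
        "tags": tuple(sorted(snapshot["tags"])),
    }
    return tuple(fields[name] for name in group_by)


def select_parent(snapshots, new_backup, group_by=("host", "paths")):
    """Return the latest snapshot in the same group as the new backup, or None."""
    wanted = group_key(new_backup, group_by)
    matches = [s for s in snapshots if group_key(s, group_by) == wanted]
    return max(matches, key=lambda s: s["time"], default=None)


# Invented data: one existing snapshot of /data/sub1, new backup of /data.
existing = [
    {"id": "aaa111", "host": "host1", "paths": ["/data/sub1"], "tags": [], "time": 1},
]
new_backup = {"host": "host1", "paths": ["/data"], "tags": []}

default_parent = select_parent(existing, new_backup)
host_only_parent = select_parent(existing, new_backup, group_by=("host",))

print(default_parent)          # None: paths differ under host,paths grouping
print(host_only_parent["id"])  # aaa111: only the host has to match
```

With the default `host,paths` grouping the changed path list yields no parent, while grouping by `host` alone lets the older snapshot qualify. On a restic version that includes this change, that would presumably correspond to passing `--group-by host` to the backup command.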
Was the change previously discussed in an issue or on the forum?
----------------------------------------------------------------
Fixes #3941
Checklist
---------
- [x] I have read the [contribution guidelines](https://github.com/restic/restic/blob/master/CONTRIBUTING.md#providing-patches).
- [x] I have [enabled maintainer edits](https://help.github.com/en/github/collaborating-with-issues-and-pull-requests/allowing-changes-to-a-pull-request-branch-created-from-a-fork).
- [ ] I have added tests for all code changes.
- [x] I have added documentation for relevant changes (in the manual).
- [x] There's a new file in `changelog/unreleased/` that describes the changes for our users (see [template](https://github.com/restic/restic/blob/master/changelog/TEMPLATE)).
- [x] I have run `gofmt` on the code in all commits.
- [x] All commit messages are formatted in the same style as [the other commits in the repo](https://github.com/restic/restic/blob/master/CONTRIBUTING.md#git-commits).
- [x] I'm done! This pull request is ready for review.
Cal
December 12, 2022, 10:57am
3
But the files do have exactly the same path?
E.g. the file /data/sub1/file1.txt
will be covered by both backups so it should not be backed up twice.
Yes, but this is irrelevant to how the parent is chosen when taking a backup. You specify a path, and when you make another backup with the same path, the previous backup with that path is chosen as the parent (provided one exists). The data in the backup is still de-duplicated as you would expect. The files are not backed up twice in the sense of taking up twice the storage space: restic still recognizes data that is already in the repository.
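To make the distinction concrete, here is a small Python sketch of content-addressed storage, the general idea behind this kind of de-duplication. It is heavily simplified and not restic's code: it assumes each file is stored as a single blob keyed by its SHA-256 hash (restic actually splits files into chunks first), and the `Repo` class, the `backup` function and the file contents are invented for the example.

```python
import hashlib


class Repo:
    """Toy repository: blobs are keyed by the SHA-256 hash of their content."""

    def __init__(self):
        self.blobs = {}

    def store(self, data: bytes) -> bool:
        """Store data unless an identical blob exists; return True if newly added."""
        blob_id = hashlib.sha256(data).hexdigest()
        if blob_id in self.blobs:
            return False
        self.blobs[blob_id] = data
        return True


def backup(repo, files):
    """Read every file (as with no parent) and return the bytes actually added."""
    added = 0
    for data in files.values():
        if repo.store(data):
            added += len(data)
    return added


repo = Repo()
sub1 = {"/data/sub1/file1.txt": b"hello"}
data = {"/data/sub1/file1.txt": b"hello", "/data/sub2/file2.txt": b"world!"}

first_added = backup(repo, sub1)   # stores file1.txt
second_added = backup(repo, data)  # file1.txt is already known, only file2.txt is stored

print(first_added, second_added, len(repo.blobs))  # 5 6 2
```

In this model the second backup still has to read every file, but it only adds the content the repository has not seen before. As far as I understand it, the "new" counts in the backup output are relative to the parent snapshot, so "300 new" with no parent says nothing about how much data was actually added to the repository.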