Reduce restic restore read amplification

What is the purpose of this feature request? What does it change?
The purpose of this feature request is to reduce the amount of data restic downloads from the repository during recovery. This would reduce the cost of recovery from repositories located on cloud storage that charges for downloads, and could also reduce recovery time.

Problem description
As discussed here (What is restic restore even doing?), the amount of data a restic restore downloads from the repository is the full size of the restored data set, not the size of the deduplicated data. That is, to restore a 100 GiB data set that occupies only 80 GiB in the repository (thanks to deduplication), restic restore will download 100 GiB of data. I believe restic should be able to benefit from deduplication during restore as well; that is, restoring the 100 GiB data set should require downloading only the 80 GiB of unique data.

Current state
I’m not a programmer, and in particular I’ve never even looked at Go before, but from what I could grasp from restorer.go, the way restic restore works is:

  1. Walk the snapshot.
  2. Find files that should be restored.
  3. For each file marked for restore pull from the repository the data needed to restore that file.

Example
User requests restore of files a, b and c. Files a and c are identical.
Restic downloads repository file A1 and writes partial content to file a.
Restic downloads repository file A2 and writes partial content to file a.
Restic downloads repository file A3 and completes restore of file a.
Restic downloads repository file B1 and writes partial content to file b.
Restic downloads repository file B2 and writes partial content to file b.
Restic downloads repository file B3 and completes restore of file b.
Restic downloads repository file A1 and writes partial content to file c.
Restic downloads repository file A2 and writes partial content to file c.
Restic downloads repository file A3 and completes restore of file c.

Please note that the above could be patently wrong.
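To make the above concrete, here is a rough pseudocode-style sketch (written as Go, since that is restic’s language) of the per-file flow as I understand it. Everything in it (the types, the names, the download helper) is made up for illustration; it is not restic’s actual code:

    package main

    import "fmt"

    // Hypothetical types that only illustrate the flow described above;
    // they are not restic's actual data structures or API.
    type chunk struct {
        packID string // repository (pack) file holding the data
        offset int64  // where the data goes in the restored file
    }

    type fileToRestore struct {
        path   string
        chunks []chunk
    }

    // download stands in for one backend request.
    func download(packID string) { fmt.Println("downloading", packID) }

    // restorePerFile restores file by file: a pack referenced by several
    // output files is downloaded once per file that needs it.
    func restorePerFile(files []fileToRestore) {
        for _, f := range files {
            for _, c := range f.chunks {
                download(c.packID)
                fmt.Printf("  writing to %s at offset %d\n", f.path, c.offset)
            }
        }
    }

    func main() {
        // Files a and c are identical, so they reference the same packs A1..A3.
        a := fileToRestore{"a", []chunk{{"A1", 0}, {"A2", 100}, {"A3", 200}}}
        b := fileToRestore{"b", []chunk{{"B1", 0}, {"B2", 100}, {"B3", 200}}}
        c := fileToRestore{"c", []chunk{{"A1", 0}, {"A2", 100}, {"A3", 200}}}
        restorePerFile([]fileToRestore{a, b, c}) // A1..A3 end up downloaded twice
    }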

Proposed approach
What I envision is a modified process. A new restore option would make restic restore:

  1. Walk the snapshot.
  2. Find files that should be restored.
  3. Identify the set of repository files it needs to restore all required files.
  4. Pull (preferably in parallel) the repository files needed for the restore and write all the data from each repository file to the proper locations in the restored files. This seems clear as mud, I’m afraid, but bear with me and see the example.

Example
User requests restore of files a, b and c. Files a and c are identical.
Restic restore finds that it requires files A1, A2, A3, B1, B2 and B3 to restore the data.
Restic downloads repository file A1 and writes partial content to files a and c.
Restic downloads repository file A2 and writes partial content to files a and c.
Restic downloads repository file A3 and completes restore of files a and c.
Restic downloads repository file B1 and writes partial content to file b.
Restic downloads repository file B2 and writes partial content to file b.
Restic downloads repository file B3 and completes restore of file b.
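Sketched with the same made-up types as before, the modified process would first invert the mapping (repository file → every place its data is needed) and only then fetch each repository file, exactly once. Again, this is just an illustration of the idea, not restic code:

    package main

    import "fmt"

    // Same hypothetical types as in the earlier sketch; not restic's real API.
    type chunk struct {
        packID string
        offset int64
    }

    type fileToRestore struct {
        path   string
        chunks []chunk
    }

    type target struct {
        path   string
        offset int64
    }

    func download(packID string) { fmt.Println("downloading", packID) }

    // restorePerPack first builds an index of every location that needs data
    // from a given pack, then downloads each pack exactly once and fans the
    // data out to all of its targets.
    func restorePerPack(files []fileToRestore) {
        packTargets := map[string][]target{}
        var order []string
        for _, f := range files {
            for _, c := range f.chunks {
                if _, seen := packTargets[c.packID]; !seen {
                    order = append(order, c.packID)
                }
                packTargets[c.packID] = append(packTargets[c.packID], target{f.path, c.offset})
            }
        }
        for _, packID := range order { // the downloads could also run in parallel
            download(packID)
            for _, t := range packTargets[packID] {
                fmt.Printf("  writing to %s at offset %d\n", t.path, t.offset)
            }
        }
    }

    func main() {
        a := fileToRestore{"a", []chunk{{"A1", 0}, {"A2", 100}, {"A3", 200}}}
        b := fileToRestore{"b", []chunk{{"B1", 0}, {"B2", 100}, {"B3", 200}}}
        c := fileToRestore{"c", []chunk{{"A1", 0}, {"A2", 100}, {"A3", 200}}}
        restorePerPack([]fileToRestore{a, b, c}) // each of A1..A3 and B1..B3 downloaded once
    }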

Benefit / Motivation for the change
Reduced amount of data downloaded from the repository to complete a recovery. This would definitely reduce the recovery cost from repositories that charge for downloads (e.g. cloud storage). It might also reduce recovery time, because each piece of data required for the recovery would only need to be downloaded once.

Cost
Additional effort to implement modified restore function.
Probably increased memory footprint.

Not sure what restic version you looked at, but the current (as of 0.9.x) restore implementation behaves pretty much as you propose, except that it guarantees output files are written start-to-end (which adds quite a bit of implementation complexity); see https://github.com/restic/restic/blob/v0.9.6/internal/restorer/doc.go. There is also a simplified implementation waiting for review in PR 2195 that works exactly as you propose.

What is the actual problem you are trying to solve?

Thanks a lot for the pointer to the documentation; it looks like the algorithm restic uses matches what I had in mind.

However, now there’s a discrepancy between what should happen (files downloaded from the repository no more than once) and what Zottelchen describes in the What is restic restore even doing? thread: a 0.9.5 restic restore downloaded 205 GB from a B2 bucket of size 149.1 GB. That’s the issue that got me thinking that the restic restore process does something sub-optimal.
I’ll see if I can reproduce similar symptoms (downloaded data size > repository size) with a repository on a rest server.

I ran the test. With a single snapshot in the repository and doing a full snapshot restore, the size of data downloaded from the repository by restic restore is larger than the repository size.

Setup
restic 0.9.5 from Fedora 31 rpm package
rest server 0.9.7 downloaded from the restic site

Client IP: 10.23.32.6
Server IP: 10.23.32.1

Backup data set:

ls -ll /tmp/restic/to-backup/
total 464312
-rw-r--r--. 1 root root 95088640 Dec  2 09:26 pack.copy.tar
-rw-r--r--. 1 root root 95088721 Dec  2 09:31 pack.e1.tar
-rw-r--r--. 1 root root 95088721 Dec  2 09:31 pack.e2.tar
-rw-r--r--. 1 root root 95088721 Dec  2 09:31 pack.e3.tar
-rw-r--r--. 1 root root 95088640 Dec  2 09:26 pack.tar

pack.tar and pack.copy.tar are identical. Each of pack.e*.tar is unique, created as

 gpg -z0 -o pack.eX.tar --symmetric --cipher-algo AES256 pack.tar 

with a different password for each file.

Test procedure

  1. Remove all snapshots from the test repository.

  2. Create a backup of the test directory:

    restic --repo 'rest:http://user:password@10.23.32.1:8000/repo-test/' --password-file password-test backup /tmp/restic/to-backup/
    repository 7607f052 opened successfully, password is correct

    Files: 5 new, 0 changed, 0 unmodified
    Dirs: 2 new, 0 changed, 0 unmodified
    Added to the repo: 362.736 MiB

    processed 5 files, 453.418 MiB in 0:38
    snapshot b5cf2171 saved

Repository size at the server:

du -sb /path/repo-test/
381534478 /path/repo-test/
  3. Run a few restores:

    restic --repo 'rest:http://user:password@10.23.32.1:8000/repo-test/' --password-file password-test restore b5cf2171 --target /tmp/restic/restored/
    repository 7607f052 opened successfully, password is correct
    restoring <Snapshot b5cf2171 of [/tmp/restic/to-backup] at 2019-12-02 10:11:15.006825281 +0100 CET by root@client> to /tmp/restic/restored/

  4. On the client set up a filter in iptraf-ng:

Source: 10.23.32.6, ports 0 to 65000
Destination: 10.23.32.1, ports 8000 to 8000
Protocols to match: TCP
Include/Exclude (I/E): I
Match opposite (Y/N): N

  5. Enable the filter and run the IP traffic monitor on the interface with IP address 10.23.32.6.
  6. Delete the restore target directory on the client.
  7. Run a restore as described in step 3.
  8. Wait for the TCP streams to show as closed in iptraf-ng and collect the traffic data.
    Only collect data for the traffic from the server to the client.
  9. Reset the iptraf-ng capture and repeat steps 6 to 8 three times.

Results

For each restore restic opened 4 streams. Raw data below:

   stream  packets     bytes       total packets  total bytes  total MiB
   1       9548        85408760    48530          433812405    413.72
   2       19933       177968039
   3       5357        47952087
   4       13692       122483519

   1       13242       118503369   48400          432412649    412.38
   2       20301       180938998
   3       2400        21467238
   4       12457       111503044

   1       2325        20684805    50100          447312264    426.59
   2       17581       157002024
   3       16471       146859482
   4       13723       122765953
                                   49010          437845773    417.56  average

Analysis
For each restore restic downloaded a slightly different amount of data. However, in each case the amount downloaded was definitely larger than the repository size: the average download was 417.56 MiB for a repository of 362.74 MiB, i.e. about 15% bigger than the repository.
With an MTU of 1500 octets and 40 octets of IP and TCP headers, the network protocol overhead is about 3% (40/1460 ≈ 2.7%). Therefore, for the 363.86 MiB the repository occupies on the server (per du), the download should be about 373.83 MiB. Accounting for the TCP/IP protocol overhead, the average restore download is still about 112% of the repository size (417.56 / 373.83 ≈ 1.12).
The restored data set size was 453.42 MiB, so restic restore did benefit somewhat from deduplication: the downloaded data was smaller than the restored data.

Conclusions

  1. restic restore seems to download some data from the repository more than once. In my opinion, this behaviour could be improved.
  2. Restic uses a different number of packets per stream and a different total download size for each restore; however, the final state is the same each time. This is interesting, but in itself not too disturbing. :wink:

Further questions

  1. How does the restic restore behaviour change when there are multiple snapshots in the repository? For a constant restore set size, does the download size change?
  2. For a constant restore data set size, how does restic restore read amplification change for a partial restore (only some files from a snapshot) vs. a full snapshot restore?
  3. For a constant restore data set size, how does restic restore read amplification change for a single-snapshot repo vs. a multi-snapshot repo?
  4. For a constant restore data set size, are there any differences between the download sizes for questions 2 and 3, e.g. a partial snapshot restore from a single-snapshot repo vs. a multi-snapshot repo?
  5. How does the amount of the downloaded data scale? With the data set size, restored data set size, some combination of the two, or is it a constant?

If answers to any of the above questions would be useful for investigating the issue I’m happy to run further tests.

If you have any suggestions to improve the test procedure, I can run a modified test.

Next step
I’ll run a test with the PR 2195 version. From its documentation it does exactly what I had in mind, so I expect it to solve the issue.

Assuming you are using an HTTP-based backend, you should probably account for that protocol overhead as well.

There is one case where restic is expected to download the same data multiple times. Say you have the same blob present in two files, F1 and F2, only in file F1 the blob is at the very beginning of the file, and in file F2 the blob is at the end of the file. Because restic writes output files start-to-end, it will write the blob to F1, then put the blob into a small-ish in-memory cache until it can write it to F2. If the cache overflows, the blob will be downloaded a second time. PR 2195 writes output files out-of-order and is expected to download each blob exactly once.
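Roughly, the effect looks like this (a toy example with a made-up one-blob cache and a crude eviction policy, not the actual implementation):

    package main

    import "fmt"

    func main() {
        const cacheCapacity = 1 // pretend the in-memory cache can hold a single blob

        cache := map[string]bool{}
        downloads := 0

        fetch := func(blob string) {
            if cache[blob] {
                fmt.Println(blob, "served from cache")
                return
            }
            downloads++
            fmt.Println(blob, "downloaded")
            if len(cache) >= cacheCapacity {
                cache = map[string]bool{} // crude eviction: drop everything once full
            }
            cache[blob] = true
        }

        // Blob X sits at the beginning of file F1 and at the end of file F2.
        // With start-to-end writes, X is needed first and last, and the blobs
        // in between (Y, Z) push it out of the small cache.
        for _, blob := range []string{"X", "Y", "Z", "X"} {
            fetch(blob)
        }
        fmt.Println("total downloads:", downloads) // 4 here; out-of-order writes need only 3
    }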

Restore also downloads each repository data file with a single continuous backend request. If the data file contains blobs from "other" snapshots between blobs from "this" snapshot, restore will download and discard those in-between blobs.
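A toy illustration of what that can mean for the download size (a hypothetical pack layout; the exact range restic requests may differ, this is not actual restic behaviour):

    package main

    import "fmt"

    // blob describes one entry in a hypothetical repository data (pack) file.
    type blob struct {
        offset, length int64
        needed         bool
    }

    func main() {
        // Made-up pack layout: two needed blobs with 4 MiB of blobs from
        // other snapshots sitting between them.
        pack := []blob{
            {offset: 0, length: 1 << 20, needed: true},
            {offset: 1 << 20, length: 4 << 20, needed: false},
            {offset: 5 << 20, length: 1 << 20, needed: true},
        }

        var needed, first, last int64
        first = -1
        for _, b := range pack {
            if !b.needed {
                continue
            }
            needed += b.length
            if first < 0 {
                first = b.offset
            }
            last = b.offset + b.length
        }

        // One continuous request has to cover everything from the first needed
        // blob to the last one, including the unneeded data in between.
        downloaded := last - first
        fmt.Printf("needed %d MiB, downloaded %d MiB, discarded %d MiB\n",
            needed>>20, downloaded>>20, (downloaded-needed)>>20)
    }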