Reduce restic restore read amplification

What is the purpose of this feature request? What does it change?
The purpose of this feature request is to reduce the amount of data restic downloads from the repository during recovery. This would reduce the cost of recovery from repositories located on cloud storage that charges for downloads, and could also reduce recovery time.

Problem description
As discussed here (What is restic restore even doing?), the amount of data a restic restore downloads from the repository is the full size of the restored data set, not the size of the deduplicated data. That is, to restore a 100 GiB data set that occupies only 80 GiB in the repository (thanks to deduplication), restic restore will download 100 GiB of data. I believe restic should be able to benefit from deduplication during restore as well; that is, restoring the 100 GiB data set should require downloading only the 80 GiB of unique data.

Current state
I’m not a programmer, and in particular I’ve never even looked at Go before, but from what I could grasp from restorer.go, the way restic restore works is:

  1. Walk the snapshot.
  2. Find files that should be restored.
  3. For each file marked for restore pull from the repository the data needed to restore that file.

Example
User requests restore of files a, b and c. Files a and c are identical.
Restic downloads repository file A1 and writes partial content to file a.
Restic downloads repository file A2 and writes partial content to file a.
Restic downloads repository file A3 and completes restore of file a.
Restic downloads repository file B1 and writes partial content to file b.
Restic downloads repository file B2 and writes partial content to file b.
Restic downloads repository file B3 and completes restore of file b.
Restic downloads repository file A1 and writes partial content to file c.
Restic downloads repository file A2 and writes partial content to file c.
Restic downloads repository file A3 and completes restore of file c.

Please note that the above could be patently wrong.
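To make the above concrete, here is a rough pseudocode-style sketch (written as Go, since that is restic’s language) of the per-file flow as I understand it. Everything in it (the types, the names, the download helper) is made up for illustration; it is not restic’s actual code:

    package main

    import "fmt"

    // Hypothetical types that only illustrate the flow described above;
    // they are not restic's actual data structures or API.
    type chunk struct {
        packID string // repository (pack) file holding the data
        offset int64  // where the data goes in the restored file
    }

    type fileToRestore struct {
        path   string
        chunks []chunk
    }

    // download stands in for one backend request.
    func download(packID string) { fmt.Println("downloading", packID) }

    // restorePerFile restores file by file: a pack referenced by several
    // output files is downloaded once per file that needs it.
    func restorePerFile(files []fileToRestore) {
        for _, f := range files {
            for _, c := range f.chunks {
                download(c.packID)
                fmt.Printf("  writing to %s at offset %d\n", f.path, c.offset)
            }
        }
    }

    func main() {
        // Files a and c are identical, so they reference the same packs A1..A3.
        a := fileToRestore{"a", []chunk{{"A1", 0}, {"A2", 100}, {"A3", 200}}}
        b := fileToRestore{"b", []chunk{{"B1", 0}, {"B2", 100}, {"B3", 200}}}
        c := fileToRestore{"c", []chunk{{"A1", 0}, {"A2", 100}, {"A3", 200}}}
        restorePerFile([]fileToRestore{a, b, c}) // A1..A3 end up downloaded twice
    }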

Proposed approach
What I envision is a modified process. A new restore option would make restic restore:

  1. Walk the snapshot.
  2. Find files that should be restored.
  3. Identify the set of repository files it needs to restore all required files.
  4. Pull (preferably in parallel) the repository files needed for the restore and write all the data from each repository file to the proper locations in the restored files. This seems clear as mud, I’m afraid, but bear with me and see the example.

Example
User requests restore of files a, b and c. Files a and c are identical.
Restic restore finds that it requires files A1, A2, A3, B1, B2 and B3 to restore the data.
Restic downloads repository file A1 and writes partial content to files a and c.
Restic downloads repository file A2 and writes partial content to files a and c.
Restic downloads repository file A3 and completes restore of files a and c.
Restic downloads repository file B1 and writes partial content to file b.
Restic downloads repository file B2 and writes partial content to file b.
Restic downloads repository file B3 and completes restore of file b.
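Sketched with the same made-up types as before, the modified process would first invert the mapping (repository file → every place its data is needed) and only then fetch each repository file, exactly once. Again, this is just an illustration of the idea, not restic code:

    package main

    import "fmt"

    // Same hypothetical types as in the earlier sketch; not restic's real API.
    type chunk struct {
        packID string
        offset int64
    }

    type fileToRestore struct {
        path   string
        chunks []chunk
    }

    type target struct {
        path   string
        offset int64
    }

    func download(packID string) { fmt.Println("downloading", packID) }

    // restorePerPack first builds an index of every location that needs data
    // from a given pack, then downloads each pack exactly once and fans the
    // data out to all of its targets.
    func restorePerPack(files []fileToRestore) {
        packTargets := map[string][]target{}
        var order []string
        for _, f := range files {
            for _, c := range f.chunks {
                if _, seen := packTargets[c.packID]; !seen {
                    order = append(order, c.packID)
                }
                packTargets[c.packID] = append(packTargets[c.packID], target{f.path, c.offset})
            }
        }
        for _, packID := range order { // the downloads could also run in parallel
            download(packID)
            for _, t := range packTargets[packID] {
                fmt.Printf("  writing to %s at offset %d\n", t.path, t.offset)
            }
        }
    }

    func main() {
        a := fileToRestore{"a", []chunk{{"A1", 0}, {"A2", 100}, {"A3", 200}}}
        b := fileToRestore{"b", []chunk{{"B1", 0}, {"B2", 100}, {"B3", 200}}}
        c := fileToRestore{"c", []chunk{{"A1", 0}, {"A2", 100}, {"A3", 200}}}
        restorePerPack([]fileToRestore{a, b, c}) // each of A1..A3 and B1..B3 downloaded once
    }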

Benefit / Motivation for the change
Reduced amount of data downloaded from the repository to complete a recovery. This would definitely reduce the recovery cost from repositories that charge for downloads (e.g. cloud storage). It might also reduce recovery time, because each piece of data required for the recovery would only need to be downloaded once.

Cost
Additional effort to implement modified restore function.
Probably increased memory footprint.

Not sure what restic version you looked at, but the current (as of 0.9.x) restore implementation behaves pretty much as you propose, except that it guarantees output files are written start-to-end (which adds quite a bit of implementation complexity); see https://github.com/restic/restic/blob/v0.9.6/internal/restorer/doc.go. There is also a simplified implementation waiting for review in PR 2195 that works exactly as you propose.

What is the actual problem you are trying to solve?

Thanks a lot for the pointer to the documentation; it looks like the algorithm restic uses matches what I had in mind.

However, now there’s a discrepancy between what should happen (files downloaded from the repository no more than once) and what Zottelchen describes in the What is restic restore even doing? thread: a 0.9.5 restic restore downloaded 205 GB from a B2 bucket of size 149.1 GB. That’s the issue that got me thinking that the restic restore process does something sub-optimal.
I’ll see if I can reproduce similar symptoms (downloaded data size > repository size) with a repository on a rest server.

I ran the test. With a single snapshot in the repository and doing a full snapshot restore, the size of data downloaded from the repository by restic restore is larger than the repository size.

Setup
restic 0.9.5 from Fedora 31 rpm package
rest server 0.9.7 downloaded from the restic site

Client IP: 10.23.32.6
Server IP: 10.23.32.1

Backup data set:

ls -ll /tmp/restic/to-backup/
total 464312
-rw-r--r--. 1 root root 95088640 Dec  2 09:26 pack.copy.tar
-rw-r--r--. 1 root root 95088721 Dec  2 09:31 pack.e1.tar
-rw-r--r--. 1 root root 95088721 Dec  2 09:31 pack.e2.tar
-rw-r--r--. 1 root root 95088721 Dec  2 09:31 pack.e3.tar
-rw-r--r--. 1 root root 95088640 Dec  2 09:26 pack.tar

pack.tar and pack.copy.tar are identical. Each of pack.e*.tar is unique, created as

 gpg -z0 -o pack.eX.tar --symmetric --cipher-algo AES256 pack.tar 

with a different password for each file.

Test procedure

  1. Remove all snapshots from the test repository.

  2. Create a backup of the test directory:

    restic --repo 'rest:http://user:password@10.23.32.1:8000/repo-test/' --password-file password-test backup /tmp/restic/to-backup/
    repository 7607f052 opened successfully, password is correct

    Files: 5 new, 0 changed, 0 unmodified
    Dirs: 2 new, 0 changed, 0 unmodified
    Added to the repo: 362.736 MiB

    processed 5 files, 453.418 MiB in 0:38
    snapshot b5cf2171 saved

Repository size at the server:

du -sb /path/repo-test/
381534478 /path/repo-test/
  3. Run a few restores:

    restic --repo 'rest:http://user:password@10.23.32.1:8000/repo-test/' --password-file password-test restore b5cf2171 --target /tmp/restic/restored/
    repository 7607f052 opened successfully, password is correct
    restoring <Snapshot b5cf2171 of [/tmp/restic/to-backup] at 2019-12-02 10:11:15.006825281 +0100 CET by root@client> to /tmp/restic/restored/

  4. On the client set up a filter in iptraf-ng:

Source: 10.23.32.6, ports 0 to 65000
Destination: 10.23.32.1, ports 8000 to 8000
Protocols to match: TCP
Include/Exclude (I/E): I
Match opposite (Y/N): N

  5. Enable the filter and run the IP traffic monitor on the interface with IP address 10.23.32.6.
  6. Delete the restore target directory on the client.
  7. Run a restore as described in step 3.
  8. Wait for the TCP streams to show as closed in iptraf-ng and collect the traffic data.
    Only collect data for the traffic from the server to the client.
  9. Reset the iptraf-ng capture and repeat steps 6 to 8 three times.

Results

For each restore restic opened 4 streams. Raw data below:

   stream  packets     bytes       total packets  total bytes  total MiB
   1       9548        85408760    48530          433812405    413.72
   2       19933       177968039
   3       5357        47952087
   4       13692       122483519

   1       13242       118503369   48400          432412649    412.38
   2       20301       180938998
   3       2400        21467238
   4       12457       111503044

   1       2325        20684805    50100          447312264    426.59
   2       17581       157002024
   3       16471       146859482
   4       13723       122765953
                                   49010          437845773    417.56  average

Analysis
For each restore restic downloaded a slightly different amount of data. However, in each case the amount downloaded was definitely larger than the repository size: the average download was 417.56 MiB for a repository of 362.74 MiB, i.e. about 15% bigger than the repository.
With an MTU of 1500 octets and 40 octets of IP and TCP headers, the network protocol overhead is about 3% (40/1460 ≈ 2.7%). Therefore, for the 363.86 MiB the repository occupies on the server (per du), the download should be about 373.83 MiB. Accounting for the TCP/IP protocol overhead, the average restore download is still about 112% of the repository size (417.56 / 373.83 ≈ 1.12).
The restored data set size was 453.42 MiB, so restic restore did benefit somewhat from deduplication: the downloaded data was smaller than the restored data.

Conclusions

  1. restic restore seems to download some data from the repository more than once. In my opinion, this behaviour could be improved.
  2. Restic uses a different number of packets per stream and a different total download size for each restore; however, the final state is the same each time. This is interesting, but in itself not too disturbing. :wink:

Further questions

  1. How does the restic restore behaviour change when there are multiple snapshots in the repository? For a constant restore set size, does the download size change?
  2. For a constant restore data set size, how does restic restore read amplification change for a partial restore (only some files from a snapshot) vs. a full snapshot restore?
  3. For a constant restore data set size, how does restic restore read amplification change for a single-snapshot repo vs. a multi-snapshot repo?
  4. For a constant restore data set size, are there any differences between the download sizes for questions 2 and 3, e.g. a partial snapshot restore from a single-snapshot repo vs. a multi-snapshot repo?
  5. How does the amount of the downloaded data scale? With the data set size, restored data set size, some combination of the two, or is it a constant?

If answers to any of the above questions would be useful for investigating the issue I’m happy to run further tests.

If you have any suggestions to improve the test procedure, I can run a modified test.

Next step
I’ll run a test with the PR 2195 version. From its documentation it does exactly what I had in mind, so I expect it to solve the issue.

Assuming you are using an HTTP-based backend, you should probably account for that protocol overhead as well.

There is one case where restic is expected to download the same data multiple times. Say you have the same blob present in two files, F1 and F2, only in file F1 the blob is at the very beginning of the file, and in file F2 the blob is at the end of the file. Because restic writes output files start-to-end, it will write the blob to F1, then put the blob into a small-ish in-memory cache until it can write it to F2. If the cache overflows, the blob will be downloaded a second time. PR 2195 writes output files out-of-order and is expected to download each blob exactly once.
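Roughly, the effect looks like this (a toy example with a made-up one-blob cache and a crude eviction policy, not the actual implementation):

    package main

    import "fmt"

    func main() {
        const cacheCapacity = 1 // pretend the in-memory cache can hold a single blob

        cache := map[string]bool{}
        downloads := 0

        fetch := func(blob string) {
            if cache[blob] {
                fmt.Println(blob, "served from cache")
                return
            }
            downloads++
            fmt.Println(blob, "downloaded")
            if len(cache) >= cacheCapacity {
                cache = map[string]bool{} // crude eviction: drop everything once full
            }
            cache[blob] = true
        }

        // Blob X sits at the beginning of file F1 and at the end of file F2.
        // With start-to-end writes, X is needed first and last, and the blobs
        // in between (Y, Z) push it out of the small cache.
        for _, blob := range []string{"X", "Y", "Z", "X"} {
            fetch(blob)
        }
        fmt.Println("total downloads:", downloads) // 4 here; out-of-order writes need only 3
    }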

Restore also downloads each repository data file with a single continuous backend request. If the data file contains blobs from "other" snapshots between blobs from "this" snapshot, restore will download and discard those in-between blobs.
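A toy illustration of what that can mean for the download size (a hypothetical pack layout; the exact range restic requests may differ, this is not actual restic behaviour):

    package main

    import "fmt"

    // blob describes one entry in a hypothetical repository data (pack) file.
    type blob struct {
        offset, length int64
        needed         bool
    }

    func main() {
        // Made-up pack layout: two needed blobs with 4 MiB of blobs from
        // other snapshots sitting between them.
        pack := []blob{
            {offset: 0, length: 1 << 20, needed: true},
            {offset: 1 << 20, length: 4 << 20, needed: false},
            {offset: 5 << 20, length: 1 << 20, needed: true},
        }

        var needed, first, last int64
        first = -1
        for _, b := range pack {
            if !b.needed {
                continue
            }
            needed += b.length
            if first < 0 {
                first = b.offset
            }
            last = b.offset + b.length
        }

        // One continuous request has to cover everything from the first needed
        // blob to the last one, including the unneeded data in between.
        downloaded := last - first
        fmt.Printf("needed %d MiB, downloaded %d MiB, discarded %d MiB\n",
            needed>>20, downloaded>>20, (downloaded-needed)>>20)
    }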