Expected speed of practically unchanged incremental backups


#1

I just started using restic and it is great! However, I am a bit surprised about the slow incremental backup speed.

I am backing up a large photo library to a B2 bucket and the plan is to do regular incremental backups where there will be practically no changes. This is the kind of results I am getting (with restic 0.9.3 compiled with go1.11.2 on darwin/amd64, using only standard settings as far as I am aware) when backing up from a USB 3.0 HDD:

scan finished in 25.037s: 8097 files, 97.860 GiB

Files:        8097 new,     0 changed,     0 unmodified
Dirs:            3 new,     0 changed,     0 unmodified
Data Blobs:      0 new
Tree Blobs:      4 new
Added to the repo: 1.493 KiB

processed 8097 files, 97.860 GiB in 35:28

With 1 TB of data I expect an incremental backup without changes to take more than 5 hours. Thus I wonder:

  • Is this the speed to be expected of restic?
  • Why is it so slow, compared to e.g. borg (at least from my experience)?
  • Can I do anything about it?
  • Are there any plans to do anything about it as development of restic continues?
  • Also: Should I file an issue on GitHub?

For comparison, this is the speed I’m getting when backing up similar data from an internal SSD drive:

scan finished in 22.084s: 5219 files, 49.629 GiB

Files:        2667 new,     1 changed,  2551 unmodified
Dirs:            0 new,     3 changed,     0 unmodified
Data Blobs:      1 new
Tree Blobs:      4 new
Added to the repo: 9.476 KiB

processed 5219 files, 49.629 GiB in 2:36

#2

Hi and welcome to the forum :slight_smile:

So just to get this straight: are you experiencing slow backups to a remote backend, in your case B2? Or are you concerned about the time it takes to back up to external storage like your external HDD?

B2 is a pretty slow backend in general, mostly because restic has to wait on the network. It could be faster, but that's just how things are when using the B2 backend.
What you can do is increase the number of B2 connections. See this part of the documentation: https://restic.readthedocs.io/en/latest/030_preparing_a_new_repo.html#backblaze-b2
If you add -o b2.connections=20 to your restic backup command and try again, it should improve.


#3

I am backing up from local SSD and HDD storage to B2 over a stable 20 Mbps upstream connection, so I do not think the connection is the problem. I have not tried backing up to local storage, so I cannot compare. My assumption was that the speed issue has nothing to do with B2, because the files are only compared with the local cache to detect changes. I might be mistaken.

Anyway, I tried backing up the same files with -o b2.connections=20, but as you can see below, the unexpected results did not have much to do with that. The backup took only about 30 seconds in both cases, but clearly that is because this time (see below) restic considered the files "unmodified", whereas during the previous incremental backup (see my original post) it saw them as "new". (They were not, which is why only kilobytes were added to the repo.) Any idea what could cause this? I cannot see how the files could have been modified before the first incremental backup.

From the external USB 3.0 HDD:

scan finished in 20.570s: 8097 files, 97.860 GiB

Files:           0 new,     0 changed,  8097 unmodified
Dirs:            0 new,     3 changed,     0 unmodified
Data Blobs:      0 new
Tree Blobs:      4 new
Added to the repo: 1.490 KiB

processed 8097 files, 97.860 GiB in 0:28

From the internal SSD:

scan finished in 20.929s: 5218 files, 49.623 GiB

Files:           0 new,     0 changed,  5218 unmodified
Dirs:            0 new,     3 changed,     0 unmodified
Data Blobs:      0 new
Tree Blobs:      4 new
Added to the repo: 1.472 KiB

processed 5218 files, 49.623 GiB in 0:30

#4

With the b2 connections tweak mentioned earlier, you can try adding a new dummy file and making a new backup. As @moritzdietz said, and as far as I know, B2 is pretty slow. I have a 140 GB repo, and regular backups with changes of a few MB, sometimes even GB, take only a couple of minutes (between 1 and 5), sometimes less than a minute, and that's to a remote SSH server.


#5

Hm, interesting case. In the second backup output in your original post, about half of the files were detected as "new", which means restic was unable to find a file with the same path and name in the previous snapshot. Which is… odd.

If restic is unable to find a previous version of a file, it needs to re-read the complete file. In your case it then detected that the data contained in most of the new files was already present in the repo, so only ~10 KiB were added overall.
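The "re-read, but nothing added" behaviour comes from content-addressed deduplication. The toy sketch below keys chunks by their SHA-256 and stores only unseen ones. The repo type and store method are hypothetical. Fixed-size chunks are a deliberate simplification: restic itself uses content-defined chunking.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// repo is a toy stand-in for a repository: blobs are keyed by the SHA-256
// of their content, so identical chunks are stored only once.
type repo struct {
	blobs map[[32]byte][]byte
}

func newRepo() *repo { return &repo{blobs: make(map[[32]byte][]byte)} }

// store reads all of data, splits it into chunks and keeps only chunks whose
// ID is not in the repo yet. It returns the number of bytes actually added.
// Simplification: fixed-size chunks, not restic's content-defined chunking.
func (r *repo) store(data []byte, chunkSize int) (added int) {
	for off := 0; off < len(data); off += chunkSize {
		end := off + chunkSize
		if end > len(data) {
			end = len(data)
		}
		chunk := data[off:end]
		id := sha256.Sum256(chunk)
		if _, ok := r.blobs[id]; ok {
			continue // already present: read and hashed, but nothing is added
		}
		r.blobs[id] = append([]byte(nil), chunk...)
		added += len(chunk)
	}
	return added
}

func main() {
	r := newRepo()

	// Pseudo-random bytes standing in for a photo.
	photo := make([]byte, 1<<20)
	x := uint32(1)
	for i := range photo {
		x = x*1103515245 + 12345
		photo[i] = byte(x >> 16)
	}

	fmt.Println("first backup added", r.store(photo, 64<<10), "bytes")
	// The same file seen again as "new" (no parent snapshot): it is read
	// and hashed in full once more, but no chunk needs to be stored.
	fmt.Println("second backup added", r.store(photo, 64<<10), "bytes")
}
```

The second call still has to read and hash the whole file, which is where the time goes. The amount added, however, drops to zero.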

All of this boils down to the question of why restic detects files as "new" instead of modified/unmodified. You can find out what happened by running restic diff on the two snapshots you reported above. In addition, for the next incremental backup you can try using -v -v, which will show a short status for each file (new/modified/unmodified).

I think the backup times you see are caused by the data you're saving. Maybe the database renames files? That would make it hard for restic to save them efficiently without explicit support for that format. Which program are you using?


#6

The data I am backing up is mainly photos and videos organized in folders. They are also in an Adobe Lightroom database, but that is separate from the photos and videos themselves and is not supposed to modify the files. (I don't think it does, though I cannot guarantee it.) It does not rename files.

When I do an incremental backup of a bunch of files, restic seems to consider all* of them either new (even though they are unchanged) or unmodified, while correctly registering files that have actually been changed, added or removed. I suspected this could be due to my using different paths when backing up (e.g. first a backup of /dir/subdir1, then /dir/subdir2, then /dir), to my sometimes using trailing slashes and sometimes not, or to restic considering files new the second time it sees them as well, but not after that. But none of these explanations seems to hold up.

In conclusion, I am still looking for an explanation.

* The exception you mention might provide a clue. I will run restic diff and examine it as soon as the currently running restic prune is finished, in an hour or so.


#7

Using different paths for different backups is exactly what would cause it. The backup paths should be the same for each backup; otherwise restic does not consider them to be the same files and cannot compare their metadata (mtime and size) to see whether a file needs to be reprocessed.

Trailing slashes, and how often restic has seen a file before, don't matter to restic.


#8

I see. That is probably the reason, then.

I recently moved one of the directories I am backing up inside another (going from restic backup a b to restic backup a, with b now inside a), and it turned out this also made restic consider all the files new. So any change whatsoever to the paths restic is told to back up causes all files to be scanned as new?* That is quite inconvenient with large amounts of data. Is this a conscious choice by the developers? Are there any plans to change the behaviour?

* That cannot be true either, or all files would have been counted as new in my second example in the original post, where the paths differed from previous backups.

I tried doing a diff, but since the backup paths were quite different between the two snapshots, the numbers of new and unmodified files did not match the summary printed after the completed backup. I have not found a way to get a diff with the same data as in the summary. (The amount of data added to the repo also seems to be visible only in these summaries, not in restic diff or restic stats [snapshot-id], or am I wrong?)


#9

Yes, though the files will still be deduplicated.

Restic uses a prior ("parent") snapshot to optimize the backup process. If a file has the same path, size, and mtime as in the parent snapshot, the file is assumed to be unchanged. If this is not the case, the file contents are read and chunked, and each chunk is stored in the repository unless it already exists there.

The parent snapshot is automatically selected as the most recent snapshot that has exactly the same hostname and path set as the current backup. If there is no prior backup with the same hostname and path set then no parent snapshot is used and all files must have their contents scanned. However, the parent snapshot can be manually overridden with the --parent flag.

I’d need to see the list of snapshots and know which snapshot was created in the second example to be able to tell you if a parent snapshot was used.