Can anyone explain what the restic backup does? I have a collection of large files (~2GB each) and the first backup has completed fine. But when I run the backup again, it takes quite a bit of time to process the files, even though they have not changed at all. There doesn’t appear to be any network traffic, so it’s definitely not resending the files, but it appears to take some time to chunk them. Is this how restic backup is supposed to work? Surely if the file has not changed, restic should just skip it and not do any work at all?
First of all it would be good to know where your repository is located.
From your post it seems to be a repository which is offsite - NAS, S3, B2 or another cloud vendor.
It would also help to see the output of two backup runs, one directly after the other.
This would give us plenty of information to see what restic is doing.
The exact restic command you’re using would also be helpful for troubleshooting.
Another idea is to check the forum for other posts about slow backups: https://forum.restic.net/search?q=backup%20slow
There is a ton of information in there that could lead you to a fix for your issue or answers to some of your questions.
I looked at some of the other posts, and they mention a few things, but nothing that solved the problem.
My repository is on Linux, backing up to a remote Linux server using sftp. I’m using restic 0.9.5 compiled from source: v0.9.5-46-g604b18aa-dirty.
I synchronised the clocks between the machines (some posts mentioned this), but it still appears that restic will scan every file in my repo. I expected restic to be able to check the last modified time of the file, and avoid reading it if nothing has changed.
The problem is that my repository is almost 7TB, so it takes hours to check it, and the check drives up the iowait and makes the machine unresponsive.
This is the output of a backup: “restic -v backup -f --one-file-system --exclude-file=/root/excludes.txt /raid/home”
open repository
repository 3f87a6f5 opened successfully, password is correct
lock repository
load index files
start scan on [/raid/home]
start backup on [/raid/home]
scan finished in 222.579s: 4367733 files, 6.766 TiB
uploaded intermediate index e3dbec0e
uploaded intermediate index 7c34632f
Files: 4367782 new, 0 changed, 0 unmodified
Dirs: 2 new, 0 changed, 0 unmodified
Data Blobs: 583 new
Tree Blobs: 3 new
Added to the repo: 186.598 MiB
processed 4367782 files, 6.766 TiB in 4:55:40
snapshot da49f157 saved
And you can see that only a small number of files have changed and a tiny amount of data is transferred, but it still had to read every file.
Is there a way to tell restic to compare timestamps before scanning the file?
I have no idea why your restic is apparently reading all those files and treating them as new (rather than unmodified) if they haven’t been modified.
I’m thinking you need to isolate things here. A couple of suggestions to try:
Can you reproduce it with a small test set of folders and files? That is, create or copy some dummy files to another part of the disk, preferably outside the raid (e.g. /tmp), and see if you can reproduce the issue there.
Try setting the noatime option on the mount of the /raid/home filesystem. I can’t say I’m expecting it to help, but just in case. See if the problem still manifests itself.
Also, silly as it may sound, if you run two backups, one right after the other, does the problem happen in both of them or just the first? How much time passed between the start of the first one and the start of the second one?
Apparently, restic is unable to find a previous snapshot (called “parent snapshot”) for /raid/home, otherwise it would have printed something along the lines of:
using parent snapshot 4ad58bd9
That seems to be missing here.
We’ve had this issue several times already:
Sometimes it was caused by the path (/raid/home here) not being constant. Is the path exactly the same in between backups?
Is the host name always the same?
You can check both by looking at restic snapshots.
You can try forcing restic to use a specific parent snapshot with the flag --parent (and provide the id of the latest snapshot) to see if something changes.
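The host and path checks above can be sketched in a few lines of Python. This is only a rough illustration, not restic's actual code: it assumes the snapshot list has been exported with `restic snapshots --json` and that each entry carries `time`, `hostname`, `paths` and `short_id` fields. The helper name `find_parent` and the sample data are made up for the example.

```python
import json

def find_parent(snapshots, hostname, paths):
    """Return the newest snapshot whose hostname and paths both match, or None."""
    wanted = sorted(paths)
    candidates = [
        s for s in snapshots
        if s["hostname"] == hostname and sorted(s["paths"]) == wanted
    ]
    # Assumption: timestamps share one format and timezone, so they sort as strings.
    return max(candidates, key=lambda s: s["time"], default=None)

# Hypothetical sample data standing in for the JSON snapshot list.
sample = json.loads("""[
  {"short_id": "aaaa1111", "time": "2019-07-01T02:00:00Z",
   "hostname": "fileserver", "paths": ["/raid/home"]},
  {"short_id": "bbbb2222", "time": "2019-07-02T02:00:00Z",
   "hostname": "fileserver", "paths": ["/raid/home"]},
  {"short_id": "cccc3333", "time": "2019-07-03T02:00:00Z",
   "hostname": "fileserver.local", "paths": ["/raid/home"]}
]""")

parent = find_parent(sample, "fileserver", ["/raid/home"])
print(parent["short_id"] if parent else "no parent snapshot found")
```

In this made-up data the newest snapshot was taken under a different host name, so it is skipped and the one before it is picked. A path written as /raid/home/ instead of /raid/home would find no match at all in this sketch, which is the kind of mismatch worth looking for in the snapshot list.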
As suggested by @rawtaz, I would try to reproduce it on a smaller scale so you can debug it quickly.
The “-f” flag forces a complete rescan, which is what restic was doing.
I had copied the command from somewhere without checking all the options.
repository 3f87a6f5 opened successfully, password is correct
lock repository
load index files
using parent snapshot 6912da37
start scan on [/raid/home]
start backup on [/raid/home]
scan finished in 294.869s: 4371314 files, 6.766 TiB
Files: 25 new, 95 changed, 4371194 unmodified
Dirs: 0 new, 2 changed, 0 unmodified
Data Blobs: 126 new
Tree Blobs: 3 new
Added to the repo: 37.993 MiB
processed 4371314 files, 6.766 TiB in 12:23
snapshot bf4bdfef saved
If I remove the “-f”, the backup takes 12 minutes, which is more what I expected and completely fixes the problem.
Thanks for your help!
(you might add an option in verbose mode to indicate that a full rescan is being performed…)
I think this is reported relative to the parent snapshot. With “-f”, restic operates without a parent snapshot, so during the scan everything is “new”, more or less by definition. Maybe the output could be modified to clarify this (“based on parent snapshot XY” or similar).
From my understanding this part of the output is a result of the mere scan, not of the whole process.
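To make the new/changed/unmodified counts more concrete, here is a rough Python sketch of how such a classification against a parent snapshot could work. It is not restic's implementation: comparing only size and modification time, the `force` parameter standing in for “-f”, and all names and sample values are assumptions for illustration.

```python
def classify(current, parent=None, force=False):
    """Count files as new / changed / unmodified relative to a parent snapshot.

    current and parent map path -> (size, mtime). With force=True the parent
    is ignored, so every file counts as new and would have to be re-read.
    """
    if force or parent is None:
        parent = {}
    counts = {"new": 0, "changed": 0, "unmodified": 0}
    for path, meta in current.items():
        if path not in parent:
            counts["new"] += 1          # no earlier record of this path
        elif parent[path] != meta:
            counts["changed"] += 1      # size or mtime differs
        else:
            counts["unmodified"] += 1   # metadata matches, content can be skipped
    return counts

# Hypothetical metadata: (size in bytes, mtime as a Unix timestamp).
previous = {"/raid/home/a": (100, 1000), "/raid/home/b": (200, 1000)}
now = {
    "/raid/home/a": (100, 1000),
    "/raid/home/b": (250, 2000),
    "/raid/home/c": (50, 2000),
}

print(classify(now, previous))              # with a parent snapshot
print(classify(now, previous, force=True))  # parent ignored, as with -f
```

With the parent in place the sketch reports one new, one changed and one unmodified file. With force=True all three come out as new, which mirrors the “4367782 new, 0 changed, 0 unmodified” line in the first log.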