Optimizing large backups

lazyfrosch · March 19, 2021, 11:33am

Hey all,

I’m currently trying out restic to back up data from my storage server to cloud storage, and I’m wondering how to optimize the scan of data.

So far the backup hasn’t completed; it keeps resuming after the SSH or DSL connection drops.

Data Size: 628 GB
Files: 247,866
Target: SFTP
Upload: 40 Mbit/s DSL line

While I know the transfer will take some time, I’m running into problems during the scan for changes: it takes a while for all files to be scanned before the backup resumes uploading.

I suppose restic is checksumming every file. Is there a way to prefer mtime/size/inode during the scan of files?

Thank you
Markus

alexweiss · March 19, 2021, 2:37pm

You can try the experimental PR

MichaelEischer · March 20, 2021, 6:14pm

restic uses the last completed backup as the starting point to speed up later backups. So once you’ve completed the first backup, it should be a lot faster. To automatically resume an incomplete (initial) backup, there’s currently only the referenced experimental PR available.

However, you could also manually start with a smaller set of folders first, then use that as the starting point for a second backup job which includes a few additional folders and so on.
For example, with two folders A and B you could run restic backup A, which creates a snapshot with ID 357375ab (the real ID will be different), and then use that as the starting point for the second backup job: restic backup --parent 357375ab A B. That allows restic to quickly verify that all files in folder A are unchanged and then continue with uploading folder B. Afterwards, just delete the intermediate snapshot: restic forget 357375ab.
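
With more than two folders, the same pattern simply repeats. As a rough sketch (not an official restic tool), the helper below only prints the sequence of commands for a list of folders and does not run restic itself. The function name plan_staged_backup and the <snapshot-N> placeholders are made up for illustration; each placeholder stands for the ID restic reports after step N, which has to be filled in by hand. Folder names containing spaces would need extra quoting.

```shell
#!/bin/sh
# Print the restic commands for a staged initial backup,
# adding one more folder in each step.
# <snapshot-N> is a placeholder for the snapshot ID reported after step N.
plan_staged_backup() {
    step=0
    folders=""
    for folder in "$@"; do
        step=$((step + 1))
        # Append the next folder to the list backed up so far.
        folders="${folders:+$folders }$folder"
        if [ "$step" -eq 1 ]; then
            echo "restic backup $folders"
        else
            prev=$((step - 1))
            # Reuse the previous snapshot as parent, then drop it.
            echo "restic backup --parent <snapshot-$prev> $folders"
            echo "restic forget <snapshot-$prev>"
        fi
    done
}

plan_staged_backup A B C
```

Run each printed command only after the previous one has finished, so that the parent snapshot actually exists before it is referenced and is only forgotten once its successor is complete.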