Continuous backup

There is. But if you want to restore a folder where one file changed later, you have to know to restore the folder first, then the file. It can be a mess.

I was about to, but being a good boy I first searched for similar issues and found one, from a few months ago:

The author pretty much follows the same rationale that I did, complete with the interim and final conclusions in this post, like he read my mind before I even thought about it. Kudos @alphapapa. Looks like it was accepted as a feature request by @fd0.


Great minds… :slight_smile:


You should take a look at fswatch to trigger restic on file change.


Since looking at restic I have had the same thoughts as @alphapapa and @arikb

My performance problem with restic is not the actual backing up of files, or dedup, or the (lack of) compression, but the fact that it spends 99% of its time pointlessly scanning hundreds of thousands of files only to notice they haven’t changed since an hour ago :slight_smile: That’s a lot of wasted effort and I/O that continuous solutions like CrashPlan and Carbonite avoid.

restic supports backing up a list of specific files (--files-from). So my thought was to have an inotify wrapper/daemon that would build a list of modified files for 15-60 minutes, and then dispatch a restic backup with the accumulated file path list. Then once every 12-24 hours it would run restic with a full scan to ensure nothing is missed (inotify and NTFS events are not guaranteed; they are buffered in kernel memory and simply get dropped if not picked up in time).

A continuous wrapper would reduce hourly restic backup times from ~30 minutes of runtime to ~3 minutes, and greatly reduce the hourly I/O operations on the filesystem.

Has this been done already? Anyone seen a generic inotify wrapper that could do this or be adapted to do this?

I guess a shell script with inotifywait could do it for local filesystems.
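
Something like this rough sketch, maybe (the watch path and interval are placeholders, it assumes RESTIC_REPOSITORY and RESTIC_PASSWORD_FILE are already exported, and inotifywait comes from inotify-tools):

    #!/bin/bash
    # Sketch: accumulate changed paths from inotify, then dispatch restic with --files-from.
    WATCH_DIR=/home/me           # placeholder
    INTERVAL=900                 # accumulate events for 15 minutes
    LIST=/tmp/restic-changed.txt

    while true; do
        # Collect events for $INTERVAL seconds, then de-duplicate the paths.
        timeout "$INTERVAL" inotifywait -m -r -q \
            -e close_write -e create -e moved_to \
            --format '%w%f' "$WATCH_DIR" | sort -u > "$LIST"
        if [ -s "$LIST" ]; then
            restic backup --files-from "$LIST"
        fi
    done

Plus a plain daily restic backup of the whole watch directory from cron to catch anything inotify dropped.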


Just tried this as my home folder backups are very slow even for a few changed files.

Using fswatch on macOS I simply piped the files to a logfile and used that as input to --files-from when backing up with restic.

Seems to work fine. Just make sure you’re not piping to the logfile while restic is running: restic only seems to check for file existence initially, and it will fail if fswatch later adds files that don’t exist when restic tries to back them up (it crashes with lstat /path/to/file: no such file or directory). I just rotated the log and ran restic on the previous logfile.
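
The rotation part is roughly this (paths are placeholders); copying and truncating the live log means restic always works from a stable list while fswatch keeps appending to the original file:

    # fswatch runs continuously in the background, appending changed paths to a logfile:
    fswatch -r "$HOME" >> /tmp/fswatch.log &

    # At backup time: copy-and-truncate the live log (small race window between the two),
    # then de-duplicate and back up only the changed paths.
    cp /tmp/fswatch.log /tmp/fswatch.prev
    : > /tmp/fswatch.log
    sort -u /tmp/fswatch.prev > /tmp/fswatch.files
    restic backup --files-from /tmp/fswatch.files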

At first I could not get restic to apply my exclude file (--exclude-file) when used together with --files-from, so I thought you’d have to parse the logfile manually and remove anything you don’t want. Actually, the exclude file does seem to work; it’s just that the excluded path is still listed in the snapshot (because it’s part of the backup command, I guess), but the actual path is empty when I try to restore.

So it seems possible, but it’s a bit of a hacky solution. Not sure it will scale to many files. Also, your snapshots will look crazy, listing all the individual files backed up in each snapshot. :slight_smile:

It’s definitely fast though. I’m a bit paranoid, so I back up my home folder once per hour. Normally this takes about 50 minutes for a full scan, plus upload. Using the fswatch hack above cut that down to 9 seconds for an hour’s worth of changes.

I know it’s not surprising given what’s happening. Probably just an indication that my current naive restic setup is not ideal.

The only thing preventing me from using it at this point is the horrible snapshot log I get… :stuck_out_tongue: (Using restic snapshots --compact helps with this)

Edit: I should note that I have millions of data files that I back up. Some of them change often while others never change - so I guess I should only back up the static ones once in a while.

On second thought this won’t work well because there’s no meaningful parent snapshot to use. So deduplication won’t work, and finding a backed-up version of a file will require you to look through all snapshots - not just the latest.

Deduplication will still work; it’s done at the block level.

You can use restic find to look for files you want to restore.

But yes, it’s a bit of an issue that you don’t see your entire filesystem/tree in your snapshots; it’s much messier to restore stuff with this approach.
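
For example, digging a single file out would look something like this (the snapshot ID and path are made up):

    # See which snapshots contain files matching a pattern:
    restic find "*budget*.xlsx"

    # Then restore just that path from the snapshot it was found in:
    restic restore 1a2b3c4d --target /tmp/restore --include /Users/me/Documents/budget.xlsx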

I can’t help but think that if it takes 50 minutes for a full scan, there’s something very slow with your filesystem/disks?

Ahh, good point about the deduplication. Still think this feels too much like a hack to really use in practice. But the idea is nice.

Not sure why my backups are taking so long. Using a MacBook Pro from earlier this year with a 500 GB SSD. I guess I just need to work some more on my exclude files.

For now I’ve changed my schedule to back up my working dir once per hour (6 min) and then the whole home folder twice per day (1 hour). I find similar times on the other computers that I back up (all macOS).
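
In cron terms the schedule is roughly this (paths and the exclude file are placeholders, and RESTIC_REPOSITORY / RESTIC_PASSWORD_FILE would need to be set in the cron environment):

    # hourly: working directory only (~6 min)
    0 * * * *    restic backup --exclude-file /Users/me/.restic-excludes /Users/me/work
    # twice a day: whole home folder (~1 hour)
    0 6,18 * * * restic backup --exclude-file /Users/me/.restic-excludes /Users/me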

I like the idea of continuous backup, but not sure if it really fits the philosophy of restic at the moment (but I might be wrong).

Edit: Think I might have found the reason for my slow backups: Restic runs lstat on files excluded by extension - intended behaviour?

I can’t help but think that if it takes 50 minutes for a full scan, there’s something very slow with your filesystem/disks?

Per-file speed will vary, from your local 400,000 IOPS SSD to your 200 IOPS hard disk to your 200 IOPS high-latency network file system. But the problem is that scan time is linear with the number of files, so there will always be a number of files that will make restic slow. If you have many millions of files, there is nothing wrong with your filesystem just because restic is ‘slow’ :slight_smile:

But yes, it’s a bit of an issue that you don’t see your entire filesystem/tree in your snapshots; it’s much messier to restore stuff with this approach.

Very good point @askielboe @rawtaz! That’s a huge fly in my continuous-wrapper ointment :cry:

I see your point. I’m not sure what would be classified as “normal” scan time for X number of files and Y number of directories on an SSD.

On one of my systems, a MacBook Pro (Early 2015) the scan looks like this:

scanned 109233 directories, 631102 files in 0:41

That is not even one million files, indeed, but if it’s linear as you say and I were to have five million files, the scan would still take just about five minutes.

With that math on the same system I’d need to have around 50 million files for the scan to take 50 minutes.

@askielboe Would you mind showing the “scanned …” line of your output when the scan takes 50 minutes?

All this said, there’s of course lots of other variables involved in the scanning speed. I’m just surprised to hear 50 minutes scan time on a laptop with a fast SSD :slight_smile:

So here is one where it scanned about a million files in 54 minutes (the scan time varies a bit with the load):

scan finished in 3243.814s: 1055943 files, 139.734 GiB

I’ve since then added a bunch more to my exclude file which cuts the scan time down quite a bit. I think the main culprit is a lot of protobuf files which I ignore(d) using *.pb. This yields scan times of ~ 30 min (including the other excludes I’ve added):

scan finished in 1789.684s: 756149 files, 93.377 GiB

If I instead exclude the path that contains the .pb-files I get a scan time of around 5 minutes:

scan finished in 286.896s: 757815 files, 93.381 GiB

So avoiding filename wildcard excludes (and extending my exclude file in general) seems to have fixed the issue for now.
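
To illustrate, the change in my exclude file boils down to this (the directory name is made up for the example):

    # before: filename wildcard - restic still stats every file just to test the name
    *.pb

    # after: excluding the directory itself - restic skips the whole subtree
    /Users/me/data/protobuf-output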

Since this is a bit off topic, and because I already created a new forum post, I thought I’d just post the reply over here instead: Restic runs lstat on files excluded by extension - intended behaviour? - #2 by askielboe

It could be an SSD stuck on the other side of a SATA controller? That would massively impact the speed. 50 minutes for a local NVMe SSD would sound slow, as normally you could read every byte of a whole 1 TB NVMe SSD in a fraction of that time :slight_smile:

I also find Restic very slow for incremental backups. It does a lot of disk reading given my file system is close to static. Reading every file to see if a hash has changed is really inefficient - thorough but inefficient. Looking at the file modified date and only checking files changed since the last backup could reduce backup time by orders of magnitude for rarely changing sets of files - which is probably most backups.

On EC2 servers you have a burst balance for disk use, and running a backup that reads every file in your backup set could easily use up most or all of your disk credits. That would leave your production workloads running slowly. It’s the EBS burst balance.

Even on a dedicated server or home PC this is inefficient and slows down regular computer use.

I’d like to change from Borg to Restic because of a few problems with Borg, but I don’t really want to have my server or PC having to read GB or TB of data daily when I’ve usually changed about 10 files totaling about 20MB.

I really like Restic, and hope that one day I can use it as my primary backup program. For now I think I’ll continue to use it for weekly or monthly backups, but I don’t think it’s suitable for daily or more frequent backups.

Hi @tomwaldnz, restic already checks only the modified date (and not even the size) for repeat backups of the same file. But it does all the file checks linearly, one file at a time, so most of the backup time is just wasted/idle time waiting for file stat calls to return. Hence it is also much slower on high-latency filesystems.

https://restic.readthedocs.io/en/latest/040_backup.html
“When you backup the same directory again (maybe with new or changed files) restic will find the old snapshot in the repo and by default only reads those files that are new or have been modified since the last snapshot. This is decided based on the modify date of the file in the file system.”

The scan runs as one linear task, and the file stat process starts in parallel with it. But each task is based on a linear algorithm right now, so it scales linearly: twice the latency makes it twice as slow, and twice as many files makes it twice as slow.

The slowness is because restic usually can’t utilize all the available filesystem and/or network bandwidth. The new restorer will change that for restores, which will be better than linear in the next release.


Thanks @whereisaaron, based on your information I’ve done some research and worked out why I was getting excessive disk access. It turns out my virus scanner, Avira, was virus scanning every file that Restic wanted to back up even if it was unchanged. I guess that’s either a bug or a feature. Once I disabled that it was quite fast.

More in this thread I created.

This is usually not a bug but a feature of the AV. Definitely follow the advice posted in the thread you made.
But keep in mind that any exclusion comes with a cost.

That’s correct, and for most cases it’s pretty fast! :slight_smile:

Ah, thanks for pointing it out, it’s a bit more complicated. Here’s the code which decides if a file needs to be re-read:

You can see it checks:

  • The file type (was it a file before, and is now a symlink?)
  • The modification time
  • The file size
  • The file inode (which makes restic re-read files on fuse-based file systems like sshfs)
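
For anyone on such filesystems: recent restic versions also have an --ignore-inode option for backup to skip that last check (verify with restic backup --help that your version has it):

    # assumes a restic version with --ignore-inode; skips the inode comparison (e.g. on sshfs)
    restic backup --ignore-inode /mnt/remote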

I’ve updated the documentation: Backing up — restic 0.16.3 documentation


Absolutely. My initial test suggested that excluding the restic process from scanning didn’t work, but I had another go and it worked. Either I did it wrong the first time or it needed a reboot to take effect.