Continuous backup

It sounds like you want a sync service more than a backup service. Like, Dropbox or Google Drive. At least, that would be easier, no?

Nope. I want a backup service:

  • I don’t need to sync to a different device

  • I want different versions of every file that I change, and access to files I deleted

  • I don’t want to pay sync prices; backup back-ends are much cheaper

  • I want the backup to be encrypted on the back end; most sync services don’t allow that

I was happy with CrashPlan, which had the continuous backup feature.

More concisely: I’m very happy with what restic offers. The only thing I’m missing is the ability to back up frequently without the penalty of scanning all files.

You have some good points, I can appreciate that.

It’s worth noting that you can use a sync service with a single device, and that most sync services I’ve used track changes and let you see different revisions of a file, including deleted ones.

You are right about backup storage usually being cheaper than sync storage, though.

For what it’s worth, you can tell restic to back up just one file. All you have to do is wrap inotify over restic and invoke backup on the file that changed.
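
A minimal sketch of that idea (assuming inotify-tools is installed and the restic repository and password are already configured via environment variables; the watched path is a placeholder):

    # Watch a directory tree and back up each file as soon as it is written.
    inotifywait --monitor --recursive --event close_write --format '%w%f' /home/user |
    while read -r changed_file; do
        restic backup "$changed_file"
    done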

In the next release of restic (currently on master), using the --quiet argument will skip the initial scan, so maybe give that a try when it’s released. It will still have to work through all your files, of course, but since the initial scan is only used for progress estimation, it’s not needed when running in quiet mode.
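
Once that release is out, it should be as simple as (path is a placeholder):

    restic backup --quiet /home/user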

Source: restic/changelog/unreleased/pull-1676 at 6eb1be0be477b4d9064f5c49558a4ca768dd54aa · restic/restic · GitHub

That is exactly what I was asking - whether I can integrate it with inotify somehow. It’s not just the ability to select files to back up - it’s also whether the high number of snapshots would impact performance over, say, years of operation.

Also, having snapshots of individual files will make it really difficult to forget snapshots, since the current forget mechanism relies on the frequency of snapshots. If I back up /home/user every hour, after 7 days I can forget all but the first one of each day, for example. If I back up individual files, I get one snapshot containing only /home/user/file1, another containing only /home/user/file2, and so on. So I not only have to back up individual files, I also have to somehow create a new snapshot that contains all the files of the previous snapshot of the same root directory (I’m guessing restic uses pointers to existing blobs), except for the files that were changed.
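
For reference, the kind of frequency-based retention policy that mechanism is built around looks like this (the values are just examples):

    # Keep 24 hourly and 7 daily snapshots, forget the rest and prune the data.
    restic forget --keep-hourly 24 --keep-daily 7 --prune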

Now that I’ve had to write it down to explain it, I realise this will require code changes… Is there a way to suggest new features?

I will definitely use that one, thanks.

I haven’t tried, but isn’t there restic find for this sort of thing, to find a file in a recent snapshot? Because if that worked, and if having lots of little snapshots wasn’t a bad thing, you could use find to locate the file(s) you want and then restore from the snapshot it’s in.
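
Roughly like this (the pattern, snapshot ID and paths are placeholders):

    # Find which snapshots contain the file, then restore it from one of them.
    restic find 'report-2018*'
    restic restore 1a2b3c4d --target /tmp/restored --include /home/user/docs/report-2018.ods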

You can open an issue on GitHub: Issues · restic/restic · GitHub.

There is. But if you want to restore a folder where one file changed later, you have to know to restore the folder first, then the file. It can be a mess.

I was about to, but being a good boy I first searched for similar issues and found one, from a few months ago:

The author follows pretty much the same rationale that I did, complete with the interim and final conclusions in this post, as if he read my mind before I even thought about it. Kudos @alphapapa. Looks like it was accepted as a feature request by @fd0.

Great minds… :slight_smile:

You should take a look at fswatch to trigger restic on file changes.

Since looking at restic, I have had the same thoughts as @alphapapa and @arikb.

My performance problem with restic is not the actual backing up of files, or dedup, or (lack of) compression, but the fact that it spends 99% of its time pointlessly scanning hundreds of thousands of files just to notice they haven’t changed since an hour ago :slight_smile: That’s a lot of wasted effort and I/O that continuous solutions like CrashPlan and Carbonite avoid.

restic supports backing up a list of specific files (--files-from). So my thought was to have an inotify wrapper/daemon that would build a list of modified files for 15-60 minutes, and then dispatch a restic backup with the accumulated file path list. Then once every 12-24 hours it would run restic with a full scan to ensure nothing is missed (inotify and NTFS events are not guaranteed; they use kernel memory and just get dropped if not picked up in time).

A continuous wrapper would reduce hourly restic backup times from ~30 minutes of runtime to ~3 minutes, and greatly reduce the hourly I/O operations on the filesystem.

Has this been done already? Anyone seen a generic inotify wrapper that could do this or be adapted to do this?

I guess a shell script with inotifywait could do it for local filesystems.
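
Something along these lines could be a starting point for the accumulate-and-dispatch wrapper described above (a rough sketch, assuming inotify-tools and a repository configured via environment variables; the path and window length are placeholders):

    # Collect changed paths for an hour, then back up only those files.
    while true; do
        list=$(mktemp)
        # Run the watcher for a fixed window; events arriving while restic
        # runs are missed, which the periodic full-scan backup has to catch.
        timeout 3600 inotifywait --monitor --recursive --event close_write \
            --format '%w%f' /home/user | sort -u > "$list"
        [ -s "$list" ] && restic backup --files-from "$list"
        rm -f "$list"
    done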

Just tried this as my home folder backups are very slow even for a few changed files.

Using fswatch on macOS, I simply piped the changed file paths to a logfile and used that as input to --files-from when backing up with restic.

Seems to work fine. Just make sure you’re not still piping to the logfile while restic is running: restic only seems to check for file existence initially, and it will fail if fswatch later adds paths that no longer exist when restic tries to back them up (it crashes with lstat /path/to/file: no such file or directory). I just rotated the log and ran restic on the previous logfile.
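
A rough sketch of one cycle of that workflow, including a filter for paths that no longer exist (assumes fswatch is installed; paths and the interval are placeholders):

    log=/tmp/changed-files.log

    # Collect events for this cycle; restarting fswatch each cycle avoids
    # writing into a rotated logfile through a stale file descriptor.
    fswatch -r "$HOME" > "$log" &
    watch_pid=$!
    sleep 3600
    kill "$watch_pid"

    # Deduplicate and drop paths that have since been deleted, so restic
    # doesn't fail with "lstat ...: no such file or directory".
    sort -u "$log" | while read -r f; do [ -e "$f" ] && printf '%s\n' "$f"; done > "$log.uniq"
    restic backup --files-from "$log.uniq"

(Run it from cron or launchd; a periodic full backup still catches anything the watcher missed.)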

At first I could not get restic to apply my exclude file (--exclude-file) when used together with --files-from, so I thought you’d have to parse the logfile manually and remove stuff you don’t want. Actually, the exclude file does seem to work. It’s just that the excluded path is still listed in the snapshot (because it’s part of the backup command, I guess), but the actual path is empty when I try to restore.

So it seems like a possible, if somewhat hacky, solution. Not sure it will scale to many files. Also, your snapshot list will look crazy, since it shows all the individual files backed up in each snapshot. :slight_smile:

It’s definitely fast though. I’m a bit paranoid, so I back up my home folder once per hour. Normally this takes about 50 minutes for a full scan, plus upload. Using the fswatch hack above cut that down to 9 seconds for an hour’s worth of changes.

I know it’s not surprising given what’s happening. Probably just an indication that my current naive restic setup is not ideal.

The only thing preventing me from using it at this point is the horrible snapshot log I get… :stuck_out_tongue: (Using restic snapshots --compact helps with this)

Edit: I should note that I have millions of data files that I back up. Some of them change often while others never change - so I guess I should only back up the static ones once in a while.

On second thought, this won’t work well because there’s no meaningful parent snapshot to use. So deduplication won’t work, and finding a backed-up version of a file will require you to look through all snapshots - not just the latest.

Deduplication will still work; it’s done at the block level.

You can use restic find to look for files you want to restore.

But yes, it’s a bit of an issue that you don’t see your entire filesystem/tree in your snapshots; it’s much messier to restore stuff with this approach.

I can’t help but think that if it takes 50 minutes for a full scan, there’s something very slow with your filesystem/disks?

Ahh, good point about the deduplication. I still think this feels too much like a hack to really use in practice. But the idea is nice.

Not sure why my backups are taking so long. I’m using a MacBook Pro from earlier this year with a 500 GB SSD. I guess I just need to work some more on my exclude files.

For now I’ve changed my schedule to back up my working dir once per hour (6 min) and the whole home folder twice per day (1 hour). I see similar times on other computers that I back up (all macOS).

I like the idea of continuous backup, but not sure if it really fits the philosophy of restic at the moment (but I might be wrong).

Edit: Think I might have found the reason for my slow backups: Restic runs lstat on files excluded by extension - intended behaviour?

I can’t help but think that if it takes 50 minutes for a full scan, there’s something very slow with your filesystem/disks?

Per-file speed will vary, from your local 400,000 IOPS SSD to your 200 IOPS hard disk to your 200 IOPS high-latency network file system. But the problem is that scan time is linear with the number of files, so there will always be a number of files that makes restic slow. If you have many millions of files, restic being ‘slow’ doesn’t mean there is something wrong with your filesystem :slight_smile:

But yes, it’s a bit of an issue that you don’t see your entire filesystem/tree in your snapshots, it’s much messier to restore stuff with this approach.

Very good point @askielboe @rawtaz! That’s a huge fly in my continuous-wrapper ointment :cry:

I see your point. I’m not sure what would be classified as “normal” scan time for X number of files and Y number of directories on an SSD.

On one of my systems, a MacBook Pro (Early 2015), the scan looks like this:

scanned 109233 directories, 631102 files in 0:41

That is not even one million files, indeed, but if it’s linear as you say and I were to have five million files, the scan would still take just about five minutes.

With that math on the same system I’d need to have around 50 million files for the scan to take 50 minutes.

@askielboe Would you mind showing the “scanned …” line of your output when the scan takes 50 minutes?

All this said, there’s of course lots of other variables involved in the scanning speed. I’m just surprised to hear 50 minutes scan time on a laptop with a fast SSD :slight_smile:

So here is one where it scanned about a million files in 54 minutes (the scan time varies a bit with the load):

scan finished in 3243.814s: 1055943 files, 139.734 GiB

I’ve since added a bunch more to my exclude file, which cuts the scan time down quite a bit. I think the main culprit is a lot of protobuf files, which I ignore(d) using *.pb. This yields scan times of ~30 min (including the other excludes I’ve added):

scan finished in 1789.684s: 756149 files, 93.377 GiB

If I instead exclude the path that contains the .pb files, I get a scan time of around 5 minutes:

scan finished in 286.896s: 757815 files, 93.381 GiB

So avoiding filename wildcard excludes (and extending my exclude file in general) seems to have fixed the issue for now.
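
For illustration, the difference in the exclude file is just this (the directory path is a hypothetical example):

    # Excluding by extension alone still left restic walking (and lstat'ing) every candidate file:
    *.pb

    # Excluding the directory that holds them skips the whole subtree during the scan:
    /Users/me/data/protobufs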

Since this is a bit off topic, and because I already created a new forum post, I thought I’d just post the reply over here instead: Restic runs lstat on files excluded by extension - intended behaviour? - #2 by askielboe

It could be an SSD stuck on the other side of a SATA controller, which would massively impact the speed. 50 minutes for a local NVMe SSD would sound slow, as normally you could read every byte of a whole 1TB NVMe SSD in a fraction of that time :slight_smile: