Running Restic with Rclone VFS Mount

I’m using Restic with Rclone as a backend. I want to configure Rclone with VFS capabilities so Restic can continue scanning files without waiting for uploads to complete.
I tried running rclone serve with the VFS flags, but it didn’t work (I checked the code and saw the flags aren’t defined for that command). I assume this option is disabled to ensure the locking mechanism and deduplication function properly.
Would it be possible to run Rclone as a mount point to utilize VFS, assuming I can guarantee atomic operations (i.e., ensuring no parallel connections)?

Does anyone have an answer?

Honestly… I don’t think I’d personally want to do that. It adds an unnecessary layer of complexity, and introduces new ways the repository could be corrupted. With VFS, the snapshot files could be uploaded before the data files are uploaded, for instance. That wouldn’t be good. I don’t think you’re going to find anyone else thinking this is a good idea lol

You could mount via Rclone with VFS, I just don’t think you should lol

What do you view as the net benefit from doing this? The “scan” does complete before the uploads finish as it is. You can also disable the scan entirely by passing --no-scan. I very much doubt that mounting via Rclone VFS will speed the overall process up. Restic itself may finish sooner, but I expect the net result to be the same or longer - and possibly lead to corruption should Rclone VFS glitch out during the upload process (which I’ve seen happen more often than you might expect).
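For reference, --no-scan just goes on the backup command. A minimal example, with the path being a placeholder:

restic backup /path/to/data --no-scan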


Even if the repository will only be accessed from that instance until rclone has finished uploading everything?
My issue is the first-time backup of a slow HDD, which I can’t finish in one go. I want restic to scan the HDD only once (the scanning alone can take hours), and then have rclone continue after a restart, uploading only the relevant blobs without rescanning anything.
Why is it bad if the snapshots are uploaded before the blobs, assuming no other restic command will run until everything has finished uploading?

There is a better solution IMO. If you have enough space for a VFS cache big enough to hold the whole repo, it would be much better to simply create a local repo instead. When it’s finished, copy it to your remote location (using rclone, for example, since it sounds like you’re already familiar with it) - that can then be done at whatever slow pace, with as many breaks as needed.
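A rough sketch of that workflow, with the local path and remote name being placeholders:

restic -r /mnt/local-restic-repo init
restic -r /mnt/local-restic-repo backup /path/to/slow-hdd
# later, push the finished repo to the cloud; rerun as often as needed,
# rclone skips files it has already transferred
rclone copy /mnt/local-restic-repo remote:restic-repo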


I thought about this, but unfortunately it is not a new repo, just a new backup from a new computer.
They contain similar files, and that is the reason I want to use the same repo.
Thanks for your help

You could still do it. Use an rclone union remote merging a local folder and your cloud storage, with the latter marked as nc (no create). This way all new restic repo files will be created locally. Then you sync everything to the cloud when finished (see the sketch below).

This method assumes that your repo is not used by any other computer.
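A minimal sketch of such a union remote, assuming your existing cloud remote is called cloud and the local staging folder is /srv/restic-staging (both placeholders). In rclone.conf:

[repo-union]
type = union
upstreams = /srv/restic-staging cloud:restic-repo:nc

Back up through the union remote as usual, and once everything is done, push the locally staged files up:

restic -r rclone:repo-union: backup /path/to/data
rclone copy /srv/restic-staging cloud:restic-repo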

Ah. You’re confusing “scan” with the backup process itself. The “scan” feature is what estimates the size of the backup and remaining time, which you can turn off with --no-scan. What you’re talking about is needing to be able to resume the backup process itself (not the scan), when it has failed midway. Which isn’t possible yet, as I figure you know - so you think that backing up to Rclone VFS, and letting Rclone continue uploading in the background, might work around this issue. Intriguing idea, in theory.

But there’s one assumption you’re making that isn’t guaranteed - that Rclone VFS, by itself, will continue, and successfully finish, when Restic+Rclone can’t. I have absolutely lost files when doing large VFS uploads (without Restic) and Rclone either stalled or crashed. All you’d be doing is moving the fail point from where you can see it, to where you might not notice it. And THAT is why it could be bad if the snapshot file was uploaded first, and some of the blobs were missing and not uploaded (or worse, partially uploaded) via the VFS cache. I would not trust this method at all.


The correct solution is to split the backup job into portions. Say you have 10 folders. Maybe backup two at a time. Folders 1 & 2, then let that finish. Folders 3 & 4, etc. When you get the whole thing up there, and it’s all good to go… THEN do a full backup (you can reference one of the previous backups with --parent to skip at least SOME of the re-reading). This way you already have everything in the cloud, and Restic will just need to re-read everything for the full snapshot (minus whichever --parent you reference), and won’t get hung up uploading anything because it’s all already there. Once it finishes scanning all the files, realizes everything is already up there, it’ll just save the snapshot - and you’ll be done.
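A rough outline of that, with folder names and the snapshot ID as placeholders:

# first passes: upload a couple of folders per run, letting each finish
restic backup /data/folder1 /data/folder2
restic backup /data/folder3 /data/folder4
# ...continue until everything has been uploaded at least once

# final pass: one full snapshot, referencing an earlier snapshot to skip some re-reading
restic backup /data --parent <snapshot-ID>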

Alternatively, put it on a mobile drive, take it somewhere where you have a faster connection, and upload from there. That’s what I end up doing for 1TB+ jobs (my internet at home is horrendously slow).


Another solution would be the “copy” command, because it IS resumable. So if you make a local snapshot, then use the copy command… if it fails, upon retrying it will only copy the packs that aren’t already uploaded. I often do it like this:

restic copy <snapshot-ID> || restic copy <snapshot-ID> || restic copy <snapshot-ID> || restic copy <snapshot-ID> || restic copy <snapshot-ID> || restic copy <snapshot-ID> || restic copy <snapshot-ID> || restic copy <snapshot-ID> || restic copy <snapshot-ID> || restic copy <snapshot-ID> && restic check

That will start a copy, and if it fails, it will start the copy again, and again (up to 10 attempts in total), until it succeeds. When it succeeds, it’ll automatically run a restic check, too. I have done this many, many times, and never seen it cause corruption.

Be sure to set $RESTIC_REPOSITORY, $RESTIC_PASSWORD, $RESTIC_FROM_REPOSITORY, and $RESTIC_FROM_PASSWORD before running all that - don’t want it prompting for the repo locations or passwords every time :wink:
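Something along these lines, with the repo locations and passwords obviously being placeholders:

export RESTIC_FROM_REPOSITORY=/mnt/local-restic-repo
export RESTIC_FROM_PASSWORD='password-of-local-repo'
export RESTIC_REPOSITORY=rclone:remote:restic-repo
export RESTIC_PASSWORD='password-of-remote-repo'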


PS: If you create a brand-new repository for this, be sure to use init --copy-chunker-params when creating it, and reference your original repository, so the chunker settings will match what you’re already using.
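Roughly like this on a recent restic version, with both repository paths being placeholders:

restic -r rclone:remote:new-repo init --from-repo /mnt/old-repo --copy-chunker-params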

Thank you for the detailed response. I didn’t mean the scanning for the estimation, and I know it can be disabled, but the rest of your explanation answered my question perfectly. I must admit, I’m quite surprised and concerned to learn that rclone can potentially mess up backups like this.
Again, thank you for your help

Yeah… I don’t trust the VFS cache completely. It’s not perfect. It works most of the time, but on connections that tend to time out a lot, it often gets hung up, in my experience. Also, if I run it as a daemon, I’m so ADHD I might forget about it and turn off the computer before it’s done… but that’s more of a me problem haha :slight_smile: