Because the internet connection on the clients is slow, I back up locally and sync later to reduce restic's runtime.
The NAS is not reachable from outside without a WireGuard VPN (and the internet connection is slow on both client and host).
I want to avoid cloud services for as long as I can.
But with this setup I have a lot of different repositories with no deduplication between them. What do you think would be a better way to back up? The repository sizes currently range from 70 GB to 1200 GB.
Hi @MelcomX, “Laptop 1”, for example, belongs to my mum, and it mostly only runs for checking the stock market or emails, so its “on-time” is very short. I want to avoid the laptop/PC being shut down (or the battery running empty) before the backup is complete.
What I was thinking (please do not roll this into production as is): technically, you could create one huge repo, sync it to all the clients, let them back up to that local copy, and then sync it back without propagating deletions.
Another option is to create the repos with the same parameters and then copy the snapshots to your main repo instead of syncing the repos to the NAS.
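A rough sketch of what that could look like (all paths are placeholders, and this assumes the copy runs somewhere that can reach both repos; the --from-repo flags are restic 0.14+ syntax, older releases used --repo2). restic will prompt for both repository passwords, or you can point it at them with --password-file and --from-password-file:

```
# Create the local repo on the client with the same chunker parameters
# as the main repo, so data can deduplicate between them.
restic init -r /backup/local-repo \
    --from-repo /path/to/main-repo \
    --copy-chunker-params

# Back up to the local repo as usual.
restic backup -r /backup/local-repo /home/user

# Later, when the main repo is reachable, copy the new snapshots over.
restic copy -r /path/to/main-repo \
    --from-repo /backup/local-repo
```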
But maybe you will not see big gains from deduplication, since the data on your disks is largely disjoint, and then all the effort you invest might not be worth it. Therefore I think the current setup may be the best available.
DISCLAIMER: everything I said here should be tested before being deployed, and I’m open to corrections.
Interesting, but I see the following problem: restic’s speed mostly comes from not having to re-read unchanged files, which it knows about from the parent snapshot, so deleting and recreating the repo gives up that advantage. This means backing up is slower than it would be with the last snapshot still available.
OK, that’s right. So if I follow this idea I should only remove the oldest snapshots and not the whole repository; exactly one snapshot must be left (the latest one).
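If I read the docs correctly, something like this would do that on the local repo (the path is just a placeholder):

```
# Keep only the most recent snapshot and drop the data
# that is no longer referenced by any snapshot.
restic -r /backup/local-repo forget --keep-last 1 --prune
```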
Perhaps I’m missing something, but could you not just replace the “syncthing” step in your setup with a restic copy operation, direct to a single large repository on the NAS?
That said, as MelcomX already mentioned, I think it’s important to test whether the backups from your clients will actually deduplicate against one another by a significant amount. Unless you’re backing up a lot of files common to multiple clients, you may find that the deduplication between them is not that large, in which case your current setup would be better.
There is one more aspect here: Syncthing is a terrible way to keep two repos in sync. It knows nothing about the repository structure, so every time it runs, the destination repo is potentially in a totally broken state until it finishes. Not to mention what happens if some files are deleted from the source repo by mistake, or corrupted by a malicious process, etc.; those changes will be synced too. :)
restic copy always keeps the destination repo coherent. It can be stopped midway and restarted later, etc.
@shd2h To copy directly, I think I would need a VPN connection into my NAS network or an open port. With Syncthing I don’t need either of those, which from my point of view means more security for my network. The deduplication ratio is a good point, you are right; it makes sense to test that ratio before investing more time. Thank you!
@kapitainsky As I already wrote to shd2h: how would you run the “restic copy” between the two networks? I only see VPN or port forwarding. I’m open to new ideas/input.
Your point about a possibly broken repo is understandable.
Yes, if you want to go this way, with your NAS as central repository storage, then you have to configure all the needed network access and security, effectively building your own cloud solution. It is actually not that difficult. There is no need for a VPN when you expose only a specific service, e.g. SFTP or an S3-compatible MinIO.
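To give an idea (hostname, port, bucket and account names below are all made up): with an SFTP account or a MinIO instance exposed on the NAS, a client can talk to the repository directly, e.g.:

```
# SFTP backend: only the SSH port on the NAS has to be reachable.
restic -r sftp:backup@nas.example.com:/srv/restic/main snapshots

# S3 backend via MinIO: only the MinIO port has to be exposed.
export AWS_ACCESS_KEY_ID=restic-user
export AWS_SECRET_ACCESS_KEY=change-me
restic -r s3:https://nas.example.com:9000/restic-main snapshots
```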
Using Syncthing to sync repos is flawed in its basic design. People do all sorts of crazy things. :) In the end it is your data.
Syncthing isn’t necessarily the worst way of synchronising two repositories. A cursory Google search suggests it does use checksums when deciding which data to transfer from A → B, so it seems to me that it is no worse than copying a repository with other non-restic tools.
Obviously, as kapitainsky explained, it’s still less preferable than restic copy, because copy understands the repository structure, and you should be performing regular checks of the target repository to make sure nothing gets broken by Syncthing.
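For example, run against the target repo on the NAS (the path here is hypothetical):

```
# Verify the repository structure and index consistency.
restic -r /srv/restic/main check

# Also read and verify one tenth of the pack data; rotate the subset
# (1/10, 2/10, ...) across runs to eventually cover everything.
restic -r /srv/restic/main check --read-data-subset=1/10
```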
If you want to deduplicate multiple clients effectively, you need to let the clients access a central repository somehow. Oh, and you also need to make sure that all the repositories were initialised with the same chunker parameters, or else they won’t deduplicate against each other.
If the clients can already use Syncthing to talk to the NAS, there must already be some connectivity there. Can you not just extend that to allow the port(s) necessary for the laptops to communicate directly with a repository on the NAS?
Restic supports storing backups from multiple computers in one repository out of the box. This is just an attempt at making something simple more complicated than it really is. :)
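A minimal illustration, assuming both machines can reach the same repository (the URL is made up):

```
# Each client backs up into the same repository; restic stores the
# hostname with every snapshot, so the machines do not interfere.
restic -r sftp:backup@nas.example.com:/srv/restic/main backup /home/mum   # on laptop1
restic -r sftp:backup@nas.example.com:/srv/restic/main backup /home/me    # on laptop2

# Later you can list or prune snapshots per machine.
restic -r sftp:backup@nas.example.com:/srv/restic/main snapshots --host laptop1
```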
As I wrote in my first post, the internet connection is too slow for a direct backup in my view (only DSL 6000, rural village in Germany). Syncthing is always running in the background and syncs step by step (it continues after a reboot, etc.).
The hint about the chunker params is very good; I didn’t even know that existed.
I think I’ll set this topic to resolved, because I got a lot of good ideas from all the contributing users here. I now know the limits and possibilities and have to test whether anything fits better for me than the Syncthing option does at the moment. Thank you all!
Edit: You can go on with the discussion if you want