Because the internet connection on the clients is slow, I back up locally and sync later to reduce restic's runtime.
The NAS is not reachable from outside without a WireGuard VPN (and the internet connection is slow on both client and host).
I want to avoid cloud services for as long as I can.
But with this setup I have a lot of different repositories with no deduplication between them. What do you think would be a better way to back up? The repository sizes currently range from 70 GB to 1200 GB.
Hi @MelcomX, “Laptop 1”, for example, belongs to my mum, and it mostly only runs for checking the stock market or emails, so its “on-time” is very short. I want to avoid the laptop/PC being shut down (or the battery running empty) before the backup is complete.
What I was thinking (please do not roll this into production as is): technically, you could create one huge repo, sync it to all the clients, let them back up to that local copy, and then sync it back without propagating deletions.
Another option is to create the repos with the same parameters and then copy the snapshots to your main repo instead of syncing the repos to the NAS.
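A rough sketch of what that could look like (all paths are placeholders, and this assumes the copy runs somewhere that can reach both repos; the --from-repo flags are restic 0.14+ syntax, older releases used --repo2). restic will prompt for both repository passwords, or you can point it at them with --password-file and --from-password-file:

```
# Create the local repo on the client with the same chunker parameters
# as the main repo, so data can deduplicate between them.
restic init -r /backup/local-repo \
    --from-repo /path/to/main-repo \
    --copy-chunker-params

# Back up to the local repo as usual.
restic backup -r /backup/local-repo /home/user

# Later, when the main repo is reachable, copy the new snapshots over.
restic copy -r /path/to/main-repo \
    --from-repo /backup/local-repo
```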
But maybe you will not see big gains from deduplication, since the data on your disks is largely disjoint, and then all the effort you invest might not be worth it. Therefore I think the current setup may be the best available.
DISCLAIMER: everything I said here should be tested before being deployed, and I’m open to corrections.
Interesting, but I see the following problem: restic’s speed mostly comes from not having to re-read unchanged files, which it knows about from the parent snapshot, so deleting and recreating the repo gives up that advantage. This means backing up is slower than it would be with the last snapshot still available.
OK, that’s right. So if I follow this idea I should only remove the oldest snapshots and not the whole repository; exactly one snapshot must be left (the latest one).
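If I read the docs correctly, something like this would do that on the local repo (the path is just a placeholder):

```
# Keep only the most recent snapshot and drop the data
# that is no longer referenced by any snapshot.
restic -r /backup/local-repo forget --keep-last 1 --prune
```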
Perhaps I’m missing something, but could you not just replace the “syncthing” step in your setup with a restic copy operation, direct to a single large repository on the NAS?
That said, as MelcomX already mentioned, I think it’s important to test whether the backups from your clients will actually deduplicate against one another by a significant amount. Unless you’re backing up a lot of files common to multiple clients, you may find that the deduplication between them is not that large, in which case your current setup would be better.
There is one more aspect here: Syncthing is a terrible way to keep two repos in sync. It knows nothing about the repository structure, so every time it runs, the destination repo is potentially in a totally broken state until it finishes. Not to mention what happens if some files are deleted from the source repo by mistake, or corrupted by a malicious process, etc.; those changes will be synced too. :)
restic copy always keeps the destination repo coherent. It can be stopped midway and restarted later, etc.
@shd2h To copy directly, I think I would need a VPN connection into my NAS network or an open port. With Syncthing I don’t need either of those, which from my point of view means more security for my network. The deduplication ratio is a good point, you are right; it makes sense to test that ratio before investing more time. Thank you!
@kapitainsky As I already wrote to shd2h: how would you run the “restic copy” between the two networks? I only see VPN or port forwarding. I’m open to new ideas/input.
Your point about a possibly broken repo is understandable.
Yes, if you want to go this way, with your NAS as central repository storage, then you have to configure all the needed network access and security, effectively building your own cloud solution. It is actually not that difficult. There is no need for a VPN when you expose only a specific service, e.g. SFTP or an S3-compatible MinIO.
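To give an idea (hostname, port, bucket and account names below are all made up): with an SFTP account or a MinIO instance exposed on the NAS, a client can talk to the repository directly, e.g.:

```
# SFTP backend: only the SSH port on the NAS has to be reachable.
restic -r sftp:backup@nas.example.com:/srv/restic/main snapshots

# S3 backend via MinIO: only the MinIO port has to be exposed.
export AWS_ACCESS_KEY_ID=restic-user
export AWS_SECRET_ACCESS_KEY=change-me
restic -r s3:https://nas.example.com:9000/restic-main snapshots
```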
Using Syncthing to sync repos is flawed in its basic design. People do all sorts of crazy things. :) In the end it is your data.
Syncthing isn’t necessarily the worst way of synchronising two repositories. A cursory Google search suggests it does use checksums when deciding which data to transfer from A → B, so it seems to me that it is no worse than copying a repository with other non-restic tools.
Obviously, as kapitainsky explained, it’s still less preferable than restic copy, because copy understands the repository structure, and you should be performing regular checks of the target repository to make sure nothing gets broken by Syncthing.
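For example, run against the target repo on the NAS (the path here is hypothetical):

```
# Verify the repository structure and index consistency.
restic -r /srv/restic/main check

# Also read and verify one tenth of the pack data; rotate the subset
# (1/10, 2/10, ...) across runs to eventually cover everything.
restic -r /srv/restic/main check --read-data-subset=1/10
```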
If you want to deduplicate multiple clients effectively, you need to let the clients access a central repository somehow. Oh, and you also need to make sure that all the repositories were initialised with the same chunker parameters, or else they won’t deduplicate against each other.
If the clients can already use Syncthing to talk to the NAS, there must already be some connectivity there. Can you not just extend that to allow the port(s) necessary for the laptops to communicate directly with a repository on the NAS?
Restic supports storing backups from multiple computers in one repository out of the box. This is just an attempt at making something simple more complicated than it really is. :)
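A minimal illustration, assuming both machines can reach the same repository (the URL is made up):

```
# Each client backs up into the same repository; restic stores the
# hostname with every snapshot, so the machines do not interfere.
restic -r sftp:backup@nas.example.com:/srv/restic/main backup /home/mum   # on laptop1
restic -r sftp:backup@nas.example.com:/srv/restic/main backup /home/me    # on laptop2

# Later you can list or prune snapshots per machine.
restic -r sftp:backup@nas.example.com:/srv/restic/main snapshots --host laptop1
```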
As I wrote in my first post, the internet connection is too slow for a direct backup in my view (only DSL 6000, rural village in Germany). Syncthing is always running in the background and syncs step by step (it continues after a reboot, etc.).
The hint about the chunker params is very good; I didn’t even know that existed.
I think I’ll set this topic to resolved, because I got a lot of good ideas from all the contributing users here. I now know the limits and possibilities and have to test whether anything fits better for me than the Syncthing option does at the moment. Thank you all!
Edit: You can go on with the discussion if you want