Backup architecture in practice and other considerations

Well, depending on the choice of N, --read-data and --read-data-subset are equivalent :wink:

What I meant was that it does not seem to make sense to run check without actually reading at least part of the data, because that is what actually matters and what is much more likely to fail. Also, since there seems to be no recommendation on the timescale, I could just read the whole data set once a year instead of small chunks every time.

Though I will probably opt for a partial read-check after every backup.
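
For anyone else weighing this trade-off: --read-data-subset also accepts an n/t form, so the "whole data set once a year" idea can be spread over scheduled runs instead of one huge yearly read. A minimal sketch, assuming a weekly job; the repository location and password file below are placeholders:

```bash
#!/bin/bash
# Verify a different 1/52 slice of the pack files each week, so that
# over a year roughly everything has been read once.
export RESTIC_REPOSITORY="/srv/backup/repo"       # placeholder
export RESTIC_PASSWORD_FILE="$HOME/.restic-pass"  # placeholder

WEEK=$((10#$(date +%V)))          # ISO week number, 1..53
GROUP=$(((WEEK - 1) % 52 + 1))    # clamp to 1..52
restic check --read-data-subset="${GROUP}/52"
```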

I have given some more thought to the question of how to best set up the second backup. I will document my conclusions here because I have not found them elsewhere. In total I found four ways:

  1. Loop

Run the same backup script with another endpoint location (a minimal sketch follows this list).

+ Simplest setup
o Repos diffable and even identical if they are not changed during the backup process
- Single machine has access to all backups

  2. Clone

The 1st backup is cloned/copied to another endpoint location.

+ Simple setup
+ Identical = perfectly diffable
- Errors can propagate (risk can be mitigated with append-only/immutable modes?)

  3. Backup^2

Create a backup of the backup.

+ Fairly simple setup (depending on the details)
+ 2nd backup can be run from a different machine → some kind of decoupling
- Repos are not identical at all, and diffing/restoring the second backup is harder

  4. Copy Snapshots

Use restic copy to transfer snapshots.

+ Repos almost identical? At least diffable
- Probably the most complicated setup
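
As a rough idea of what method 1 could look like in practice (a sketch only; repository URLs, password file and paths are placeholders):

```bash
#!/bin/bash
# Method 1 ("Loop"): run the same backup against each endpoint in turn,
# so the two repositories stay completely independent of each other.
REPOS=(
  "sftp:nas:/backup/restic"   # placeholder NAS endpoint
  "rclone:cloud:restic"       # placeholder cloud endpoint
)

for repo in "${REPOS[@]}"; do
  restic -r "$repo" --password-file ~/.restic-pass backup /home /etc
done
```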


I was heavily leaning towards the backup square method (which @Nevatar is also using?), because I like the idea of the second backup being somewhat decoupled from the first. It also feels kind of natural to use a backup process for that task.

But then I found out that I probably cannot run Restic on my old NAS, and I’m somewhat hesitant to replace it now. So now I’m not sure anymore whether the backup square method is the best; in that case the backup chain would look like this: Device → NAS → Device → Cloud.

Now I’m also considering the clone approach. If I could ensure that the cloning happens in an append-only/immutable way, that should also give a reasonable amount of decoupling, no? Is anyone here using this kind of setup?
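
One way I could imagine approximating such an append-only clone is rclone copy, which never deletes at the destination, combined with --immutable so that files which were already transferred are never overwritten either (restic repository files are content-addressed and normally never modified in place). Paths and the remote name below are placeholders:

```bash
# Mirror the primary repository to a second location without deleting
# or overwriting anything that already exists there. Note: after a
# prune on the source, the destination keeps the removed files too,
# so it grows until it is cleaned up deliberately.
rclone copy --immutable /srv/backup/repo cloud:restic-clone
```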

1 Like

Methods 2, 3 and 4 will allow you to make “backups” of a repository with unknown errors (most likely hardware-induced and undetected). Method 1 would require both repositories to contain unknown errors (less likely, although still possible) before you’re in trouble.

Personally I do:
Device → QNAP NAS (restic, everything)
Device → Cloud (restic, all except large data sets I can live without)
Device → OMV/Thecus NAS (everything, using rsnapshot) [old NAS]

2 Likes

I would have thought Method 4 would catch errors that Methods 2 & 3 would not. After all, copy needs to be able to read the blobs from the source repository. So if there was corruption of the repository, I’d expect the copy operation to fail.
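
For completeness, the copy step for method 4 looks roughly like this (flag names as of restic 0.14 or newer, if I remember correctly; repository locations and password files are placeholders):

```bash
# Copy all snapshots from the primary repository into a second one.
# Because copy has to read and re-pack the source blobs, corrupted
# source data should surface as errors during this step.
restic -r sftp:nas:/backup/restic-copy \
       --password-file ~/.restic-pass-copy \
       copy \
       --from-repo /srv/backup/restic \
       --from-password-file ~/.restic-pass
```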

If you can’t run restic/copy operations from the NAS due to resource issues, then that limits your potential options, and IMO it seems simplest to go with option 1. Anything else involves bringing more hardware into the picture, and with backups I’ve found it’s often better to have a simpler solution than a more complicated one (fewer moving parts means fewer things to go wrong!).

Anecdotally, I used to run check monthly, with check --read-data every 3 months, and found that frequent enough.

2 Likes

The biggest trouble I have with method 1 is that malware would probably get access to all backups more easily than with a more sequential approach, e.g. Device → NAS → Cloud. But yeah, it probably makes sense. I guess I will do my best to achieve this kind of decoupling in software (e.g. different users etc.).

Are you running a complete check --read-data every three months? I was aiming for once a year, at least in the cloud, but I have no real insight into whether that is adequate or not.

Thinking about it, it probably makes sense to time it with the backup length of the cloud provider. For the NAS the frequency can be higher of course.

I was, for my local repositories, yes, although I’ve since pared that back after several years with no issues. My local repositories are stored atop a btrfs raid1 array that I scrub weekly though, so running regular check --read-data isn’t as important to me as it would be if they were stored on a less resilient storage medium.

For my cloud repository, none of the restic operations are automated, and I have ~1yr as a rough target for how often I check it over. If I used a less reputable cloud provider, I’d check the repository over more often.

I suppose it all depends on your tolerance for risk at the end of the day :slight_smile:

1 Like

I have a quite similar setup; I am running a Synology DS923+ as my main storage server.

  1. My secrets are stored in 1Password. I know, it’s not self-hosted; I used to run KeePass, then KeePassX, then KeePassXC, but the missing sync feature was always annoying to me. Yeah, I can put it on a shared drive or Syncthing or Dropbox, but even then I ran into conflicts from time to time, and I eventually switched to a hosted offering, which works fine for me. Alternatively, I would try Bitwarden.

  2. I would love to use restic on my mobile, but did not find a good solution for it. I ended up using DS File, which has a photo backup feature that syncs all pictures in the background. The client for iOS stops after a few minutes due to iOS limitations, but Android works fine for me. Alternatively, you can self-host GitHub - immich-app/immich: Self-hosted photo and video backup solution directly from your mobile phone., but there is still the “active development” disclaimer on the page, so I preferred Synology’s solution. Once I connect to my wireless at home, the backup starts automatically. If I am at a hotel during holidays, I connect home via Tailscale and just back up as usual.

  3. I tend to store important docs directly on the NAS and access them via SFTP or SMB. On the DiskStation I enabled BTRFS snapshots every 3 hours, so I can go back in time if necessary. The DS runs SHR1 RAID, so I can lose 1 disk without downtime, but that’s no backup, basically just convenience when replacing failing drives. The backup of my NAS is done nightly over Tailscale VPN via Hyperbackup to a second NAS at my parents’. A third copy of the files is made manually, once a month or after big changes, to an old NAS in JBOD mode, which I physically need to plug in, so it is fully offline and cannot die during a lightning strike or something similar. As a last safety net, I run restic on the NAS, which syncs the most important data and photos via the rclone backend to OneDrive, as my Microsoft 365 subscription contains 1 TB of storage which I don’t use otherwise. My workstations back up local files via restic to my main NAS, which runs the restic rest server via Docker (a rough sketch of that setup follows this list). My Windows laptop uses Active Backup for Business from Synology, as it supports bare-metal restores.

  4. I schedule the repo check directly on the NAS via the task scheduler once per week and read 10 GB of data. I know that’s not much, but as it fetches everything from OneDrive and the metadata is validated already, I think it’s good enough.

  5. I did not schedule self-updates, but never had real issues after an update either, as the repo format is quite stable. There was a change with compression a few releases back, but even then it did not break existing repos, so I would not be afraid to update.

  6. The Synology support lifetime tends to be quite good in my experience; my offline NAS is a DS213j, which was released in 2013 as an entry-level NAS. I could still update to DSM 7, not via auto-update, but there was an official release I could upload. If you want to build something on your own, I would consider TrueNAS, but as of now you can’t just add a single drive later on; you need to add another vdev, potentially consisting of multiple drives. That’s usually not an issue in bigger deployments, but keep it in mind if you are looking for a low-budget home setup.
    I guess for most home users a Synology NAS as a backup target for their workstation/notebook, plus an external drive to which the NAS is backed up via Hyperbackup, is sufficient. But as always, an additional backup is rarely a bad idea :sweat_smile:.

  7. Yes, but I am not so sure I would use restic for a full disk image. The chroot part should work on Linux/Unix, but I think Clonezilla or something similar might work more easily. For Windows I don’t think restic would work, but I am not sure.
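
Regarding the rest server mentioned in point 3, this is roughly how I’d expect that piece to look. The restic/rest-server image is the official one, but the way the flags are passed (the OPTIONS variable) and all paths and hostnames below are assumptions to verify against the rest-server README:

```bash
# Serve the NAS directory over HTTP for restic clients. --append-only
# prevents clients from deleting or overwriting existing repository data;
# --no-auth skips the .htpasswd setup and is only sensible on a trusted LAN.
docker run -d --name rest-server \
  -p 8000:8000 \
  -v /volume1/restic:/data \
  -e OPTIONS="--append-only --no-auth" \
  restic/rest-server

# On a workstation, back up against that server (placeholder host and repo name):
restic -r rest:http://nas.local:8000/workstation backup ~/Documents
```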

2 Likes

  1. I use BitWarden. Actually, VaultWarden, an open-source version of it running on my own server. Passwords are on my phone, my server, my tablet, etc.
  2. SyncThing, again running on my server, backs up my Android to my PC, and my PC to my Android.
  3. I have separate backups. One goes locally to an internal disk daily. One goes to the cloud daily. One goes to a disk in a separate building monthly. One goes to a friend’s house every three months.

1 Like

> Anecdotally, I used to run check monthly, with check --read-data every 3 months, and found that frequent enough.

Thanks for the tip @shd2h

What about using Google Drive? Is it “reputable”, or not so much?

Why?
I use the portable version of KeePass; the program folder and database are synced with Dropbox, so I can have the same password database on multiple computers. I have never had a conflict between versions of the database.

There’s always Cryptosteel’s Capsule. Marketed more towards crypto wallets, but absolutely usable for our purposes too! Personally I just have mine backed up to Proton Pass. I also have a copy on an Apricorn secure thumb drive, stored inside an encrypted pwSafe database, inside a fire safe… lol

I sync my password database every month or so. Be sure not to let flash memory sit too long. The cell charge tends to dissipate over time. M-Disc would probably be better, but I update mine often so it has time to refresh the cell charge.

I use Dropbox to back up my photos, and Restic to back up my Camera Uploads folder. If you want to use a free Dropbox account and have limited space… throw Rclone into the mix. Have it move photos out every time the backup script runs, then have Restic immediately back them up. Worst-case scenario, even free Dropbox keeps your deleted files for 30 days. :man_shrugging:t2:
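
A minimal sketch of that shuffle, assuming an rclone remote called dropbox: and local paths that are placeholders:

```bash
#!/bin/bash
# Move photos out of Dropbox so the free quota stays mostly empty,
# then immediately back up the local copy with restic.
rclone move "dropbox:Camera Uploads" /srv/photos/incoming
restic -r /srv/backup/repo --password-file ~/.restic-pass backup /srv/photos
```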

Don’t clone. I learned this the hard way. You’ll end up cloning corruption before you notice it. Just look at my most recent thread here. I’d never, ever have noticed this if I was just cloning my repo. Use independent repos. Compare the output/diffs from simultaneously(ish) run backups on occasion. You never know…

Honestly, this one is more at your discretion (aka paranoia level) lol. I do occasional checks. Maybe a full --read-data every 3-6 months.

Also kind of more about your own comfort level. I use Homebrew on my Mac to update Restic, but the same concept applies. I call it at the beginning of my main backup script. And I’ve absolutely been affected by the stray bug or two… but I’d have likely done that manually, too. And by Restic’s design, the absolute worst it would probably do is corrupt THAT snapshot, and not the others. I haven’t seen that happen, but for the most part Restic only appends data, unless you Prune / Rebuild / etc. I’m not too worried about it.

I use a second cloud. And M-Discs, for the super important data (mostly my photo collection and tax documents, for me). With everything RAR’d and PAR2’d to the literal brim of the discs, to boot. 30% parity. And multiple copies, at different locations, of course.

Yep!

2 Likes

I haven’t tried it myself, but a quick search of the forum suggests it is more in the “not so much” category:

If you already have the Google Drive storage space available, I’d suggest you try it yourself and see if it works for you, and whether you can live with its quirks. If you’ve not yet paid for any cloud storage though, maybe shop around a bit for other options.

1 Like

In my case, I shared a secrets database with my wife and needed support for Windows, Linux and Android.
Additionally, I was allowed to use KeePassXC on my corporate workstation, but I wasn’t allowed to install any cloud-sync software like Dropbox.
So I ended up syncing the database file manually on my corporate workstation, using Keepass2Android on my mobile (synced via SFTP to my NAS), and running Synology Drive with KeePassXC on my private Windows and Linux workstations.
From the NAS, I took backups of the database.

I am not saying that KeePass is bad software, I was a happy user for years, but as soon as multiple people are editing a single db and are not necessarily always syncing directly back to the main copy, it becomes more complicated to maintain. Also, if something were to happen to me, it would have been quite a challenge for my relatives to figure out how to access my accounts.
With 1Password, I have an emergency kit printed out and stored in a safe location, it is standard software, and it would be more manageable for others to recover my accounts without me, in case they need to.

What also made me feel a bit uncomfortable was that KeePass itself only worked on Windows; other, unofficial clients existed, but you needed to check whether they were still supported. So I used KeePassX on Linux for a few years and found out after a while that it seemed to be no longer actively developed (last release in 2016) and that there was a more active fork, KeePassXC. Especially that case made me wonder whether I would even notice if I were running a vulnerable version of a password manager… my hope is that a single commercial vendor makes that kind of scenario easier to avoid, and their monthly fee hopefully helps them sustain a profitable business.

But as always, there is no silver bullet. If KeePass works for you, I would stick with it; don’t solve a problem you don’t have :smile:

1 Like

Thanks for the info and the link.
In the case of a client, yes, I can use the Google Drive that they have already contracted for the classic use case of sharing files. Hence my query: to take advantage of that free space for backups as well.

:open_mouth::eyes:

And I was so happy with my KeePass!! :heavy_check_mark::laughing:

Right! :grin::heavy_check_mark:
Thank you very much for the detailed explanation. I work alone and only on Windows, so for now I think I’m fine with KeePass.

So, I finally have some time again to reply here. I would like to thank everyone here for the great feedback; it helped me a lot :slight_smile:

In the meantime I have my script up and running (almost completely), so I want to share my results here. In the end, I went with the loop method (method 1 above), meaning that I do the backups independently to my NAS and the cloud. That seemed like the most straightforward way in the end. Let’s see how that develops.

My KeePass safe is so far distributed over a few devices, and I plan to print out the most critical entries.

I have currently opted for check --read-data-subset=10% for the cloud and 20% for the NAS on every backup. I might reduce that later, though.

Currently I have opted against a third backup endpoint (apart from NAS and cloud), but I have not given up on that yet. I will probably sooner or later use an old USB HDD, sporadically create backups on it, and put the HDD in the basement of some faraway relatives. Let’s see.

2 Likes

Curiously, I do the same. When I go to my second home, once a month or so, I launch a backup to an external drive. Just in case.

Same here. I store the drive at work.

1 Like