Backup architecture in practice and other considerations

esd_event · August 16, 2023, 8:22am

Hi all,

I’m just making the transition from manually copying files to a Synology DS to an actual backup setup for me and my family. To start with, I have said DS and some cloud storage available, I have made myself familiar with the Restic basics, and now I’m ready to start backing up in principle…

But now I’m kind of stuck at the “architectural” questions of the backup and other practical considerations, which I find missing in the usual blog posts (maybe I have to produce one on my own later). I’m just not sure if I’m overthinking this or if this is all obvious for others, so I would really like to hear other opinions on the following topics:

1. Passwords. Let’s say the house burns down and suddenly my devices have gone up in smoke. I realized that my backup in the cloud would be useless without credentials. Since I’m using randomly generated passwords for everything in my password safe(s), I would be completely lost without them. That is the current state. So how do you handle this? Passwords on dead wood? Distributed in several places? Other ideas?
2. Backing up mobiles (Android). If my conclusion is correct, there is currently no reliable Restic client available for Android, and it is furthermore questionable whether a mid-range phone can handle a Restic backup at all (I’m aware of Termux, but that feels kind of sketchy?). So my idea would be to one-way sync (Syncthing?) all the important data to some place where I can use Restic. For my family those images/videos are probably THE most important backup load. Having a sync process in front of the backup feels like a significant weak point in the whole backup scheme.
   Is there another way? Is anyone using redundant syncs? Manual reviews, e.g. by checking the latest image taken via logfile/mail? Am I just paranoid?
3. Backup Server. I’d like to adhere to the 3-2-1 rule, but how is that usually done in practice? Do I create a single backup (e.g. from laptop to DS) and clone it from there to other storage devices? Or do I create three independent backups, which might differ? From what I see, the first option seems to be used quite frequently, but that would mean that a failure in the “master” backup would wreak havoc in all the backups, no? Is there a consensus on how to do that? Use the restic copy option?
4. Checks. I see that there are a lot of options for checking the backup databases, but I have not found anything like a best practice guide. Are the databases supposed to be checked, say, every week? With --read-data? Are you checking each location or just one?
5. Self-update. Restic provides a self-update. I’m undecided whether it is a good idea to schedule this regularly. What is your strategy? Stay with one golden version or latest?
6. Third storage. So far I have only two storages (DS and cloud) and I’m thinking about what #3 should look like. I’m kind of dissatisfied with Synology since the support phase has already run out for the device, and in general I would prefer an open source solution next time. This probably also depends on the answer to 3: if I need a main backup server, then I would tend to build a Linux NAS, but otherwise a USB HDD attached to the DS would also do the trick?
7. Root backup. On Linux this is the way to go, I assume?
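
For the "manual review" idea in the Android item above, one way to automate the check is to look at the newest file in the sync target before each backup run. The following is only a sketch under assumptions: the function names and the age threshold are made up for illustration, and it relies on the sync tool preserving file modification times.

```python
import os
import time
from pathlib import Path
from typing import Optional


def newest_mtime(root: Path) -> Optional[float]:
    """Return the newest modification time of any file below root, or None if there are no files."""
    newest = None
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                mtime = os.path.getmtime(os.path.join(dirpath, name))
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if newest is None or mtime > newest:
                newest = mtime
    return newest


def sync_is_stale(root: Path, max_age_days: float, now: Optional[float] = None) -> bool:
    """True if the sync folder is empty or its newest file is older than max_age_days."""
    now = time.time() if now is None else now
    newest = newest_mtime(root)
    if newest is None:
        return True
    return (now - newest) > max_age_days * 86400
```

A backup script could call `sync_is_stale(Path("/path/to/phone-sync"), 7)` and send a mail or log a warning when it returns True. The path and the 7-day limit are placeholders; a phone that is rarely used for photos would need a longer limit.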

Looking forward to your thoughts on this!

Nevatar · August 16, 2023, 12:00pm

Hi there,

these questions are relevant for every backup solution and mostly not restic-specific, so as a newbie to restic myself, but with a bit of background in backup in general, I’ll share my thoughts.
First of all, start with the backup as soon as possible; in an emergency some backup is better than no backup at all.
To your questions:

1. Well, there is no general rule for password management. You mentioned the password safe(s) you use, so you need a master password to memorize anyway, and I would spread the database as widely as I can. Having multiple (synced) copies of the database can save you in many situations. You mentioned you know Syncthing; I would recommend using that for distributing the central password database between devices. Alternatively, use the master password for the backup, since in there you should have everything, including the password database, and as you said, a backup you don’t have access to is useless.
2. Also my concern. I use the Syncthing solution, but I’d love to have a native restic client.
3. I use a nested approach, so my NAS at home is the central backup server for all my clients and I back up the whole NAS (including internal backup storage) to the cloud … I am not sure this is the best strategy, but it is easy to maintain. If you are really paranoid, this cloud backup can be mirrored somewhere else (rclone?).
4. To the best of my knowledge (correct me), checks are only necessary when something went wrong. A test restore from time to time is always a good idea.
5. Well, I am new to restic too and have no experience with that, but I do not intend to use it. Usually this kind of software update is handled by the distribution maintainers (Debian, Arch, Ubuntu …).
6. In my opinion you are quite good with these two. I work with the original data (first storage), next comes the local backup (second) and the cloud (third), so in principle you have three stages, right? Correct me if I’m wrong.
7. Yes. It’s a backup … it should have access to all the data. Creating a user with similar rights just creates a potential second vulnerability, IMHO.

Those are my thoughts. I hope you reach a good decision, because none of this is written in stone.

bye, Nevatar

esd_event · August 16, 2023, 7:38pm

Appreciate the feedback, thanks!

Okay, so what you’re saying is that you back up your stuff to the NAS and then you create backups from the NAS itself? That seems to be a reasonable approach. I always had the model in my mind that all backups should be identical so that they can be automatically compared etc. But that might not be necessary at all.
I searched the docs and found it here: It is actually recommended to run restic check after each backup to make sure that there are no errors in the metadata. Interestingly, this is more or less the only point in the restic docs where this is mentioned. On the other hand, I have now found this thread, from which I would take that this is pretty useful. Essentially, restic check --read-data appears to substitute for and automate the process of test restores and scrubbing.
Well, interesting. I had a look again and I think the 3-2-1 rule means that there are 3 actual backups in addition to the original. In my case that also makes sense, because I cannot guarantee that the original data is not deleted by accident.

Nevatar · August 17, 2023, 6:04am

The only downside of this approach is, that in case of a NAS crash I first have to recover that to access the backups from the clients. Theoretically I can just mount the cloud backup and then use the repository there remotely (I did not test that yet). I solved this by using a second HDD as a mirror of the NAS HDD.
As I said, a test from time to time is not a bad idea either. Is this check really necessary every time? I would assume that a backup is safe when there are no errors in the backup process. Additionally, checks are optional. If I cannot trust restic to have saved my data when it just said it did so, I have to rethink my decision for this software. But I’m very confident that the guys here did a good job.
Well, to be honest, how often do you need your backup? I can only speak for myself, but I very seldom need even the first stage of the backup, and so far I have never had to access the second. If you really insist on this 3-2-1 rule: as I mentioned earlier, you can clone the cloud backup to just another cloud provider.

esd_event · August 17, 2023, 7:30am

Well, actually, if restic check (--read-data) works as I think it does, that is a great feature in my opinion. After all, you cannot check manually whether the data in the backup got corrupted at some point. I think @fd0 summarized it nicely in this post. Though it would be good to have more official documentation on this, I’d say.

Nevatar · August 18, 2023, 11:40am

Very interesting coincidence, my NAS SSD died tonight, so I was forced to activate my personal emergency plan. As I said, I have a mirror of the SSD, so I activated this one (slightly outdated, but tolerable). Then I checked the backups, if I would be able to restore any missing data. In the meantime I was able to revive the SSD, so that I didn’t need to access the restic backup, but I could and this gives me a good feeling :-).
So, in addition to my statement above: in my setup this mirror is one stage of backup (second or first) and the restic backup to a cloud provider is another, so I also follow the 3-2-1 rule here. I looked it up specifically: in the original 3-2-1 rule the original data counts as one, so you need 2 additional backups, which makes 3 copies of the data, 2 of them on different storage media and 1 of them offsite. Nowadays the state of the art is a slightly modified version (3-2-1-1-0, I believe), where in fact there have to be 3 copies of the original data. In my setup I achieve this for the NAS clients but not for the NAS itself. And the zero at the end of the new rule is exactly your point, ensuring that the backup data is error-free, and I think restic check is very helpful here.
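
The counting above can be written down as a small check. This is just an illustrative sketch of the classic reading of the rule (original data included); the `Copy` class, its fields and the example entries are invented for the illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Copy:
    name: str      # free-form label, e.g. "laptop" or "cloud repo"
    medium: str    # kind of storage the copy lives on
    offsite: bool  # True if the copy is stored away from home


def satisfies_3_2_1(copies: List[Copy]) -> bool:
    """At least 3 copies, on at least 2 different media, at least 1 of them offsite."""
    enough_copies = len(copies) >= 3
    enough_media = len({c.medium for c in copies}) >= 2
    has_offsite = any(c.offsite for c in copies)
    return enough_copies and enough_media and has_offsite


setup = [
    Copy("laptop (original)", "internal SSD", offsite=False),
    Copy("NAS repo", "NAS disk", offsite=False),
    Copy("cloud repo", "cloud storage", offsite=True),
]
print(satisfies_3_2_1(setup))       # original + 2 backups
print(satisfies_3_2_1(setup[1:]))   # stricter reading: only the backups are counted
```

Under the stricter reading, where the original does not count, the same function can be fed only the backup copies, as in the last line.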

fede · August 18, 2023, 5:26pm

Interesting ideas and contributions

1. Passwords.

I am using portable KeePass Pro (free, open source) https://keepass.info/ to store my passwords, and I have the encrypted database synchronized with Dropbox and Google Drive, for example.
Every fortnight I rename the databases and also copy them to a flash drive, so previous versions remain. You don’t actually enter enough passwords to require more backups.

In an emergency I can recover the database from Dropbox or Google Drive at least.
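
The fortnightly "rename and copy" step could be scripted roughly as follows. This is a sketch, not the procedure actually used above: the function name, the date-based naming scheme and the paths are assumptions.

```python
import shutil
from datetime import date
from pathlib import Path
from typing import Optional


def archive_copy(db_path: Path, dest_dir: Path, today: Optional[date] = None) -> Path:
    """Copy the password database to dest_dir under a dated name, keeping older copies."""
    today = date.today() if today is None else today
    dest_dir.mkdir(parents=True, exist_ok=True)
    target = dest_dir / f"{db_path.stem}-{today.isoformat()}{db_path.suffix}"
    if target.exists():
        # Never overwrite an existing version; previous copies must remain intact.
        raise FileExistsError(target)
    shutil.copy2(db_path, target)  # copy2 also preserves timestamps
    return target
```

Calling `archive_copy(Path("Passwords.kdbx"), Path("/media/usb/keepass"))` would leave a file such as `Passwords-2023-08-18.kdbx` on the flash drive; both paths are placeholders.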

3-2-1 included the original copy.

@Nevatar It’s good that your backup system worked for you!

MichaelEischer · August 19, 2023, 5:03pm

regularly != after each backup. Restic is by now stable enough that data corruption in a backup repository often involves hardware problems like bitflips in memory or data corruption of stored files of a backup. To detect these rare problems, running check every now and then should be enough.

A printout of the password on paper that is stored in a safe-deposit box would also be a good option.

esd_event · August 20, 2023, 8:02am

You are right, I misread it at that point. What I take from that is that it should be done regularly after the backup process.

Could you maybe elaborate that a bit further? Is restic check alone capable of detecting any data corruptions? And when/why would I want to use --read-data then?

noeck · August 20, 2023, 8:56am

I think he meant, restic is stable enough that it does not introduce errors or create broken backups. Bitflips and other hardware problems are rare but can never be excluded. restic check alone only checks the consistency of the repository’s snapshot/tree information (which is much faster, causes less traffic for remote storage and is also good to know). To really check for bit flips in the data packs, you need the --read-data option.

I assume, @MichaelEischer meant the phrase “running check every now and then” like “running restic check with appropriate options like --read-data or --read-data-subset every now and then” - with the emphasis on “every now and then” and not necessarily for each backup because errors are unlikely.

restician · August 20, 2023, 9:43am

Another consideration is to know what types of files you need to back up, and how a backup tool restores those file types. For example, Duplicacy does not restore hard links. And Restic does not restore sparse files. There is no data loss, but the size of the restored data can change drastically.

Not considering this could result in a surprise when restoring files.

fd0 · August 20, 2023, 9:59am

Oh, it does, if you tell it to:

By default, restic does not restore files as sparse. Use restore --sparse to enable the creation of sparse files if supported by the filesystem. Then restic will restore long runs of zero bytes as holes in the corresponding files. Reading from a hole returns the original zero bytes, but it does not consume disk space. Note that the exact location of the holes can differ from those in the original file, as their location is determined while restoring and is not stored explicitly.

restician · August 20, 2023, 10:36am

@fd0 I don’t think --sparse is that helpful.

With this option Restic applies sparseness to all files during the restore (even files that are not sparse files originally). And another (minor) issue is that a sparse file from the source gets restored to the destination with a different file size.

Obviously we could ignore the “file size” issue and use include/exclude to apply the --sparse option to just sparse files.

This is exactly why I mentioned that we should consider sparse files.

(Currently I exclude sparse files and back them up with Duplicacy, even though I dislike Duplicacy.)

esd_event · August 20, 2023, 11:53am

Interesting point, thanks. I did not consider this so far, but my application is rather mundane, so I guess Restic will have no issues with that, but I will have a second look.

Okay, so what I take from this is, that I will be running check with --read-data all the time and probably check a small subset after every backup.

restician · August 20, 2023, 12:17pm

Sparse files are typically used for virtual machine disks, container images, databases, log files, etc.

On Linux I use this command to check if there are any sparse files which can not be restored “correctly” with Restic.

find . -type f -printf "%S\t%p\n" | gawk '$1 < 1.0 {print}'
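
The find command prints each file's sparseness, i.e. allocated 512-byte blocks times 512 divided by the file size, and keeps the entries below 1.0. For systems without GNU find or gawk, the same idea can be sketched in Python. The function names are made up, and `st_blocks` is only available on Unix-like systems.

```python
import os
import stat
from typing import Iterator, Tuple


def sparseness(size_bytes: int, blocks_512: int) -> float:
    """Ratio of allocated bytes to apparent size; below 1.0 suggests holes."""
    if size_bytes == 0:
        return 1.0  # an empty file has nothing to be sparse about
    return blocks_512 * 512 / size_bytes


def find_sparse_files(root: str) -> Iterator[Tuple[float, str]]:
    """Yield (sparseness, path) for regular files below root whose ratio is under 1.0."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)  # do not follow symlinks, like find -type f
            except OSError:
                continue
            if not stat.S_ISREG(st.st_mode):
                continue
            ratio = sparseness(st.st_size, st.st_blocks)
            if ratio < 1.0:
                yield ratio, path
```

`for ratio, path in find_sparse_files("."): print(f"{ratio:.3f}\t{path}")` gives output comparable to the one-liner. With either variant, a ratio under 1.0 is only a hint: filesystems with transparent compression or inline storage of small files can also report fewer allocated blocks than the file size suggests.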

fede · August 20, 2023, 12:18pm

Thanks, I’ll keep that in mind.

MichaelEischer · August 20, 2023, 12:49pm

That makes no sense. restic check --read-data is the most comprehensive check available, --read-data-subset is far less comprehensive. Did you mean check without --read-data?

Btw, check --read-data and check --read-data-subset always perform the metadata verification that’s run by a basic check run.

esd_event · August 20, 2023, 1:45pm

Well, depending on the choice of N, --read-data and --read-data-subset are equal.

What I meant was that it does not seem to make sense to run check without actually reading at least parts of the data, because that is what actually matters and what is much more likely to fail. Also, since there seems to be no recommendation on the timescale, I could just as well read the whole data set once a year instead of small chunks every time.

Though I will probably opt for a partial read-check after every backup.
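
One way to do such partial read-checks so that they still cover the whole repository over time is to rotate through the n/t groups of --read-data-subset. The sketch below only builds the command line and does not run restic; the week-based rotation, the function names and the default of 10 groups are assumptions, not a documented recommendation.

```python
from datetime import date
from typing import List


def subset_for(day: date, total: int) -> str:
    """Pick group n of `total` from the ISO week number, so each week reads a different group."""
    week = day.isocalendar()[1]      # ISO week number, 1..53
    n = (week - 1) % total + 1       # map to 1..total
    return f"{n}/{total}"


def check_command(repo: str, day: date, total: int = 10) -> List[str]:
    """Build (but do not execute) a restic check call that reads one subset of the pack files."""
    return ["restic", "-r", repo, "check", f"--read-data-subset={subset_for(day, total)}"]


print(check_command("/srv/restic-repo", date(2023, 8, 20)))
```

With `total=1` the subset is `1/1`, which reads all data like --read-data, matching the remark above about the choice of N. Because a year has 52 or 53 ISO weeks, the rotation is slightly uneven around the turn of the year unless `total` divides the number of weeks.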

esd_event · August 21, 2023, 6:53am

I have given some more thought to the question of how best to set up the second backup. I will document my thoughts here because I have not found them elsewhere. In total I found 4 ways:

Loop

Run the same backup script with another endpoint location.

+ Simplest setup
o Repos diffable and even identical if they are not changed during the backup process
- Single machine has access to all backups

Clone

1st backup is cloned/copied to another endpoint location.

+ Simple setup
+ Identical = perfectly diffable
- Errors can propagate (risk can be mitigated with append only/immutable modes?)

Backup^2

Create a backup of the backup.

+ Fairly simple setup (depending on the details)
+ 2nd backup can be run from a different machine → some kind of decoupling
- Repos are not identical at all and diffing/restoring the second backup is harder

Copy Snapshots

Use restic copy to transfer snapshots.

+ Repos almost identical? At least diffable
- Probably the most complicated setup

I was heavily leaning towards the backup square method (which @Nevatar is also using?), because I like the idea of having the second backup being somewhat decoupled from the first. Also it feels kind of natural to use a backup process for that task.

But then I found out that I probably cannot run Restic on my old NAS, and I’m somewhat hesitant to replace it now. So now I’m not sure anymore whether the backup square method is the best, because the backup chain would then look like this: Device → NAS → Device → Cloud.

Now I’m also considering the clone approach. If I could manage to ensure that this can happen in an append only/immutable way, that should also ensure that there is a reasonable amount of decoupling, no? Is anyone here using this kind of setup?
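
To make the difference between the "Loop" and "Copy Snapshots" variants more concrete, here is a sketch that only builds the restic command lines without executing them. The repository locations and function names are placeholders, credentials handling is left out entirely, and the `--from-repo` flag is assumed to be the spelling used by current restic releases (older releases used a different flag name).

```python
from typing import List


def loop_backup_commands(repos: List[str], paths: List[str]) -> List[List[str]]:
    """'Loop': run the same backup once per repository; each repo gets independent snapshots."""
    return [["restic", "-r", repo, "backup", *paths] for repo in repos]


def copy_command(src_repo: str, dst_repo: str) -> List[str]:
    """'Copy Snapshots': transfer existing snapshots from src_repo into dst_repo."""
    return ["restic", "-r", dst_repo, "copy", "--from-repo", src_repo]


for cmd in loop_backup_commands(["/mnt/nas/repo", "/mnt/usb/repo"], ["/home"]):
    print(" ".join(cmd))
print(" ".join(copy_command("/mnt/nas/repo", "/mnt/usb/repo")))
```

In the loop variant the source data is read twice and the two repositories never depend on each other. In the copy variant the second repository is derived from the first, so anything already wrong in the first is carried along. For copy to deduplicate well, the restic documentation should be consulted on initializing the destination with matching chunker parameters.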

doscott · August 21, 2023, 10:31am

Methods 2, 3 and 4 will allow you to make “backups” of a repository with unknown errors (most likely hardware-induced and undetected). Method 1 would require both repositories to contain unknown errors (less likely, although still possible) before you’re in trouble.

Personally I do:
Device->QNAP NAS (restic, everything)
Device->Cloud (restic, all except large data sets I can live without)
Device->OMV/Thecus NAS (everything, using rsnapshot) [old NAS]