I am new to restic and love it so far! I have a question around the internal format of restic data.
How can we guarantee (restore) access to restic repos far into the future?
I have reviewed the reference guide and issues such as 628.
I am still unclear and hence the question.
For a bit more context: I have been backing up computer data for 30+ years. A common problem I have faced is that the internal format of a backup archive can become obsolete, which renders the archive useless. The same goes for the media it is stored on, but I think I have a process for that.
I regularly exercise restore capability for the most critical data to ensure its availability. I believe the ‘check’ function and doing restores provide confidence for this type of backup.
But I also have a lot of long-term, ‘cold storage’ type backups (e.g. a document from 20+ years ago). Generally, I don’t have the time to exercise these restores, and therefore don’t.
I am concerned about the ongoing ‘maintenance’ of these types of repos because I may have to migrate them as the restic internal format changes with newer versions.
Are there any common practices folks follow to address this type of scenario? For instance, just storing a copy of the restic binary with the backup repos? Given that restic is still young, would you recommend against it for long-term backup storage?
I was a CrashPlan user. That was another case where the backup format caused a lot of pain: I had to restore all the backups to native file system formats and then add them back into a restic repo. That meant many hours of migration work.
Hopefully I have expressed my concern clearly. That said, I am eager to become a regular user of restic. I am working on setting up backups across my machines.
It’s a good thing to consider.
In my opinion it’s rather simple: if you store a copy of your repo, a copy of the design document, a copy of the documentation, and a copy of the source code, then you have every piece you’d need to make sure you can open/read/restore your backup in the future.
PS: You will not run into the problems that come from creating backups with commercial, closed-source software. Since restic is open source, you are always free to see how your data is stored.
The easiest would be to save a bit more data alongside the “cold storage”:
- A compiled restic binary
- The source code for restic (also contains the design document)
- The current Go compiler
With that you’re able to build a compatible binary of restic for most operating systems, at least for the common ones. The most important attribute for restic, at least in my opinion, is the repository format and its specification. With that (and a bit of time) it should be possible to reimplement what’s needed to recover data.
Besides that, restic may not be the best tool for the job, as it is made for backup, not for archival. These two jobs are related but different.
For example, restic reduces duplication within the data; that’s part of the design. But for archival, it’s probably a good idea to add even more redundancy via error correcting codes (ECC). That’s not something restic will have near-term.
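To illustrate the general idea of adding redundancy (this is not something restic does, and not a real ECC scheme such as Reed-Solomon), here is a minimal sketch of single-block XOR parity. The function names are made up for the example. It assumes equal-length blocks and can rebuild exactly one lost block, provided you know which one is missing.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    blocks = list(blocks)
    if len({len(b) for b in blocks}) != 1:
        raise ValueError("all blocks must have the same length")
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


def make_parity(data_blocks):
    """Compute one parity block to store alongside the data blocks."""
    return xor_blocks(data_blocks)


def recover_missing(surviving_blocks, parity):
    """Rebuild the single missing data block from the survivors and the parity."""
    return xor_blocks(list(surviving_blocks) + [parity])


if __name__ == "__main__":
    blocks = [b"abcd", b"efgh", b"ijkl"]
    parity = make_parity(blocks)
    # Pretend the middle block was lost and rebuild it from the other two.
    print(recover_missing([blocks[0], blocks[2]], parity))
```

Real archival tools use stronger codes that survive multiple damaged blocks; the point here is only that extra stored data can buy back lost data.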
For restic, we’re planning to change the repository layout once in a while, e.g. to add compression. There will probably be a conversion process, but the concrete implementation is not yet decided. We will still support old repository versions for a while, but eventually we’ll drop that support. That’s another reason why using restic as an archival system may not be such a good idea.
On the upside, for opening a repository you’ll only ever need a binary of restic. That’s statically linked anyway, and does not have any dependencies. So when you save a binary, the source and maybe the compiler alongside the repository, you’re probably fine for a long time.
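If you do store a binary, the source and the compiler next to the repository, it may be worth recording checksums so you can tell years later whether those files are still intact. Below is a minimal sketch in Python; the directory layout, the `MANIFEST.json` name and the function names are assumptions for the example and have nothing to do with restic itself.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 of a file, reading in chunks to bound memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(bundle_dir: Path, manifest_name: str = "MANIFEST.json") -> dict:
    """Hash every file under bundle_dir and write the result into bundle_dir."""
    manifest = {
        p.relative_to(bundle_dir).as_posix(): sha256_of(p)
        for p in sorted(bundle_dir.rglob("*"))
        if p.is_file() and p.name != manifest_name
    }
    (bundle_dir / manifest_name).write_text(json.dumps(manifest, indent=2))
    return manifest


def verify_manifest(bundle_dir: Path, manifest_name: str = "MANIFEST.json") -> list:
    """Return the relative paths that are missing or whose hash has changed."""
    recorded = json.loads((bundle_dir / manifest_name).read_text())
    return [
        rel
        for rel, digest in recorded.items()
        if not (bundle_dir / rel).is_file() or sha256_of(bundle_dir / rel) != digest
    ]
```

You would call `write_manifest` once when assembling the bundle and `verify_manifest` whenever you touch the cold storage; an empty list means every recorded file still matches.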
Great points and I appreciate the quick, detailed answer.
I had not considered storing the source code or the Go compiler with the archive. That is sound advice.
Presumably, with that and the repository I should be able to restore the data for a long time to come.
As for redundancy, the plan is to have 3 copies/phases of backup. The first is a remote NAS drive on the network that holds the restic repo. The second is an rsync of the restic repo(s) to an external drive, which will remain in a fireproof vault when not in use. Finally, a third copy will be sent to a cloud provider (e.g. B2, S3). While not 100% guaranteed, I believe this should be reliable enough for my backup and archival needs. It might even be overkill.
I welcome anyone’s input on this strategy.
If I have the restic binary, source, and Go compiler stored alongside all 3 copies of the data, I believe this should be reliable for years to come.
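To check that the rsync'ed copy really matches the primary, something like the following could compare two locally mounted copies of a repo directory file by file. This is only a sketch with hypothetical paths and function names. It is not a substitute for restic's own ‘check’, and it does not cover the cloud copy unless that is mounted or downloaded first.

```python
import hashlib
from pathlib import Path


def tree_hashes(root: Path) -> dict:
    """Map each file's path (relative to root) to its SHA-256 hex digest."""
    hashes = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digest = hashlib.sha256()
            with path.open("rb") as fh:
                for chunk in iter(lambda: fh.read(1 << 20), b""):
                    digest.update(chunk)
            hashes[path.relative_to(root).as_posix()] = digest.hexdigest()
    return hashes


def compare_copies(primary: Path, mirror: Path) -> dict:
    """Report files that are missing, extra, or different in the mirror."""
    a, b = tree_hashes(primary), tree_hashes(mirror)
    return {
        "missing_in_mirror": sorted(a.keys() - b.keys()),
        "extra_in_mirror": sorted(b.keys() - a.keys()),
        "content_differs": sorted(k for k in a.keys() & b.keys() if a[k] != b[k]),
    }


if __name__ == "__main__":
    # Hypothetical mount points for the NAS copy and the external drive.
    report = compare_copies(Path("/mnt/nas/restic-repo"), Path("/mnt/external/restic-repo"))
    for kind, paths in report.items():
        print(kind, len(paths))
```

Three empty lists mean the two copies are byte-for-byte identical. Hashing every file is slow on large repos, so this is more of an occasional audit than something to run after every sync.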
P.S. I read the note about personal burnout. It is a risk, no doubt. Hopefully, others in the community will read and respond so fd0 (note no @ sign as I don’t want to ping you) does not feel the need.
Ah, don’t worry about it
I thought it’d be a good idea to state somewhere that while it hasn’t happened so far, I can see the danger on the horizon. So I’m consciously taking time to not look at any issues and rather code away. I can also see that people may be frustrated if nobody looks at their issue or PR for two weeks.