Restic for long-term archival purposes?

Eli6 · August 17, 2021, 8:52pm

I have data that I want to archive in AWS Glacier. This is long-term storage. Once stored, it’s neither cheap nor straightforward to periodically download the data and re-encrypt it with a new software version.

On the other hand, software projects come and go. Restic has blossomed and is now widely recommended, but developers may get bored and leave the project.

What does the future development of restic look like? Is there a risk in using restic for long-term storage?

I worry that when I try to recover data, I will get an error!

I know I can back up /bin/restic alongside the data. Is it certain that this standalone file will always run on x86 hardware to unlock the data?

I don’t want the rsync guys to laugh at me!

rawtaz · August 17, 2021, 10:49pm

TL;DR: No, there’s nothing to worry about. You can happily use restic now and for a long time from now.

I might sound biased, but I assure you that the following is said completely objectively: restic is probably one of the very best options you can find if backwards compatibility and being able to restore are important to you.

The reason for this is that the most important part of restic is the repository format, which itself is actually considered the public API in restic.

The repository format is very clearly documented, specifically for the purpose of not locking anyone into something that cannot be read in the future. This was a very conscious design decision by @fd0 when he started creating restic many years ago.

There have already been a few examples of people fiddling around and building software that can read/write from/to a restic repository, which shows that what is described above works in practice.

Hence, as long as you make sure to keep a copy of the design document/reference, you should be able to create new software that reads the data from your repository, in the very unlikely event that there’s no other software around to do it. Of course, you also need the password for the repository. If you don’t care about that security aspect (e.g. if you would otherwise just have synced the files using rsync), you could simply note the password in a text file stored along with the design document in your Glacier.
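
One way to act on this advice is to keep a small checksum manifest next to the restic binary and the saved design document, so that their integrity can be verified years later. The following is only a sketch: the `recovery-kit` directory and the file names in the comment are hypothetical, and the output is meant to follow the `<hash>  <path>` layout that `sha256sum` uses.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(kit_dir: Path) -> str:
    """List every file under kit_dir as '<sha256>  <relative path>' lines."""
    lines = []
    for path in sorted(kit_dir.rglob("*")):
        if path.is_file():
            rel = path.relative_to(kit_dir).as_posix()
            lines.append(f"{sha256_of(path)}  {rel}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical layout: recovery-kit/restic, recovery-kit/design.rst
    kit = Path("recovery-kit")
    if kit.is_dir():
        print(build_manifest(kit), end="")
```

Redirect the output to a file outside the kit directory (otherwise the manifest would list itself on the next run) and archive it together with the kit.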

Stop worrying so much

This is a question that obviously no-one can answer because it’s about the future. If you fast-forward 200 years, who knows if your binary can still be run. Probably, in some kind of emulator.

On a more pragmatic note, there’s no reason to think that restic binaries cannot be run for a long time from today, and if there ever comes a time when restic binaries stop working, it will not happen overnight. You will have several years, if not decades, to adapt to this fact and find a solution that works.

If you want something similar to compare with, consider the migration from x86 to x64 architecture: we can still run 32-bit software even though we’ve had 64-bit architectures for I don’t know how long now. Things like this don’t just vanish overnight.

doscott · August 17, 2021, 11:26pm

I don’t think either the native restic backend or the rclone backend supports using Glacier as a direct storage location. Some people use S3 with a timed transition to Glacier, but that can prove to be a bit tricky should you need to recover.

I used to use CrossFTP to store/retrieve data that would only be retrieved as a last resort. For some odd reason there is more Windows software available for working directly with Glacier than there is for Linux.

Once stored, it’s neither cheap nor straightforward to periodically download the data and re-encrypt it with a new software version.

I agree with this as far as cost and ease go, but it makes me suspect that the Glacier data store would not really be a backup if you have to retrieve it. I would just wipe the storage and use whatever my next software option was to upload again from a local backup.

Eli6 · August 18, 2021, 9:31am

There is no doubt that restic has become the software of choice for backups. I see a lot of experts recommending it.

Thanks to the developers for creating this great software!

My time frame is probably a decade. I have two local copies. For these, I can test backups, create new repositories, and re-encrypt with new versions of restic if needed. In other words, local backups can be tested on a schedule; as soon as an issue is detected, the problem can be fixed.
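
For the local copies, a scheduled test could be as simple as running `restic check` with its `--read-data-subset=n/t` option, which reads back one of `t` groups of pack files. A sketch only: the repository path is a placeholder, restic is assumed to be on the PATH, and the password is assumed to come from the environment (for example `RESTIC_PASSWORD_FILE`).

```python
import datetime
import subprocess


def check_command(repo: str, groups: int, today: datetime.date) -> list[str]:
    """Build a `restic check` command that reads one of `groups` subsets.

    The subset follows the ISO week number, so successive weekly runs
    cycle through the groups.
    """
    week = today.isocalendar()[1]
    subset = (week % groups) + 1  # restic numbers subsets starting at 1
    return ["restic", "-r", repo, "check", f"--read-data-subset={subset}/{groups}"]


def run_check(repo: str, groups: int = 10) -> None:
    """Run the check; raises CalledProcessError if restic exits non-zero."""
    subprocess.run(check_command(repo, groups, datetime.date.today()), check=True)


if __name__ == "__main__":
    # Placeholder path; print the command instead of running it.
    print(" ".join(check_command("/mnt/nas/restic-repo", 10, datetime.date.today())))
```

Calling `run_check` from cron or a systemd timer once a week would surface a damaged pack file long before the archive copy is ever needed.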

The tricky part is long-term cloud storage. These systems are not designed for frequent access. Here, I basically rely on the same restic binary that was used to create the repository still working 10 years from now. Periodically downloading TBs of data in several repositories is not economical (I keep limits on repository size to lower the risk).

Let’s hope all will be fine!

Eli6 · August 18, 2021, 9:37am

You are right. I back up to local NASs using restic; from there I will use the tools AWS provides for deep archival to Glacier. I have yet to set this up.

I don’t use AWS encryption, as restic already encrypts data, and as far as I can tell, its encryption is very good (also reviewed positively by crypto people).

I agree. It might be cheaper to delete the data and upload a new repository than to frequently download it, test that it still works, and re-upload if there is an issue.

But it takes effort; if we know the restic binary will work within a time frame of 10 years, we can just set and forget the archival backup. It’s a last-resort solution, for data that is not meant to be downloaded frequently.

torfason · August 18, 2021, 12:33pm

I think your best bet may actually not be to hope that a restic binary built today will work to restore your data, but that in ten years you will be able to relatively easily download and install restic (or a successor to restic) that can read the restic repository format.

Personally, if I had to bet, it would be that the timeframe for the availability of restic (or restic format compatible successors) is going to be longer than the timeframe for the availability of AWS Glacier. My two cents.

tomwaldnz · August 20, 2021, 10:21am

Do you mean Glacier the S3 storage class, or Glacier the standalone service? I used to use Glacier the service, but I stopped and now use the deep archive S3 storage class.

If you want a long-term archive I suggest the S3 Glacier Deep Archive storage class, without Restic. Just set up a bucket, turn on versioning, set up IAM / bucket policy, and upload your files using the correct storage class. I use the S3 CLI, but there’s a variety of software that can do it. Because S3 does encryption and versioning, and Restic doesn’t do compression, the benefits of Restic are limited in this case to deduplication. Restic restores from Glacier classes would be fiddly, as you have to bring all the objects back to online storage first; Glacier is best thought of as tape (though we don’t know how it’s actually stored).

I use Restic for my daily backups into S3 IA storage class.

Eli6 · August 20, 2021, 7:47pm

I meant S3 deep archive storage class.

I had actually looked up the AWS CLI. The encryption is mostly server-side, with the --sse flag, even with KMS. That’s useless (in any case, all cloud providers already encrypt on their servers).

They have a client-side encryption option as well, with the -cse flag. Have you used it for backups?

Anyway, the problem remains: will the AWS Encryption SDK be available for as long as restic?

Restic is open source and better documented. The AWS SDK mentions encryption, but the details are opaque: what about authentication, how are keys generated and salted, etc.?

tomwaldnz · August 20, 2021, 9:22pm

You can’t use deep archive with Restic if you want to restore directly. You can transition objects to deep archive after storage, but if you ever want to restore I think you would have to transition / copy the entire archive to S3 standard / IA class.
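
To make the restore step concrete: objects in the Deep Archive class have to be restored before they can be read. Below is a sketch using boto3’s `restore_object` call. The bucket and prefix are whatever you chose (nothing here is specific to restic), and boto3 plus AWS credentials are assumed to be available.

```python
def restore_request(days: int, tier: str = "Bulk") -> dict:
    """Build the RestoreRequest body for an archived S3 object.

    Deep Archive offers the Standard and Bulk retrieval tiers.
    """
    if tier not in ("Standard", "Bulk"):
        raise ValueError(f"unsupported tier for Deep Archive: {tier}")
    if days < 1:
        raise ValueError("days must be at least 1")
    return {"Days": days, "GlacierJobParameters": {"Tier": tier}}


def request_restore(bucket: str, prefix: str, days: int = 7) -> int:
    """Request a restore of every Deep Archive object under prefix.

    Returns the number of restore requests sent.
    """
    import boto3  # third-party; imported lazily so restore_request works without it

    s3 = boto3.client("s3")
    sent = 0
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            if obj.get("StorageClass") == "DEEP_ARCHIVE":
                s3.restore_object(
                    Bucket=bucket,
                    Key=obj["Key"],
                    RestoreRequest=restore_request(days),
                )
                sent += 1
    return sent
```

The requests are asynchronous: the objects only become readable some hours later, and the restored copies stay available for `days` days. With a repository made of many pack files this means one request per object, which is part of why it is fiddly.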

I know a fair bit about AWS; I have their architect professional and security specialty certifications. Most of the details you are looking for are generally available. I won’t go into it all because it’s already documented, though not always in a way that is particularly easy to understand. Training is generally required to fully understand AWS. I’ve had a heap of training and I’ve been doing it for years.

I trust AWS to do server-side encryption. For enterprise customers I use CMK (customer master keys) with a restrictive key policy. For myself I use the default S3 key. I don’t even see encryption at rest as necessary for personal data; it mostly protects against theft of the disks, which is never going to happen.

You can of course use client side encryption, but then you have to do key management. I don’t know about you, but I’m far more likely to lose keys than AWS, who stores them in redundant HSMs across multiple data centers.

AWS generally does not deprecate products quickly; they’re enterprise-focused, and enterprises do not change quickly. Things that are barely used are kept running for many, many years, unlike Google, who stop services on a whim. S3 and the Encryption SDK will be around for a long, long time.

S3 encryption is not done with the Encryption SDK, it’s done by AWS behind the scenes using HSM (hardware security module) generated keys then AES256 encryption. They use envelope encryption, which is a bit of a head scratcher until you get your head around it. I’ve used the Encryption SDK for Java and Python within lambda. Encryption SDK is open source, I believe.
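
Since envelope encryption can be a head scratcher, here is a stdlib-only sketch of the idea: each object gets its own data key, and only a wrapped (encrypted) copy of that data key is stored with the object. This is not how AWS implements it. The cipher below is a toy SHA-256 keystream standing in for AES, and the master key is a local variable, whereas in KMS it would stay inside the service.

```python
import hashlib
import secrets


def _toy_cipher(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """TOY cipher: XOR data with a SHA-256 counter keystream.

    A stand-in for AES to keep this sketch dependency-free. It has no
    authentication and must not be used to protect real data.
    """
    out = bytearray()
    for offset in range(0, len(data), 32):
        block = hashlib.sha256(key + nonce + offset.to_bytes(8, "big")).digest()
        out.extend(b ^ k for b, k in zip(data[offset:offset + 32], block))
    return bytes(out)


def envelope_encrypt(master_key: bytes, plaintext: bytes) -> dict:
    """Encrypt plaintext under a fresh data key, then wrap that key."""
    data_key = secrets.token_bytes(32)  # one data key per object
    data_nonce = secrets.token_bytes(16)
    key_nonce = secrets.token_bytes(16)
    return {
        "ciphertext": _toy_cipher(data_key, data_nonce, plaintext),
        "data_nonce": data_nonce,
        # Only the wrapped data key is stored; the plain one is discarded.
        "wrapped_key": _toy_cipher(master_key, key_nonce, data_key),
        "key_nonce": key_nonce,
    }


def envelope_decrypt(master_key: bytes, envelope: dict) -> bytes:
    """Unwrap the data key with the master key, then decrypt the data."""
    data_key = _toy_cipher(master_key, envelope["key_nonce"], envelope["wrapped_key"])
    return _toy_cipher(data_key, envelope["data_nonce"], envelope["ciphertext"])
```

The design point is that the master key only ever touches 32 bytes per object, so rotating or revoking it does not require re-encrypting the data itself, only re-wrapping the small data keys.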

In summary:

Use CSE if you like, but S3 with KMS is secure enough for PCI and data classified for national security.
S3 and the Encryption SDK will be around for years to decades. The Encryption SDK just uses standard KMS APIs, and you can get the source code, so it’s pretty safe.
Security by obscurity is not real security, but it may help. If AWS has a zettabyte of data, yours is probably of lower value than others’.
Make sure you secure your S3 bucket properly!

Happy to answer AWS-type questions.

Eli6 · March 29, 2023, 10:29pm

I am curious: if I back up the restic static binary /bin/restic (the same version that was used to create the repository) and use an x86 computer in the future, why should I only hope, rather than be sure, that I will be able to recover the data?

Or perhaps x86 is a generic term, and there can be many hardware variants (mixed with ARM, or an evolution of Intel x86 into new hardware architectures), some of which may not run the same binary file?

Also, I wonder whether the same static binary will run on future operating systems. Or perhaps an OS image should also be backed up?
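
On the hardware question: a Linux (ELF) binary records the architecture it was built for in its header, so one can at least note down what a saved /bin/restic expects. A sketch that reads the `e_machine` field; only a handful of machine codes are listed here, and anything else is reported as unknown.

```python
import struct
import sys

# A few e_machine values from the ELF specification.
MACHINES = {3: "x86 (32-bit)", 62: "x86-64", 40: "ARM (32-bit)", 183: "AArch64"}


def describe_elf(header: bytes) -> str:
    """Describe the target architecture from the first 20 bytes of an ELF file."""
    if len(header) < 20 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    bits = {1: 32, 2: 64}.get(header[4], 0)      # EI_CLASS
    byte_order = "<" if header[5] == 1 else ">"  # EI_DATA: 1 = little-endian
    (machine,) = struct.unpack_from(byte_order + "H", header, 18)  # e_machine
    name = MACHINES.get(machine, f"unknown machine {machine}")
    return f"{bits}-bit ELF for {name}"


if __name__ == "__main__":
    # Usage: python describe_elf.py /bin/restic
    if len(sys.argv) > 1:
        with open(sys.argv[1], "rb") as handle:
            print(describe_elf(handle.read(20)))
```

Storing that one line of output next to the binary tells a future reader which CPU, or which emulator target, the file needs.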

rawtaz · March 30, 2023, 10:20am

If you are able to run restic (or an equivalent), you will of course be able to restore repositories that are compatible with that version of restic. That’s the simple answer to everything you wrote, basically. Worst case, as long as you have the repository format specification, new software can be built to restore your data.

We cannot predict the future. Again, what I wrote above. I don’t see any reason to worry, considering there will be virtualization and emulation for however long we can imagine the future. You will hardly end up in a situation where you can’t run restic.

I hope this settles the question once and for all.

Eli6 · March 30, 2023, 4:15pm

Thanks a lot! Sorry I asked the question again. I read that a static binary still needs to interface with the operating system and use the appropriate “syscall”. Hence, I thought it was necessary to back up an OS image as well (e.g., Ubuntu 33.04 may have made substantial changes compared to 23.04). I will try to find the link.

rawtaz · March 30, 2023, 4:30pm

That is indeed true, but that is a natural part of it running successfully on the system. There is nothing additional about this - software running on an operating system uses system calls (“syscalls”) to interface with the system.

KamikazeePL · August 30, 2023, 7:53pm

I like restic because it works on Windows and Linux so it is system independent.