Some general questions (maybe good for the docs)

Yanestra · November 27, 2020, 3:28am

Hi there again,

thanks to your helpful comments I think I have understood that my original plans were not quite realistic with restic. But restic seems good for other real-world uses on my systems.

But I have some more questions about restic’s features. I think they’re fairly important questions and should probably find a place in the documentation, please.

AFAIR restic does de-duplication. (Is that correct?)

Does it support deleted files?

(That means, will the restore process delete files that are not present in the most recent snapshot?)

Because I’m frequently re-organizing (read: moving) vast amounts of files and it would make me unhappy to find hundreds of copies after restoring a backup. On the other hand, I would not like them being transferred over and over again.

How does restic handle connection interruptions?

I usually have only a dialup connection with an enforced nightly disconnect that can’t be predicted exactly. What happens if the backup gets interrupted? What happens if the restore process gets interrupted? If a 20-hour transfer gets interrupted, do I have to start over?

I know that restic and its authors put a lot of effort into maintaining backward compatibility. Nevertheless, I have (readable and working) backups that were made more than 25 years ago. With tar and gzip this is no problem, but with software written in a programming language that is still in development (speaking of Go) and that relies on libraries or routines completely unknown to me (the Gentoo Linux .ebuild lists ~500 files of unknown role in the restic port), it might be a struggle to get it compiled 25 years from now.

Is there a usable and implementable documentation about the actual backup format?

(Couldn’t find that on the website.)

Or maybe a C implementation of a flat dumb unpacker?

Thank you to all of you in this community, which appears to be really helpful.

fd0 · November 28, 2020, 9:24am

Yep, it does. Chunks of data are stored in the repo only once.

The restore process is still pretty basic. If you point it to a directory, it’ll make sure that the files in a snapshot are restored to that directory, but it’ll not touch anything already in the directory. If you want to exactly replicate a snapshot, you’ll need to use an empty directory.
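Since restore leaves existing files alone, a non-empty target directory can end up with files that are not part of the snapshot. A minimal Python sketch of how such leftovers could be spotted, assuming you already have the snapshot's file list as relative paths (for example taken from the output of restic ls; here it is simply hard-coded). The function name extra_files and the sample paths are made up for illustration and are not part of restic:

```python
import os
import tempfile


def extra_files(target_dir, snapshot_paths):
    """Return relative paths that exist under target_dir but not in snapshot_paths."""
    wanted = {os.path.normpath(p) for p in snapshot_paths}
    extras = []
    for root, _dirs, files in os.walk(target_dir):
        for name in files:
            rel = os.path.relpath(os.path.join(root, name), target_dir)
            if rel not in wanted:
                extras.append(rel)
    return sorted(extras)


# Demo: a restore target that still contains a file the snapshot no longer has.
with tempfile.TemporaryDirectory() as target:
    for rel in ("docs/new.txt", "docs/old.txt"):
        path = os.path.join(target, rel)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        open(path, "w").close()
    # Hypothetical snapshot content: only docs/new.txt exists in the snapshot.
    leftovers = extra_files(target, ["docs/new.txt"])
```

This only reports the extra files; deciding whether to delete them is left to you, and restoring into an empty directory avoids the problem entirely.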

Both won’t happen with restic: it’ll store data only once. This means that when a file is moved or renamed, restic may need to read it again. If the data has been saved before, restic will only upload new metadata information about the file (name, contents, timestamps etc) but not the data itself.
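To illustrate the idea, here is a toy Python sketch of a content-addressed store. It is not restic's code: restic splits files with content-defined chunking, while this sketch uses tiny fixed-size chunks, and ToyRepo and the file names are invented for the example. The point is only that chunks are keyed by a hash of their content, so a moved file adds new metadata but no new data:

```python
import hashlib

CHUNK_SIZE = 4  # tiny fixed size for illustration only (an assumption, not restic's chunker)


class ToyRepo:
    """Hypothetical in-memory repository: chunks keyed by SHA-256 of their content."""

    def __init__(self):
        self.chunks = {}   # chunk ID -> chunk bytes
        self.uploaded = 0  # bytes of chunk data actually stored

    def save_file(self, data):
        """Store a file's chunks and return its list of chunk IDs (the metadata)."""
        ids = []
        for i in range(0, len(data), CHUNK_SIZE):
            chunk = data[i:i + CHUNK_SIZE]
            chunk_id = hashlib.sha256(chunk).hexdigest()
            if chunk_id not in self.chunks:  # only unseen data is stored
                self.chunks[chunk_id] = chunk
                self.uploaded += len(chunk)
            ids.append(chunk_id)
        return ids


repo = ToyRepo()

# First snapshot: the file is new, so all of its data is stored.
snapshot1 = {"photos/a.bin": repo.save_file(b"abcdefghij")}
after_first = repo.uploaded

# Second snapshot: same content under a new path (the file was moved).
snapshot2 = {"archive/2020/a.bin": repo.save_file(b"abcdefghij")}
after_second = repo.uploaded
```

After the second call, repo.uploaded is unchanged: the file had to be read and hashed again, but only the path-to-chunk-list mapping is new.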

Depending on the backend, restic will either abort (sftp) or retry several times (the other backends).

The next run of restic will discover most of the new data already uploaded to the repo. So it’ll continue roughly where it was interrupted. Please be aware that this does not consider files to be saved which have been modified between the aborted backup run and the restarted one. If you care about that, use a file system snapshot (btrfs, zfs, lvm snapshots, vss).
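Roughly why a restarted backup does not begin from zero, as a simplified Python simulation. The names backup, ConnectionLost and fail_after are made up for this sketch, and the repository is just a dict; how restic actually discovers previously uploaded data is more involved than this:

```python
import hashlib


class ConnectionLost(Exception):
    """Simulated network drop."""


def backup(chunks, repo, fail_after=None):
    """Upload chunks that are not yet in repo; optionally simulate a dropped connection.

    Returns the number of chunks uploaded during this run.
    """
    sent = 0
    for chunk in chunks:
        chunk_id = hashlib.sha256(chunk).hexdigest()
        if chunk_id in repo:
            continue  # already stored by an earlier, possibly aborted, run
        if fail_after is not None and sent >= fail_after:
            raise ConnectionLost(f"dropped after {sent} chunks")
        repo[chunk_id] = chunk
        sent += 1
    return sent


data = [bytes([i]) * 8 for i in range(10)]  # ten distinct chunks
repo = {}

try:
    backup(data, repo, fail_after=6)  # first run is cut off by the disconnect
except ConnectionLost:
    pass
stored_after_abort = len(repo)

resent = backup(data, repo)  # second run only sends what is still missing
```

In this simulation the second run uploads only the four chunks the first run never got to, which mirrors the "continues roughly where it was interrupted" behaviour described above.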

For restore of a snapshot stored in a remote repo I think you’ll need to restart the process from scratch. There’s no continuation logic in restore (yet). You could work around that by using restic copy <id> with a specific snapshot ID to transfer all data from a remote repo and then restore from that locally.

Valid reservations. Go is known for its backwards compatibility guarantee (here), but that does not apply to the libraries. Since restic compiles down to a single, statically linked binary, it’s easy to archive the restic binary from time to time and just have it lying around next to the backup, or on a rescue stick. I’m certain that in 25 years there will be a way to run this binary, maybe in a virtual machine.

You can also extract the restic source code, then run go mod vendor to copy the source code of all needed libraries to the vendor/ subdir, then archive that and compile it later with a Go compiler.
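The vendoring step could be scripted along these lines. This is a sketch under assumptions: archive_source, its parameters, and the restic-src archive prefix are invented for the example, and the go mod vendor call needs a Go toolchain plus network access at the time you create the archive:

```python
import subprocess
import tarfile


def archive_source(src_dir, out_path, run_vendor=True):
    """Optionally run `go mod vendor` in src_dir, then pack the tree into a .tar.gz."""
    if run_vendor:
        # Copies the source of all required libraries into src_dir/vendor/.
        subprocess.run(["go", "mod", "vendor"], cwd=src_dir, check=True)
    with tarfile.open(out_path, "w:gz") as tar:
        # Store everything under one top-level directory inside the archive.
        tar.add(src_dir, arcname="restic-src")
    return out_path
```

A call such as archive_source("restic-0.11.0", "restic-src.tar.gz") would then produce one file to keep next to the backup (the directory name here is only an example). Building from the unpacked tree later should work with something like go build -mod=vendor ./cmd/restic, assuming a compatible Go compiler is still available.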

In my opinion, the Go ecosystem is on a pretty good track to achieve stability and compatibility in the future.

I’m not sure if it fits your definition of “usable” and “implementable”, but we have pretty good documentation of the repository format in the docs; the file is also contained in the source code of each release. I first wrote the documentation about the repo format, then started implementing it. There’s even a toy Python implementation which is able to access data from a repository without restic: GitHub - oysols/restic-python: restic repository pack file decryptor in Python