Questions about how Restic works

Systemgeek-louis · April 22, 2020, 12:29pm

I read the documentation but I still have questions. As I understand it, Restic creates point-in-time snapshots of the data. It does this by creating an index in a different location (I am not sure whether that location is configurable). Using that index, Restic determines what has changed since the last backup.

I am asking these questions because I recently came to a company that is using Restic, but due to space issues parts of the data have been deleted. Specifically, the indexes were deleted, since the actual snapshots are being uploaded to AWS S3.

So my questions are:

What happens if you have lots of snapshots already but the index to the snapshots has been deleted? Can Restic look at the list of snapshots and re-create the indexes?
In the case above, where the indexes have been deleted, how would you go about restoring the data? Would you just need to restore all the snapshots from the oldest to the newest in sequential order?
Assuming you wanted to restore data, the indexes have been deleted, and some of the snapshots have been pruned from the system, how would you go about restoring? For all you know, the data you want to restore could have been on the purged snapshots.

cdhowie · April 22, 2020, 1:24pm

The index doesn’t care about snapshots, it cares about the data packs. restic rebuild-index will pull the header for each data pack and recreate the index files.

You would rebuild the indexes before performing any restores.

Well, of course the snapshot you want to restore from has to exist. If the snapshot has been forgotten and prune was run then the data unique to that snapshot is gone.
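The order of operations in the answers above can be sketched as a small Python helper. This is only a sketch: it builds the restic command lines and does not run them. The function name, repository URL, snapshot ID, and target directory are placeholders made up for illustration, not values from this thread; `rebuild-index`, `snapshots`, and `restore --target` are restic subcommands as of this writing.

```python
def recovery_commands(repo, snapshot_id, target_dir):
    """Return the restic invocations, in order, for restoring after index loss.

    Hypothetical helper: it only assembles argument lists, nothing is executed.
    """
    base = ["restic", "-r", repo]
    return [
        # 1. Recreate the index files from the pack headers.
        base + ["rebuild-index"],
        # 2. Confirm the snapshot you want still exists (it was not forgotten).
        base + ["snapshots"],
        # 3. Restore that single snapshot; no need to replay older ones.
        base + ["restore", snapshot_id, "--target", target_dir],
    ]


if __name__ == "__main__":
    # Placeholder repository and paths, for illustration only.
    for cmd in recovery_commands(
        "s3:s3.amazonaws.com/example-bucket", "latest", "/tmp/restore"
    ):
        print(" ".join(cmd))
```

Each snapshot is restored on its own, so there is no need to restore from oldest to newest.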

Overall, it’s not clear what you’re trying to accomplish. The questions you’re asking make it seem like someone is trying to use restic in a way that it was not designed to be used, and you’re likely going to run into trouble sooner or later.

In particular:

I would need to know what parts of the repository are being deleted.

There is no point uploading indexes anywhere as they do not need to be backed up; they can be regenerated on-demand. However, if the indexes are gone, the deduplication mechanism in restic will no longer function with respect to any of the data already in the repository.

It is very likely that by deleting the index files from the repository, they are making the storage issue worse as every new backup cannot deduplicate any data against prior snapshots.

Systemgeek-louis · April 22, 2020, 1:46pm

I am still trying to get a handle on what this company is doing. So my questions are based on what I am seeing only.

The way they are using Restic is via Docker. They have jobs in Jenkins which then spawn a bunch of Docker containers. Each container has its own copy of Restic and backs up a single machine or a single home directory. The data is uploaded to S3 and then the container is shut down and removed. The next day the jobs are kicked off again via Jenkins.

The reason for the container system is that the previous admin was madly in love with containers. Anything he could shove into a container he did, regardless of whether it was a good idea or not.

What's really scaring me is that Restic has been backing up about 30-40 TB of data to S3 for the last 2 or 3 years now. I have not looked yet to see if there is some organization, like by machine, but even if there is and I had to re-create the indexes and restore some or all of the data, I am not sure how long it would take. I am thinking that if I had to restore, I might have to ask AWS to put all that data from S3 onto a Snowmobile and send it to us, then try to restore from that.

cdhowie · April 22, 2020, 3:20pm

Invocation of restic in a container is not necessarily a bad thing. I’m more concerned with what repository data has been deleted.

Systemgeek-louis · April 22, 2020, 4:42pm

I too am concerned about that, but I am also concerned that 90% of the time the backup fails, and from what we can tell it's not Restic that's failing but things like Jenkins and Docker.

cdhowie · April 22, 2020, 7:49pm

I assume you’ve looked through the failed jobs’ output to try to determine why it failed?

Given the size of the repository, it’s possible and likely that restic exhausted the memory available to it. The output would help verify this.

MichaelEischer · April 23, 2020, 7:57pm

That sounds slightly off to me (or maybe you just use the term index differently than I’d expect). restic splits files into data blobs and stores directories as tree blobs. These blobs are packed into pack files which are stored in the data folder of a repository. A snapshot just refers to one of the directory trees (= tree blob). The index (in the index folder) serves as a quick lookup for where a specific blob can be found. It can be regenerated using rebuild-index, but restoring any data cannot work without it (as restic does not know where to find the blobs), and deduplication during backups also does not work. You also won’t be able to check the repository integrity or prune the data of old snapshots without the index.
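To make the relationship between blobs, packs, and the index more concrete, here is a toy model in Python. It is not restic's actual code or on-disk format: the `ToyRepo` class and all of its methods are invented for illustration, chunks are passed in ready-made instead of being cut from files, and a "snapshot" is simply a list of blob IDs. It only shows why a missing index breaks both restore and deduplication, and why the index can be rebuilt from the packs.

```python
import hashlib


class ToyRepo:
    """Toy stand-in for a repository: packs hold blobs, the index locates them."""

    def __init__(self):
        self.packs = {}  # pack_id -> {blob_id: blob bytes}
        self.index = {}  # blob_id -> pack_id (the quick lookup)

    def backup(self, chunks):
        """Store unknown chunks in one new pack; return the blob IDs (the 'snapshot')."""
        new_blobs = {}
        blob_ids = []
        for chunk in chunks:
            blob_id = hashlib.sha256(chunk).hexdigest()
            blob_ids.append(blob_id)
            # Deduplication: skip anything the index already knows about.
            if blob_id not in self.index and blob_id not in new_blobs:
                new_blobs[blob_id] = chunk
        if new_blobs:
            pack_id = "pack-%d" % len(self.packs)
            self.packs[pack_id] = new_blobs
            for blob_id in new_blobs:
                self.index[blob_id] = pack_id
        return blob_ids

    def restore(self, blob_ids):
        """Look up each blob via the index; raises KeyError if the index is gone."""
        return [self.packs[self.index[b]][b] for b in blob_ids]

    def rebuild_index(self):
        """Recreate the index by scanning what every pack contains."""
        self.index = {b: p for p, blobs in self.packs.items() for b in blobs}

    def stored_bytes(self):
        return sum(len(c) for blobs in self.packs.values() for c in blobs.values())
```

In this model, calling `repo.index.clear()` and then backing up the same chunks again stores them a second time, and `restore` fails until `rebuild_index()` has been called.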

Restic also creates a cache on the local filesystem for performance reasons (unless you pass the --no-cache flag). That cache can be deleted at any time, it would just be downloaded again on the next restic run. It’s actually a good idea to keep the cache between restic runs as that avoids downloading the index / tree blobs over and over again.
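For short-lived containers, keeping the cache between runs means pointing restic at a directory that outlives the container. A minimal sketch, assuming a persistent host directory is mounted into the container: the function name and all paths below are hypothetical, while `--cache-dir` is restic's global option for choosing the cache location. The helper only builds the command line and does not run it.

```python
def backup_command(repo, source_dir, cache_dir=None):
    """Build a restic backup invocation, optionally with an explicit cache directory.

    Hypothetical helper: returns the argument list, nothing is executed.
    """
    cmd = ["restic", "-r", repo]
    if cache_dir is not None:
        # Point restic at a directory that survives container removal,
        # e.g. a mounted host volume (an assumption about the setup).
        cmd += ["--cache-dir", cache_dir]
    return cmd + ["backup", source_dir]


if __name__ == "__main__":
    # Placeholder repository and paths, for illustration only.
    print(" ".join(backup_command(
        "s3:s3.amazonaws.com/example-bucket", "/data", "/mnt/restic-cache"
    )))
```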

Restic currently gets rather slow when the repository is more than a few TB large, so I hope that each machine uses its own backup repository.