CERN is testing restic for their backups

Oh man now I have geek drool all over my hoodie :crazy_face: @fd0 I hope this makes you really proud :sunglasses: Thanks again @robvalca!

How did you do that? Did you just teach everyone how to use restic’s cli?

No, we don’t expose restic commands. The idea is that users request restore jobs (initially from their CERNBox web interface), which essentially just adds a restore job to a database. Then a number of restore agents (restic wrappers) query this database asynchronously and restore the data directly into the user’s home folder. This is for the “dropbox-like” users; for the pro users our idea is to use restic’s fuse mount, or to create these restore jobs from the command line (but in any case through some kind of wrapper). Exposing restic commands directly would be a bit dangerous, as it would give users access to forget, pruning, etc., and we don’t want them to mess up their backups…
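Just to make the flow concrete: a minimal restore agent along those lines could poll a jobs table and shell out to restic for each claimed job. This is only a sketch of the idea as described above; the table schema, column names, and status values are my own guesses, not anything from the actual CERN implementation:

```python
import sqlite3
import subprocess

def build_restore_command(snapshot_id, include_pattern, target_dir):
    """Assemble a restic restore invocation for one job. The repository
    location and password are assumed to come from the standard
    RESTIC_REPOSITORY / RESTIC_PASSWORD_FILE environment variables."""
    return [
        "restic", "restore", snapshot_id,
        "--include", include_pattern,
        "--target", target_dir,
    ]

def claim_next_job(conn):
    """Pick one pending job and mark it as running, in one transaction,
    so that multiple agents polling the same table don't grab the same job."""
    with conn:
        row = conn.execute(
            "SELECT id, snapshot_id, pattern, target FROM jobs "
            "WHERE status = 'pending' LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        conn.execute(
            "UPDATE jobs SET status = 'running' WHERE id = ?", (row[0],)
        )
        return row

def run_agent_once(conn):
    """One polling iteration: claim a job, run restic, record the outcome."""
    job = claim_next_job(conn)
    if job is None:
        return False
    job_id, snapshot_id, pattern, target = job
    result = subprocess.run(build_restore_command(snapshot_id, pattern, target))
    status = "done" if result.returncode == 0 else "failed"
    with conn:
        conn.execute(
            "UPDATE jobs SET status = ? WHERE id = ?", (status, job_id)
        )
    return True
```

In a real deployment each agent would run this in a loop with a sleep between iterations, and the "claim" step would need to be safe for the database actually used (the single-transaction trick above works for SQLite; a server database would typically use `SELECT ... FOR UPDATE SKIP LOCKED` or similar).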

@robvalca How do you deal with credentials? Your clients need some credentials for the repo, your prune workers need some credentials, and the restore workers do too. Did you add multiple keys to each repository, or share one and the same key on all three nodes? I presume, however, that you have different credentials for each repo at least?

Okay… what kind of control do your users get over what is restored? I would imagine that at the very least one should be able to state what file/folder to restore, from what date, and probably a target location.

Yep, this is what we use in the prototype.

Yes, the user input will be passed to restore’s --include flag, so a pattern can be used for restoring files/dirs. The date of each snapshot will also be presented (we store snapshot info after each backup with restic snapshots --json). The target folder is not configurable, on purpose: a specific folder with the snapshot timestamp in its name will be created under the user’s home (__backup__restore_xXXXX, etc.) to avoid overwriting stuff.
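As a rough illustration of that flow (the JSON fields come from `restic snapshots --json`, but the folder-naming scheme and timestamp handling here are my assumptions, not the actual CERNBox code), one could list the snapshot dates and derive a timestamped restore target like this:

```python
import json
from datetime import datetime

def snapshot_dates(snapshots_json):
    """Extract (short_id, time) pairs from `restic snapshots --json` output,
    e.g. to present the available snapshot dates to the user."""
    return [(s["short_id"], s["time"]) for s in json.loads(snapshots_json)]

def restore_target(home, snapshot_time):
    """Build a per-restore folder name under the user's home so existing
    files are never overwritten. The naming scheme is illustrative; note
    that real restic timestamps can carry sub-second precision that may
    need trimming before datetime.fromisoformat() on older Pythons."""
    ts = datetime.fromisoformat(snapshot_time).strftime("%Y%m%d_%H%M%S")
    return f"{home}/__backup__restore_{ts}"
```

The pattern entered by the user would then go straight into `--include`, and the generated folder into `--target`, when the agent invokes restic.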

How thoroughly have you tested restic restore operations on your system? We have experienced some serious problems while restoring repositories with large files and deep directory structures.

After months of effort, we have still not successfully restored a 40 TB repository. In particular, large files located more than 4-5 levels deep in the directory tree tend never to completely restore using the restic restore functionality. The files restore partially, but are left in an incomplete/corrupted state as restore progress (measured by watching data transfer speeds) slows to a crawl.

We have resorted to mounting the repository with fuse and restoring files by manually copying directories over to the restore target.

Have you guys experienced similar issues? What is your restore strategy?

This is a known issue, please try PR 2195, which is supposed to solve the problem.

Yes, we are using that PR for the restore agents, as it is not yet merged.

For anyone interested, or just FYI: PR #2195 has now been merged, so the next release of restic will contain it (along with, I expect, a bunch of other fixes/improvements). Meanwhile, you can use one of the automatic master builds if you need it right away.

After some more searching, I found an update to the original video. It was posted on 2020-02-03:
https://cds.cern.ch/record/2708920

Here is the event link where you can also find the slides in a PDF:
https://indico.cern.ch/event/862873/contributions/3724442/

@robvalca has been busy!

Thanks for posting that! I just love CERN :slight_smile:
