First, thank you for restic, it is an amazing piece of software.
My first question: is it possible to run pre- and post-repository-access scripts within rest-server, with the path that rest-server wants to use as an argument? I could do something similar with the UNIX automounter, but I would prefer to do it at the application level. If not, a workaround could be for the clients themselves to call some sort of REST API before and after the restic backup.
My second question: which of the repository's existing files will a new backup read? Imagine I have 1 TB of previous backup snapshot data. I change a few files on the client and start a new backup snapshot. What will be read on the repository side? Nothing, only some cache files with hashes on the client? Only some cache files with hashes on the rest-server repository (and where would those be located)? Only the list of file names (which are hashes of the content)? Or will the content of previous snapshot files sometimes be read, too?
Context: in the 1990s, HSM (hierarchical storage management) was popular: data was stored on disk drives and archived on tape or WORM autochangers. Some data was immediately accessible (online), some was accessible with a delay (minutes), and some was cold (days to access it).
I would like to see whether that concept could be applied to restic, and specifically to rest-server, using a combination of hard-drive arrays that can be switched on and off, and possibly LTO-8 autochangers, migrating rarely used data to offline storage.
Obviously, the data on offline storage would also need to be checked from time to time, but the fact that the file names are hashes of the file contents makes it possible to verify them without restic (I already do that on the restic backups that I copy with rsync).
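For reference, a minimal sketch of such an out-of-band check, assuming the usual restic layout where pack files under `data/` are named after the SHA-256 hash of their contents. The `verify_repo` helper name is mine; this only checks file integrity, not repository structure (that still needs `restic check`):

```shell
# verify_repo <repo-path>: compare each data file's name against the
# SHA-256 of its contents; print a line for every mismatch.
verify_repo() {
  find "$1/data" -type f | while read -r f; do
    name=$(basename "$f")
    sum=$(sha256sum "$f" | awk '{print $1}')
    [ "$name" = "$sum" ] || echo "CORRUPT: $f"
  done
}
```

Run it against a copy on cold storage; an empty output means every file still matches its name.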
I’m not really an expert in the topics you’re talking about, but since no one has answered so far, I will say what I think.
…is it possible to run pre- and post- repository access scripts within rest-server, with the argument of the path that rest-server wants to use?
rest-server just serves a standard folder that contains a normal restic repository. You can run restic on the machine that runs rest-server and do any maintenance work that way; for instance, you can check and prune the repos directly on the server.
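As a sketch of what that server-side maintenance could look like: the repository path and the retention policy below are assumptions for illustration (point it at the same directory rest-server serves via its `--path` option), and `maintain_repo` is a hypothetical helper name:

```shell
# maintain_repo <repo-path>: check the repository, then apply a
# retention policy and prune unreferenced data, all locally on the
# rest-server host (no HTTP round-trips).
maintain_repo() {
  restic -r "$1" check &&
  restic -r "$1" forget --keep-daily 7 --keep-weekly 4 --prune
}
```

You would still need `RESTIC_PASSWORD` or `--password-file` set, exactly as on a client.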
My second question
It doesn’t sound to me like restic is a good fit for that. Splitting storage into tiers is a complicated way to deal with scarce drive space, and a system with so many components introduces many points of failure: every one of them has to work in order for your backup to work. I think this only makes sense if you really have gigantic amounts of data, and even then there’s probably a reason different approaches are used these days.
Check out the restic documentation as it contains a bunch of details on how restic works under the hood. It’s well written and fairly easy to understand.
Thank you for the answer. It is really interesting to see what CERN is doing; indeed, already in the 1990s they were experts in turning off-the-shelf hardware into highly reliable, high-performance systems for their unique needs.
To solve my specific problem, I simply modified the client to call a web service in my infrastructure before and after the restic backup, which covers the initial need for mounting file systems.
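A minimal sketch of such a client-side wrapper. The `$API` endpoint names (`pre-backup`, `post-backup`) are made up for illustration and belong to my own web service, not to rest-server; adapt them to whatever your infrastructure exposes:

```shell
# run_backup <paths...>: notify the server before the backup (so it can
# mount storage), run restic, then notify it afterwards (so it can
# unmount), preserving restic's exit code.
run_backup() {
  curl -fsS -X POST "$API/pre-backup" || return 1  # server mounts storage
  restic backup "$@"                               # the actual backup
  rc=$?
  curl -fsS -X POST "$API/post-backup"             # server may unmount again
  return $rc
}
```

The post-backup call runs even when the backup fails, so the server side is never left with a file system mounted indefinitely.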
When the need arises, I will check again whether there is a way to implement HSM within restic (KISS), or whether I need to go to a more complex multi-level HSM.
You may not need a full HSM if you are willing to keep things stupidly simple. For example, there are some very large disk drives in the range of 10 to 20 terabytes. One issue is that these drives are large but much slower than most. If you group your files into those that need quick access to backups and those that can tolerate slower access, you can then back each group up to a separate repository.
Somewhat more complex would be to buy more than one of the large drives and hook them up as a NAS. Slow, but very large.
Yes, indeed. There are sometimes >10 TB drives available for less than 40 CHF (refurbished or no-name). Still, nothing seems to beat LTO-8 tapes: about 40 CHF per 12 TB here.
Obviously, the access times are not the same, and a strategy of re-reading everything every N days is required.
This, however, largely exceeds my current needs; I only use LTO-8 for archival purposes at the moment.
Currently, I use off-the-shelf USB drives, at about 50 CHF/TB, and I verify all the files regularly (it’s handy that each file is named after the SHA-256 hash of its content); since restic deduplicates, each piece of data exists only once, so regular verification matters.
I also keep regular off-site copies.
I will, however, generate a much higher restic backup volume and will require higher reliability when I migrate away from the rsync & tar backups that are my current default strategy.