I’m thinking of using restic to back up some servers which deal with fairly large datasets.
The total amount to be backed up would be about half a petabyte, with us adding a few hundred gigabytes more each day.
The underlying storage would be an Oracle ZS4-4 appliance, serving via SFTP. Alternatively, if SFTP turned out to be a major bottleneck, I could run the REST server instead, if that would be faster, but only if I can somehow get a Solaris 11 restic binary. Or I could mount over NFSv4, but I'd greatly prefer not to.
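For reference, the two repository forms being compared would look roughly like this (host names and paths here are placeholders, not my actual setup):

```shell
# SFTP backend, talking straight to the appliance:
restic -r sftp:backup@zs4.example.com:/export/restic init

# REST backend, if a rest-server can be run near the storage:
restic -r rest:http://solaris-host.example.com:8000/ init
```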
Anyone have any thoughts on this? What kind of performance should I expect at these volumes?
I think restic can generally handle repositories of this size, but they require a lot of memory, and commands like check and prune can be really slow (as can be seen here).
While most of my repositories use SFTP as the backend and it works pretty well, it's probably the worst backend performance-wise. With such huge data sets you should try out other protocols which are more efficient. Maybe restic + rclone + OpenStack Swift?
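To sketch what that combination looks like: once an rclone remote is configured for Swift, restic can use it directly through the rclone backend. The remote name and container here are just examples:

```shell
# Assumes "rclone config" has already been run to set up a Swift
# remote named "swift", and a container "restic-backups" exists.
restic -r rclone:swift:restic-backups init
restic -r rclone:swift:restic-backups backup /srv/data
```

restic spawns rclone as a subprocess for this, so whatever rclone can talk to becomes a usable backend.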
The main issue for me is that unless I do some serious surgery, the machine with very fast access to the storage appliance (which itself doesn't actually let you get a shell - it just serves files via SFTP, NFS, FTP, and a few other ways) is a Solaris 11 box. I could probably get a minio binary on there for the S3 protocol, but anything complex with lots of requirements is real hard. (I tried getting Borg on there and it was just a nonstarter; the latest Python I was able to get working, for instance, was 3.3.)
I got rest-server crosscompiled for Solaris today and started loading data, and indeed the ingestion is extremely fast. I’ll report back when I’ve got the whole initial backup loaded (which may take a week or so).
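For anyone who wants to reproduce this, Go makes the cross-compile straightforward. The package path below reflects the upstream rest-server layout and may differ between versions, so treat this as a sketch rather than exact instructions:

```shell
# Build rest-server for Solaris/amd64 on a Linux or macOS box.
# Assumes a working Go toolchain; adjust the package path to your checkout.
git clone https://github.com/restic/rest-server
cd rest-server
GOOS=solaris GOARCH=amd64 go build -o rest-server.solaris ./cmd/rest-server

# Copy the binary over, then on the Solaris box something like:
# ./rest-server.solaris --path /tank/restic --listen :8000
```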
It's just for the progress bar and runtime estimation; it's not strictly necessary. I'd vote against having an option for disabling it, as in my experience scanning does not cause the backup operation to take longer. On the contrary: the scan sometimes pulls the directory structures into the memory cache, so the backup is sometimes even faster when scanning was done first…
You can disable it by commenting out these lines:
I highly doubt that this will have any effect.
In my experience, memory usage goes up with the number of blobs (and therefore the number of files) in the repo: a few large files are not a problem at all, but lots of small files (as on a mail server) will cause memory issues.
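To get a rough feel for the scale involved, here's a back-of-envelope estimate for the half-petabyte case. The average blob size and per-blob overhead are round-number assumptions on my part, not measured restic internals:

```shell
# Rough in-memory index estimate for a ~500 TiB repo.
# avg_chunk and per_blob are assumed round numbers, not restic's actual values.
data_bytes=$(( 500 * 1024 * 1024 * 1024 * 1024 ))  # ~500 TiB of source data
avg_chunk=$(( 1024 * 1024 ))                       # assume ~1 MiB average blob
per_blob=250                                       # assume ~250 bytes of index state per blob
blobs=$(( data_bytes / avg_chunk ))
ram_gib=$(( blobs * per_blob / 1024 / 1024 / 1024 ))
echo "~${blobs} blobs, very roughly ${ram_gib} GiB of index memory"
```

With lots of small files the average blob size drops well below 1 MiB, the blob count multiplies, and the estimate balloons accordingly, which matches the mail-server observation above.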
We have a significantly smaller dataset (just a few TB), and…it's bad. I'm a huge fan of restic, but it's just not suited for this kind of thing. Prune, for example, is an essential function, but it requires just an insane amount of memory if you have a lot of files like we do. We haven't been able to do a prune since we started using restic. So far we just let the backup grow and grow (at a small expense). There's hope that this will one day get fixed, so we're OK waiting until then.
I think there’s little harm in trying restic, but I’d be almost shocked if you ended up using it. It makes me sad to say that so bluntly, but I think this just isn’t a use case that works yet.
I actually don’t know what you’ll be able to use to back up this much data on a regular basis. We haven’t found anything good — just a scan of our data takes too long, really — so I think we’re moving to ZFS so that we can have snapshots and filesystem-level backups. We’ll see if that helps.
We ended up going with Bacula, which so far seems to handle our scale (about 600 TB total so far). The interface is awful and it's way more complex than it needs to be, but it doesn't seem to bog down at all at these sizes, whereas restic basically stopped working entirely for us above about 100 TB.
I’d love to see restic work at petabyte or near-petabyte scale, as it’s so much simpler conceptually.