Anyone use restic with large data sets?


I’m considering using restic to back up ~500TB of scientific data. I’ve had trouble finding anyone with experience using it at that scale. Any stories?

(The underlying storage would be an Oracle ZS4-4 appliance. Yes, I know about zfs send. The client machines don’t have zfs, so that’s not an option.)



Here are some threads I found on the forum about large datasets. Keep in mind that restic’s performance depends on several factors: hardware, backend, and the shape of the dataset (number of files and snapshots).



Note that the amount of data matters less to restic’s performance than the number of files. We’ve seen cases where a much smaller repository needs many GBs of RAM to operate on because of the size of its indexes, since the backups contain hundreds of thousands of tiny files.

If your files are each at least a few MB, you will have a much easier time with restic.



I’m currently testing restic 0.9.4 with an 11TB, ~5M-file dataset. The backend is sftp. The backup part works fine: after the initial hassle of pushing 11TB to the remote end, subsequent sweeps take only about an hour. I did notice it can consume up to 5–6GB of RAM, but I can afford that on a 128GB server.
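For anyone setting up something similar, the sftp workflow above boils down to a few commands. This is only a sketch: the host name, repository path, and password file are placeholders, not the actual setup.

```shell
# Hypothetical sftp-backend setup; adjust host, paths, and password handling.
export RESTIC_REPOSITORY="sftp:backup@storage.example.com:/srv/restic-repo"
export RESTIC_PASSWORD_FILE="$HOME/.restic-password"

restic init                      # one-time repository initialisation
restic backup /data/experiment   # initial run uploads everything (the 11TB above)
restic backup /data/experiment   # later sweeps re-read files and upload only changes
```

The fast subsequent sweeps come from restic’s parent-snapshot scan: unchanged files are detected by metadata and skipped, so only new or modified data is chunked and uploaded.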

The restore part looked problematic, but it seems a solution is coming soon (Restic 0.9.4 is still slow on restore (sftp backend)).

The real pain is the prune operation: it took 6 days to complete. Since I have ssh access to the remote storage server, I’m now trying to run prune locally on it, in the hope that direct access to the repository will be faster. It has been running for 28 hours so far and has not finished yet.
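The "run prune locally" idea can be sketched as follows: the same repository is addressed via its filesystem path on the storage server instead of through the sftp backend, which removes the network round-trips. Paths here are hypothetical placeholders.

```shell
# Hypothetical: pruning the same repository from two vantage points.
export RESTIC_PASSWORD_FILE="$HOME/.restic-password"

# From a client machine, the repository is reached over sftp:
#   restic -r sftp:backup@storage.example.com:/srv/restic-repo prune

# On the storage server itself, the repository is just a local directory:
restic -r /srv/restic-repo prune
```

Since a restic repository is backend-agnostic, pointing a local restic binary at the directory is safe as long as no other restic process is writing to the repository at the same time.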
