This topic may be related to https://forum.restic.net/t/when-why-do-files-become-new/6066
But I’m not altogether certain of that.
We have a remote customer where we are trying to manage the job of backing up about 330 TB worth of data. It’s an old system running a modified version of FreeBSD 10, with an 8-year-old CPU and limited RAM (24GB). No avx2 or avx3.
I think we started using restic 0.14.0 (repo ver 2). The backup process is slow and has taken several months already to get to this point where we are almost finished. We broke up the task into manageable sized snapshots. We updated to restic 0.15.0 in January and now 0.15.1.
Back in November 2022, we ran this:
./restic -r rest:http://192.168.150.100:8000 backup "/mnt/tank0"
11 days later it completed. By default restic compression would have been auto.
Now we are running a subsequent backup/snapshot like this:
./restic -r rest:http://192.168.150.100:8000 --compression off backup "/mnt/tank0"
The job has been running for more than 24 hours and is expected to finish in 4 days according to the ETA. Note the --compression off switch. According to the console output, it’s using the correct parent snapshot.
As far as I can tell, this restic job is scanning all files on source even though they were previously stored on the rest server target. We think it should be done in hours and not days.
I’ve read suggestions from @fd0 indicating that on future attempts we can try the --ignore-ctime options. But I’m not certain that would be useful for us, since all files appear to be affected and it’s not likely that the customer has modified all those files at the source location. So could the switch from --compression auto to --compression off be having a negative effect where a scan of all files is triggered?
I changed compression modes a lot of times on existing repos and never experienced something like this.
So I don’t think it has something to do with different compression mode. There must be other reasons why restic is scanning that much.
The archiver component of restic, which decides which files to back up and which to skip, does not even know whether a repository is compressed or not. So that cannot be the cause of the slow backup.
From what I can tell, the server does not have enough RAM to reasonably handle such a large dataset (unless a large amount of that data is duplicated). Restic is also not designed to handle near-petabyte repositories. I strongly suggest splitting the data across several repositories (max. 100TB, preferably even smaller). As a very rough estimate, you need 1GB RAM for every 5TB of data stored in the repository plus 1GB RAM for every 5 million unique files in the backed-up data.
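To make the rule of thumb above concrete, here is a minimal sketch in Python. The function name is made up for illustration, and the formula is only the rough estimate quoted above, not something restic itself computes or guarantees.

```python
def estimate_restic_ram_gb(repo_tb: float, unique_files: int) -> float:
    """Very rough RAM estimate from the rule of thumb in this thread.

    Assumption: 1 GB per 5 TB stored in the repository,
    plus 1 GB per 5 million unique files in the backed-up data.
    """
    ram_for_data = repo_tb / 5
    ram_for_files = unique_files / 5_000_000
    return ram_for_data + ram_for_files


# The 330 TB repository from this thread, ignoring the file count
# entirely (so this is a lower bound on the estimate):
print(estimate_restic_ram_gb(330, 0))  # 66.0
```

Even without counting files, the estimate for 330 TB comes out well above both the original 24GB and the upgraded 48GB, unless much of the data deduplicates.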
@MichaelEischer Thank you for your suggestions. We will certainly keep these restic RAM estimates in mind for the follow-on projects.
Just for a better historical perspective, we did update the customer’s 8-year-old system back in January so that for the last series of snapshots we were using 48GB RAM (instead of the original 24GB) and an SSD swap disk. Restic did appear to finish the last of these snapshots. Of course, future servers that we intend to back up will have more RAM.
We didn’t have an opportunity to try subsequent snapshots to see how they differ from the parent snapshots and then look for “restic diff” differences either in metadata or content. When we get around to doing that, I suppose the results would affect our decision to use the --ignore-ctime options. I’m guessing that if we use those options then restic will only look for mtime changes (doesn’t matter if the source file is older or newer than what’s stored in the parent snapshot?) on the source side before reading the entire source side file?
Please have a look at file change detection.
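As a rough illustration of that kind of change detection, here is a simplified sketch in Python. It is a model written for this thread, not restic's actual code: the FileMeta type and the function names are hypothetical, and it assumes the check is a plain inequality on size, mtime, ctime and inode number against the parent snapshot's entry, with two keyword arguments standing in for the --ignore-ctime and --ignore-inode options.

```python
import os
from dataclasses import dataclass
from typing import Optional


@dataclass
class FileMeta:
    """Hypothetical record of what a parent snapshot remembers about a file."""
    size: int
    mtime_ns: int
    ctime_ns: int
    inode: int


def meta_from_path(path: str) -> FileMeta:
    """Collect the same metadata for a file on disk."""
    st = os.stat(path)
    return FileMeta(st.st_size, st.st_mtime_ns, st.st_ctime_ns, st.st_ino)


def needs_reread(current: FileMeta, parent: Optional[FileMeta],
                 ignore_ctime: bool = False,
                 ignore_inode: bool = False) -> bool:
    """Return True if the file content would have to be read again."""
    if parent is None:
        return True  # not in the parent snapshot: treated as new
    if current.size != parent.size:
        return True
    # Any difference counts, whether the file looks older or newer.
    if current.mtime_ns != parent.mtime_ns:
        return True
    if not ignore_ctime and current.ctime_ns != parent.ctime_ns:
        return True
    if not ignore_inode and current.inode != parent.inode:
        return True
    return False
```

Under this model, ignoring ctime only helps when ctime is the sole thing that changed. A file that is missing from the parent snapshot, or whose path changed, is still read in full.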
I’m just updating this post with more recent information.
Finally got an opportunity to run the “restic diff”. There were 958804 new, 968149 removed files. The customer told us that there would be few differences. That’s clearly not the case.
So, we agree that the change in compression mode was not responsible for the long backup. It’s the processing of the large number of new files which greatly contributed to the performance penalty.
Thanks for the help, everyone.