We are restoring a GIS data processing server that had over 50 TB of raster files stored on it. File sizes on the server range from 5 MB to 500 MB. Our directory structure groups image tiles from each map product into its own folder.
How does restic decide which files to restore first during a large restore? In our case, it appears to be restoring files in a random order from all over the directory structure. Because our data requires that all files in a folder be restored before we can load them, none of our data will be accessible until the entire restore has completed.
This random restore order also makes it impossible to monitor the progress of the restore or resume failed restores.
Is there a way to force restic to restore files in a certain order? Top-to-bottom of the directory tree? Sorted alphabetically or by creation date?
Not sure about the order, but if I had to guess, it probably has something to do with the alphabetical order of the hashes? No idea.
AFAIK there’s no such feature in restic. But you can of course run multiple restores, each for a subset of the files, using the `--include` and `--exclude` arguments to the restore command; see https://restic.readthedocs.io/en/latest/050_restore.html#restoring-from-a-snapshot for a quick summary. You could automate that with a script built around the `ls` command, or just do it manually. Another option is the mount feature, whereby you manually pick which files and folders to restore.
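As a sketch of the per-folder approach, assuming repository location and password come from the usual `RESTIC_REPOSITORY`/`RESTIC_PASSWORD` environment variables, and using a made-up folder layout of `/data/rasters/<product>`:

```shell
# restore_product SNAPSHOT PRODUCT
# Restores a single map-product folder so it becomes usable as soon as it
# finishes, instead of waiting for the whole repository. The folder layout
# (/data/rasters/<product>) and target path are illustrative only.
restore_product() {
    restic restore "$1" \
        --target /restoredirectory \
        --include "/data/rasters/$2"
}

# Most urgent products first, e.g.:
#   restore_product latest product-A
#   restore_product latest product-B
```

The mount route works similarly: run `restic mount /mnt/restic`, then copy folders out of `/mnt/restic/snapshots/latest/...` with `cp -a`, in whatever order you need them.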
> This random restore order also makes it impossible to monitor the progress of the restore
Well, not exactly. You can monitor progress by comparing the bytes restored to the total bytes of data to be restored.
- `du -sch /restoredirectory` will give you the amount of data restored so far (human-readable sizes, not raw bytes)
- `watch du -sch /restoredirectory` will refresh that number every two seconds
- `` watch 'TOTALSIZE=100; echo "scale=2; 100 * `du -sm /restoredirectory | cut -f 1` / $TOTALSIZE" | bc' `` will report the completion percentage every two seconds (replace 100 with the expected restore size in megabytes)
Thanks for the suggestions. We’ve currently got several concurrent restores running on different subsets. This is how we’ve gotten back some business critical data that couldn’t wait.
It’s funny. Even within a single folder, where all of the files are named in sequential order, the files are restored randomly. Check this folder we’re desperate to have fully restored:
Missing files. Incomplete files sitting stalled, then occasionally collecting a few more bits. We’re almost two weeks into this restore…
That’s the thing. Our primary RAID array crashed, and we don’t know how large any particular folder was. I considered running restic stats on the repository to gather this information, but we’re sitting at 80 percent memory usage right now (~32 GB of RAM installed), and I’m afraid running anything else might crash the restore instances that are already running.
I tried mount, but we haven’t been able to get it to work properly on such a large repository. The first time I tried to mount the repository, memory usage shot up very quickly, and I had to kill the mount process for fear it would crash the primary restore that’s been running for two weeks.
This has been a frustrating process. People keep asking when the data server will be back up and running, and I currently have no idea whether it will be 1 day or 6 months before they can get back to work.
Thanks again for the suggestions. I’m sure we’ll figure something out.
Just want to confirm that restic 0.9.x does indeed restore files in random order. Earlier restic versions used a top-down restore implementation, IIRC, but that was single-threaded and performed really poorly, and I could not find a way to implement a multi-threaded restore that guaranteed directory restore order.
There is no restore progress reporting either, unfortunately. I had a proof-of-concept implementation at one point, but it was made obsolete by other changes in restic and I never re-implemented it.
Would it be possible to at least show “101/1234 files restored”? Obviously this won’t give an ETA, because larger files take longer to restore, but it would at least indicate how much progress has been made.
This is certainly doable, but personally I am not in the position to implement this change at the moment.
Note that this would at a minimum require crawling all of the tree objects in a snapshot tree, either in advance or parallel to the restore operation.
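In the meantime, one way to approximate that counter outside restic is to do that crawl yourself with `restic ls`. Note that `restic ls` prints directories as well as files, so treat the total as slightly inflated. A sketch, with hypothetical paths:

```shell
# One-off: record the (approximate) number of entries in the snapshot:
#   restic ls latest | wc -l > /tmp/expected-count

# count_restored DIR EXPECTED
# Prints "N / EXPECTED files restored" for the files that have landed in DIR.
count_restored() {
    restored=$(find "$1" -type f | wc -l | tr -d ' ')
    echo "$restored / $2 files restored"
}

# e.g.:
#   count_restored /restoredirectory "$(cat /tmp/expected-count)"
```

Partially restored files are counted as present here, so this only complements, rather than replaces, the size-based monitoring above.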
Could restic just keep a running count of files as they are added to the repository? It would add very little processing during the backup process and require a tiny amount of additional space.
In fact, I would like to see restic store this information as well as the “restore size” of each snapshot (the raw size of all files before deduplication) in the snapshot’s metadata. @fd0 is this a patch you’d be willing to accept, since it can be done in a backwards-compatible way?
Restore already traverses the snapshot tree in order to determine which data files it needs to download.
I’d like to second the request for an “XXX of XXXXX files restored” counter running during the restore process.
I’ve currently got multiple restores that were running quickly initially but now appear to have stalled. I’m still seeing 200-300 Gbps of network traffic from the backup server to the restore target; in fact, the backup server has sent over 80 TB of data to the target so far. Despite that, only 9.78 TB of files have appeared in the target directory, and we’ve been sitting at that number for about 24 hours now. (The expected restore size is approximately 35 TB.)
I’m starting to suspect that restic is just redownloading data over and over and nothing new is getting saved. Without some way of monitoring restore progress from within restic, I really have no easy way of figuring out what’s going on right now…