How can I increase the number of retries restic performs?

Hello,

I am trying to back up a large folder (~2 TiB) with restic and rclone as a backend. That mostly works fine, but at some point in the backup my internet connection seems to become unstable, so I get transfer errors from rclone.

In most cases restic retries and succeeds after 2 to 9 retries, but sometimes 10 retries are not enough and the snapshot creation fails. This is annoying because scanning all folders again takes a long time (20 h), even with --no-scan to speed it up.

The command I use to back up is:

restic backup O:\folder\ -r rclone:remote:folder/folder --verbose --cache-dir "D:\tmp" -o rclone.connections=10 --read-concurrency=4 --no-scan -o retries=50

I tried to increase the number of retries with “-o retries=50”, but that does not seem to have worked, as restic still only retries 10 times.

What would be the correct way to increase the number of retries?
Alternatively, how could I increase the delay between retries? Many of them currently have only a sub-second delay, which is far too little for my setup.

Thanks in advance!

AFAIK this is not possible in restic.

There is pull request #2515 in restic/restic ("Add --retries flag" by aawsome), but it has not been merged so far.

For your use case, https://github.com/restic/restic/pull/3230 could also be interesting (also not yet merged).

If you need a way to handle your bad connection but are not fixed on restic, you could also have a look at rustic, a restic client written in Rust, which supports setting the number of retries (in the config file).

Thanks!

I would love to see some of your pull requests merged. Resuming or increasing retries would be exactly what I need right now…

Maybe I will give rustic a try. So far I didn’t because it says it’s still experimental. If I run my backup with rustic and then check its integrity with restic, could I be sure that everything was saved correctly and will still be compatible with restic for future use?

Both projects use the same restic repository format, so in theory you should be able to switch back and forth between them at any time. I sometimes use rustic for my restic repos when I need features not available in restic. So far all good.
How well it works in practice in your case…? Only testing will tell you.

Another option would be to merge the mentioned PR yourself and build your own customised restic version.

If you are paranoid, you can run a restic backup --force directly afterwards (which should upload nothing other than the snapshot file to the repository), followed by a restic check --read-data (which checks whether the saved contents match the hash IDs). I don’t know, though, whether that check would abort if your connection is too unstable…
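
The two-step check described above could be scripted roughly as follows. This is only a minimal sketch: it assumes restic is on the PATH and can obtain the repository password on its own, and the function names build_verify_commands and run_all are made up for illustration.

```python
import subprocess


def build_verify_commands(repo, source):
    """Return the two restic invocations from the post above as argv lists."""
    return [
        # Re-read all source files; data already in the repo should not be uploaded again.
        ["restic", "-r", repo, "backup", "--force", source],
        # Read the stored data back and verify it against the hash IDs.
        ["restic", "-r", repo, "check", "--read-data"],
    ]


def run_all(commands):
    """Run each command in order and stop at the first one that fails."""
    for argv in commands:
        print("running:", " ".join(argv))
        # check=True raises CalledProcessError on a non-zero exit code.
        subprocess.run(argv, check=True)
```

With the repository and path from the original post, the call would be run_all(build_verify_commands("rclone:remote:folder/folder", "O:\\folder\\")). Note that check --read-data downloads the whole repository, so on an unstable connection it may fail in the same way as the backup.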

Over how much time are the retries for a file spread out? There’s also a limit of 15 minutes after which no further retries are performed, which is not addressed by the mentioned PR.

For which duration do these connection issues occur?
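
To illustrate why a higher retry count alone may not be enough when there is also a time limit, here is a small simulation. It is a generic exponential backoff sketch, not restic's actual retry code: the initial delay, growth factor and per-retry cap are made-up values, and only the 15-minute limit and the default of 10 retries come from this thread.

```python
def backoff_schedule(max_retries, initial_delay=0.5, factor=2.0,
                     max_delay=60.0, max_elapsed=15 * 60):
    """Return the wait times (in seconds) before each retry.

    Retrying stops when max_retries is reached or when the next wait
    would push the total waiting time past max_elapsed.
    All default values except max_elapsed are illustrative assumptions.
    """
    delays = []
    elapsed = 0.0
    delay = initial_delay
    for _ in range(max_retries):
        if elapsed + delay > max_elapsed:
            break  # time budget exhausted before the retry budget
        delays.append(delay)
        elapsed += delay
        delay = min(delay * factor, max_delay)  # grow the delay, up to the cap
    return delays


ten = backoff_schedule(10)
fifty = backoff_schedule(50)
print(len(ten), sum(ten))      # retries used and total wait with a limit of 10
print(len(fifty), sum(fifty))  # retries used and total wait with a limit of 50
```

Under these assumed numbers, asking for 50 retries does not yield 50 attempts, because the elapsed-time limit is reached first. Any option that raises the retry count would therefore need to consider the time limit as well.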

My setup is restic → rclone → jottacloud

The retry issue seems to be either with my connection or with jottacloud. The problem is that the first couple of retries are usually unsuccessful, with rclone returning error 409 (conflict) because the first upload it tried was not properly terminated and still needs to time out. At least that is how I understand it.

What follows are retries with progressively longer wait times. Usually the upload can complete after 50 s, but the longest I have seen so far is about 10 minutes (with rustic and an increased retry count).

When I use parallel uploads (-o rclone.connections=10), the other uploads can usually continue without a problem while the one that failed keeps retrying.

Thanks for the reply, I’ll have to think a bit about how to increase the number of retries without causing other problems, in particular if a restic command fails and restic tries to unlock a repository, but is unable to contact the server. Although, it might be an option to add a --retries parameter or similar as an experimental CLI option. But I won’t have time to work on that this year.

Thanks to all of you for your help!

For the time being, taking a snapshot with rustic and increased retries in a config.toml seems to work. But it is very slow because it only uses a single upload thread. I am speed-limited per thread, and I have not found a way to make rustic use multiple parallel rclone uploads similar to “restic -o rclone.connections=10”.