"not enough cache capacity" when restoring from MinIO (S3)

Hi there,

We are evaluating restic for our daily backups to an onsite MinIO cluster. The backup process is really smooth and working quite well, but I have hit some issues when restoring data from MinIO.

I am restoring “eventstore” data from an S3 bucket; the snapshot consists of:

  • 232 “chunk” files, each around 257 MB
  • a couple of tiny “chk” files
  • index files between 23 and 367 MB

The restore always fails with a few errors like this one:
ignoring error for /data/backup/eventstore-volume/very_long_sha/_data/chunk-000000.000000: not enough cache capacity: requested 9976743, available 6836375

The files with errors are incomplete and the data is unusable. I am not sure where to start looking for a solution to this issue. I tried running with --no-cache set, but it does not help.

Environment

  • restic Docker container with a volume mounted to restore into
  • MinIO cluster (4 nodes) behind an nginx reverse proxy / load balancer

Restore command:

docker run --rm --name restic-restore-eventstore \
    -e AWS_ACCESS_KEY_ID=${AWS_ACCESS_KEY_ID} \
    -e AWS_SECRET_ACCESS_KEY=${AWS_SECRET_ACCESS_KEY} \
    -e RESTIC_REPOSITORY=${RESTIC_REPOSITORY} \
    -e RESTIC_PASSWORD=${RESTIC_PASSWORD} \
    -v $PWD/eventstore-data:/eventstore-data \
    restic/restic --no-cache \
    restore $1 \
    --target /eventstore-data

EDIT 1: The issue does not seem to be Docker-related. I installed the restic client locally and still see the same errors.

EDIT 2: I’ve ruled out nginx as well by going directly to one of the MinIO hosts. I still hit the issue with this command:

    restic \
    --no-cache --verbose --verbose \
    restore latest \
    --target ./eventstore-data

Any help would be greatly appreciated.

I started digging into the code to try to figure out why this could be happening. @fd0, I think there’s a problem with how the pack cache size is determined:

Using an average is all well and good if access patterns can guarantee that the average size of the packs in use is at most 5 MB. In one of my repositories, I have 68 packs larger than 10 MB; if too many of those are needed at once, this error arises. Notably, the failed request above was for 9,976,743 bytes, roughly double the assumed average. We should never end up in a situation where a restore fails because we underestimated how much data the pack cache would need to hold.

I think it would make more sense for the pack cache to use either the maximum theoretically possible pack size, or to look for the largest pack in the repository and use that value.

But perhaps it would make the most sense to just remove the upper limit; why is there one in the first place? If the system doesn’t have enough RAM to process the restore, that’s one thing, but why artificially limit how many packs can be in the cache when the number of worker threads is already bounded?
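
To make the failure mode concrete, here is a toy model of the behaviour described above (illustrative names and numbers only, not restic’s actual code). The budget is derived from the assumed average, so a run of larger-than-average packs exhausts it even though each worker holds only one pack at a time:

    package main

    import "fmt"

    // reservationCache models a byte budget sized from an assumed average
    // pack size. Purely illustrative; not restic's implementation.
    type reservationCache struct {
        available int64 // bytes not yet reserved
    }

    func newReservationCache(workers int, averagePackSize int64) *reservationCache {
        // The budget assumes every in-flight pack is "average" sized.
        return &reservationCache{available: int64(workers) * averagePackSize}
    }

    func (c *reservationCache) reserve(n int64) error {
        if n > c.available {
            return fmt.Errorf("not enough cache capacity: requested %d, available %d", n, c.available)
        }
        c.available -= n
        return nil
    }

    func main() {
        // Five workers and a 5 MiB assumed average give a 25 MiB budget.
        cache := newReservationCache(5, 5<<20)

        // Four packs reserve ~18 MiB, leaving ~7 MiB, and then a pack of
        // 9,976,743 bytes (the size from the error above) is rejected.
        for _, size := range []int64{5 << 20, 5 << 20, 5 << 20, 3 << 20, 9976743} {
            if err := cache.reserve(size); err != nil {
                fmt.Println(err)
            }
        }
    }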

Thanks for the detailed response!

I built restic from source with averagePackSize set to 10 MB, and it does seem to resolve my issue. I agree that it would make sense to check the repo for the largest pack size and size the cache from that. Alternatively, a retry could be built into the filerestorer to handle the situation where reserving space fails momentarily.
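
The retry could be as simple as backing off and re-attempting the reservation a few times before failing the file. A sketch of the idea, with hypothetical names rather than the actual filerestorer code:

    package main

    import (
        "errors"
        "fmt"
        "time"
    )

    // errCacheFull stands in for the "not enough cache capacity" condition;
    // the real error value would come from the restorer.
    var errCacheFull = errors.New("not enough cache capacity")

    // reserveWithRetry wraps whatever function actually claims cache space
    // and retries with a short backoff, on the theory that other workers
    // will release space momentarily.
    func reserveWithRetry(reserve func(n int64) error, n int64, attempts int) error {
        var err error
        for i := 0; i < attempts; i++ {
            if err = reserve(n); err == nil {
                return nil
            }
            if !errors.Is(err, errCacheFull) {
                return err // a different failure; retrying won't help
            }
            time.Sleep(time.Duration(i+1) * 100 * time.Millisecond)
        }
        return fmt.Errorf("giving up after %d attempts: %w", attempts, err)
    }

    func main() {
        // An always-full cache demonstrates the give-up path.
        full := func(n int64) error { return errCacheFull }
        fmt.Println(reserveWithRetry(full, 9976743, 3))
    }

Of course, a bounded retry only helps with momentary contention; if a single pack exceeds the whole budget, no amount of waiting will make the reservation succeed.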

This is a bug; restic must be able to restore such files. Can you please report it as a bug on GitHub, so we can track it? I’ll then assign it to @ifedorenko, who recently rewrote the restore code :slight_smile:

@fd0 - I have created an issue on GitHub:

This was already addressed in #2195, which is waiting for somebody to review and approve it :wink:

I’ve just pulled and built your source code and I can confirm that it does resolve the issue.

FWIW, I’m getting these errors too while restoring from a REST server.

Bumping this issue as well: using 0.9.5 to restore from a B2 bucket aborts the transfer, and some restored files are inaccessible. :frowning:

Is there a workaround?

A local build of the https://github.com/ifedorenko/restic/tree/out-of-order-restore-no-progress branch is the only way to get past this error at the moment.

This is the kind of problem I was worried about here:

The fix for the problem was posted on March 3, seven months ago, and the restic code remains frozen. Very sad. This is such a good project.

Thanks, I can understand your sadness, and I even share it! As you probably know, restic is run by volunteers, and at least for me it’s a spare-time project I do for fun in the evenings, when I find time. Since I just moved into a house, time has been scarce. I hope to find more time for restic soon :slight_smile:

Hello userr1,

I understand your concern, but whether intended or not, your post comes across as hostile toward a team of people who have given you an amazing open source project, giving time and money while asking for nothing in return. I know it’s hard to see such a good project in a dormant phase, but as fd0 explained in your thread and here, he’s going through some major life events right now. He never said he was done with restic, but if people treat him like a bad person for putting family first, I would consider that a reason for him to pull back, not to put forth further effort.

Restic is an amazing program; it ticks all the boxes for so many users. It provides me significant peace of mind, as I’m sure it does for many others. Please, let’s praise fd0 and the restic contributors for their work, not alienate them for putting their families and real lives first. It’s pretty amazing to see what has already been done.

I’ve seen open source projects die, several of them ones I held deep interest in, but it was not due to malevolence; in each case it was an abrupt dismissal, without further work or contact. fd0 is here, contributing on the forums and coding when he has the chance. This is not a dead project; it’s a project run by real people with real lives that take priority. I’ve been using restic for years, and I have no fear of it stopping now.

The biggest threat to an open source project is a hostile community, not a commitment to family and friends. If restic is something fd0 and the community pour their hearts into, its greatest threat is the developers losing interest, and the most common cause of that is a community acting against them. I can’t speak for fd0, but I imagine his family has given him great strength in life and even in working on this project. If his family needs him right now, the best we can do is support them by supporting him. Anything else would be selfish and possibly destructive to the community that has been built here.

So, to take it one further, I want to say thank you to @fd0 @matt @ifedorenko @cdhowie @rawtaz and the many restic contributors (I’m sorry I haven’t cited you all by name, but please know I and many others here are thankful!) for your gifts of time through code, docs, and forum support. You’ve made this project from nothing, you’ve created something beautiful, and you’ve asked for nothing in return. I am humbled by your strength, your care for the community, and the work you’ve put in so far. Please, if you need time now, feel free to take it. Know that I and many others here silently support you. If this community gives back even an iota of the effort you’ve put into it, you’ll never have a desire to leave it.

Thanks again all! And good luck @fd0! I wish you and your family many happy, healthy years enjoying your new home!
jedi453

Thanks for your post, hope to see restic moving forward.

@fd0,

By the way, what is the story on how restic got started?

I’ve talked about that on the Go Time podcast. The short version is that I needed a good, usable backup program and found none. So I thought about it for about two years, then started writing one, and now we’re here. :wink:

Thanks! I didn’t know about the podcast!

How do you determine the maximum pack size in a repository?

Okay, so I ran a simple find over my data and discovered that I have 404 packs larger than 10 MB, the biggest one being 20 MB. I guess it would be good to build restic from source and apply the patch for this?
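
For anyone who wants to reproduce the scan without puzzling out find flags, here is a small Go sketch that walks a local repository’s data/ directory; the repository path is an assumption, so adjust it for your setup:

    package main

    import (
        "fmt"
        "os"
        "path/filepath"
    )

    func main() {
        // Path to a local repository's data/ directory (an assumption).
        root := "/srv/restic-repo/data"

        var largestPath string
        var largestSize int64
        over10MiB := 0

        err := filepath.Walk(root, func(path string, info os.FileInfo, err error) error {
            if err != nil || info.IsDir() {
                return err
            }
            if info.Size() > 10<<20 {
                over10MiB++
            }
            if info.Size() > largestSize {
                largestSize, largestPath = info.Size(), path
            }
            return nil
        })
        if err != nil {
            fmt.Fprintln(os.Stderr, err)
            os.Exit(1)
        }

        fmt.Printf("packs over 10 MiB: %d\n", over10MiB)
        fmt.Printf("largest pack: %s (%d bytes)\n", largestPath, largestSize)
    }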

Heya everyone,

Pardon the thread necromancy. It seems that I’m running into this issue using restic 0.9.6 compiled with go1.12.12 on linux/amd64.

The previously posted fix branch is no longer available: https://github.com/ifedorenko/restic/tree/out-of-order-restore-no-progress

Any ideas would be most appreciated.

Best,

Gene