Backup clients and sleep / interruptions

So here’s a question… what happens if a client is in the middle of a backup, but the user isn’t paying attention to it, closes the lid of their laptop and goes home?

I’ve got a stale lock now. I can call restic unlock, which will probably remove it, but what happens if the client wakes up and reconnects? What happens if I run a prune and THEN the client reconnects? Or if it connects DURING the prune? Would this affect only that client’s backup, or could it corrupt the repo?

The backup will be interrupted, but the data that was already uploaded to the repository will not have to be uploaded again on the next run.

Depending on what type of backend you use and how you set up your restic instance, the backup session on the client might continue or retry once the client wakes up again, or it might fail when it detects that it no longer has an active connection to the repo.

Stale locks aren’t uncommon. Please see the documentation for locks for some background on how locks work in restic.
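To make "stale" a bit more concrete, here is a minimal Python sketch of how a stale-lock check could work. It is not restic's actual code. The 30-minute threshold and the "same host, process no longer alive" rule are my reading of restic's design notes and should be treated as assumptions; the `Lock` class, `is_stale` and the `pid_is_alive` callback are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Callable

# Assumption: a lock that has not been refreshed for this long counts as stale.
STALE_AFTER = timedelta(minutes=30)


@dataclass
class Lock:
    time: datetime    # timestamp stored in the lock file
    exclusive: bool   # True for operations such as prune
    hostname: str     # machine that created the lock
    pid: int          # process that created the lock


def is_stale(lock: Lock, now: datetime, local_hostname: str,
             pid_is_alive: Callable[[int], bool]) -> bool:
    """Return True if the lock looks abandoned under the assumed rules."""
    if now - lock.time > STALE_AFTER:
        return True
    # Process liveness can only be checked for locks created on this machine.
    if lock.hostname == local_hostname and not pid_is_alive(lock.pid):
        return True
    return False


# A laptop that went to sleep two hours ago stopped refreshing its lock.
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
laptop_lock = Lock(time=now - timedelta(hours=2), exclusive=False,
                   hostname="laptop", pid=4242)
laptop_is_stale = is_stale(laptop_lock, now, "server", lambda pid: True)
print(laptop_is_stale)
```

Under these assumptions, the lock left behind by the sleeping laptop is the kind of lock that restic unlock is meant to clean up.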

If the client just ran a backup, it will have created a non-exclusive lock, which will not prevent it or other clients from backing up to the repository. Whether or not the client continues the backup when it wakes up depends on the type of backend, as I mentioned above.

If you run a prune, restic will create an exclusive lock, which means that other clients may not write to the repository while it is held. I’m not sure what happens if a client goes to sleep, an exclusive lock is created in the meantime, and the client then wakes up, but I would think that it detects this when attempting critical operations. Someone else will have to chime in though. Personally, I make sure that the client won’t be accessing the repo when I run prune.
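The difference between the two lock types can be sketched as a small compatibility check. This is only a model of the rule described above (backups take non-exclusive locks, prune takes an exclusive one), not restic's implementation; `can_acquire` is a hypothetical helper.

```python
from typing import Iterable


def can_acquire(existing_exclusive_flags: Iterable[bool],
                want_exclusive: bool) -> bool:
    """Decide whether a new lock is compatible with the locks already present.

    existing_exclusive_flags holds one boolean per existing lock:
    True for an exclusive lock, False for a non-exclusive one.
    """
    flags = list(existing_exclusive_flags)
    if want_exclusive:
        # An exclusive lock (e.g. for prune) needs the repository to itself.
        return not flags
    # A non-exclusive lock (e.g. for backup) is only blocked by an exclusive one.
    return not any(flags)


# Two clients backing up at once is fine.
print(can_acquire([False], want_exclusive=False))
# A prune is blocked while any backup lock exists, including a stale one
# left behind by a sleeping laptop.
print(can_acquire([False], want_exclusive=True))
```

In this model, removing the sleeping client's lock lets the prune proceed, but nothing tells the sleeping client that its lock is gone, which is why the wake-up case is the interesting one.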

I wouldn’t be too worried, but rather than say too much and risk getting it wrong, I hope someone else can chime in.

“Corrupt the repo” is a bit non-specific, so I will explain what is likely to happen instead.

Any packs that the backup client uploaded will have been removed by the prune, assuming that no other backup made between the interrupted backup and the prune used those same blobs. The interrupted backup, assuming that restic actually continues with it instead of bailing, will be missing blobs, and so the repository will fail restic check. In this scenario, forgetting the broken snapshot will fix the problem.
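The sequence of events can be illustrated with a toy model that uses plain Python sets. It works at the level of individual blobs and ignores packs and index files entirely, so it is a simplification rather than a description of restic's internals. The blob names, the snapshot names and both helper functions are made up for the example.

```python
from typing import Dict, Set


def prune(stored: Set[str], snapshots: Dict[str, Set[str]]) -> Set[str]:
    """Keep only the blobs that at least one snapshot references."""
    referenced: Set[str] = set().union(*snapshots.values())
    return stored & referenced


def missing_blobs(stored: Set[str],
                  snapshots: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Map each damaged snapshot to the blobs it references but cannot find."""
    return {name: blobs - stored
            for name, blobs in snapshots.items()
            if blobs - stored}


# One finished backup exists.
snapshots = {"monday": {"a", "b"}}
stored = {"a", "b"}

# The laptop uploads blobs c and d, then goes to sleep before saving a snapshot.
stored |= {"c", "d"}

# Prune runs meanwhile: c and d are not referenced by any snapshot, so they go.
stored = prune(stored, snapshots)

# The laptop wakes up, uploads e, and saves a snapshot that still assumes
# c and d are in the repository.
stored |= {"e"}
snapshots["laptop"] = {"b", "c", "d", "e"}
broken = missing_blobs(stored, snapshots)
print(broken)

# Forgetting the broken snapshot leaves a consistent repository again.
del snapshots["laptop"]
after_forget = missing_blobs(stored, snapshots)
print(after_forget)
```

Only the resumed snapshot is damaged in this model; the older snapshot is untouched, which matches the point that forgetting the broken snapshot is enough in this particular scenario.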

It is also likely that the client will add at least one index file that refers to blobs that were deleted by the prune operation. This could cause all sorts of problems, including clients deduplicating against blobs that don’t actually exist. I believe both restic check and restic prune will complain bitterly about this situation, and restic prune may fail entirely. restic rebuild-index would correct the problem.
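For anyone who wants to script the clean-up, here is a hedged sketch that just assembles the restic commands mentioned in this thread (unlock, rebuild-index, forget, check) in one plausible order. The repository path and snapshot ID are placeholders, the two helper functions are hypothetical, and the ordering is my assumption rather than an official procedure. As far as I know, newer restic releases expose rebuild-index as restic repair index, so adjust the command for your version. The repository password has to be supplied through restic's usual mechanisms.

```python
import subprocess
from typing import List, Optional


def recovery_commands(repo: str,
                      broken_snapshot: Optional[str] = None) -> List[List[str]]:
    """Build the restic invocations for a clean-up pass (assumed order)."""
    base = ["restic", "-r", repo]
    commands = [
        base + ["unlock"],          # remove stale locks
        base + ["rebuild-index"],   # drop index entries for deleted blobs
    ]
    if broken_snapshot is not None:
        # Forget the snapshot that refers to blobs removed by prune.
        commands.append(base + ["forget", broken_snapshot])
    commands.append(base + ["check"])  # verify the repository afterwards
    return commands


def run_recovery(repo: str, broken_snapshot: Optional[str] = None,
                 dry_run: bool = True) -> None:
    """Print each command and, unless dry_run is set, execute it."""
    for command in recovery_commands(repo, broken_snapshot):
        print(" ".join(command))
        if not dry_run:
            # check=True stops at the first command that fails.
            subprocess.run(command, check=True)


# Placeholder repository path and snapshot ID; dry run only prints the commands.
run_recovery("/path/to/repo", broken_snapshot="abc123", dry_run=True)
```

Running it with dry_run=True first makes it easy to review the commands before anything touches the repository.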

Thanks, that makes sense. Duly noted!

I’m about to try to recover from such a scenario for the second time, after putting my Mac to sleep during a restic backup (with an external HDD as the backend). The first time wasn’t pretty, so I don’t expect it to be any better this time around: the repository always gets corrupted.

Yeah, restic check and restic prune are not happy. Last time I ran restic rebuild-index, it just turned one set of errors into a new set of errors. I had to run restic repair (a new command still under PR review). I guess I will have to do it again.

Which errors do you get exactly? Did you put your mac to sleep and then just let it wake up later with the hdd still attached? Or did you unplug the hdd in the meantime?

Sorry, I didn’t keep a screenshot of the errors. But I will next time, because I often forget to kill restic before I put my Mac to sleep, so I’m sure this will happen again.

Yes, I put my Mac to sleep and just let it wake up without unplugging the HDD. But these are 2.5" portable HDDs that are powered only by that single USB connection, and they go to sleep if there’s no activity for a long time. When I wake up the Mac, macOS shows me the classic popups saying I should have “ejected” the disk before disconnecting it. So I assume the errors are the same as those that would occur if I disconnected a drive during a backup without ejecting it.