Aborted Backups

rumpelsepp · September 15, 2017, 8:24am

Hey,
what is the expected behaviour with aborted backups? I did an emergency backup because of a slowly dying disk and the backup did not finish properly; the backup repository was mounted over the network via samba. The machine crashed and restic was aborted somewhere in the backup.

Now my backup repository “contains errors”, reported by restic check and I get funny errors from restic check --read-data:

Load indexes
pack 6a10926c contained in several indexes: {1555f920 5b0e9da3}
pack aca007de contained in several indexes: {3b81e1c2 c17b8206}
pack d222ddc3 contained in several indexes: {97c20adb b2e37d4d}
pack 159c84bd contained in several indexes: {1555f920 491e8598}
pack b861683f contained in several indexes: {97c20adb acfa6371}
pack 81ee5f75 contained in several indexes: {1555f920 bbf080d9}
pack 4c42faf6 contained in several indexes: {3b81e1c2 e79891be}
pack d53165af contained in several indexes: {34037b5b 4e390097}
pack 3f777eea contained in several indexes: {1fe1ad72 df72f37f}
pack 4886e21f contained in several indexes: {3b81e1c2 6ed5c98f}
pack 3f3d05fd contained in several indexes: {1555f920 726a88d6}
pack 8e4d9859 contained in several indexes: {1555f920 160dd5c9}
pack 59c28fb0 contained in several indexes: {97c20adb f7fe36d6}
pack ed102a33 contained in several indexes: {1555f920 c341207d}

run `restic rebuild-index' to correct this
Check all packs

pack f59498c9c735e6c8a4657c8ee6187a35aa51cdda56765e8a65116ece30c3a10d: not referenced in any index
pack f6130612daaf2c028c41be31feb6ac5751848aa057eb18fc71924a30609a516b: not referenced in any index
pack f6e44a378765c51d86fd924aaf108d8d8500cbc77c984a8a18e5e515e085b327: not referenced in any index
pack f7ab80cf96ea341757697c41bab89891d8ff1e8eb34cfd8747a4df93bf279778: not referenced in any index
pack f7e11eef69122c7186ff0725e3830780534eed01b05621c5761166be6ae1c6d9: not referenced in any index
pack f8010097eaed35bd1308c3928c69b020820580e0b767ed9268c73894c106dda6: not referenced in any index
pack f852ea9396d972c2da8772fbbf77c307ef57eb262bc32e94ddf50841a8b9dabf: not referenced in any index
pack f86ebf697f81a5e8c27ed3550cc4e51374c0483c4a4aa9f0e825efc255c21619: not referenced in any index
pack f91faf1abc40d8be54f8ea2764540a889affe6e7653abf8eee41e02a244c9c18: not referenced in any index
pack f923b50618891f86ff05001644d79e72c05e0feb16eabf799389ac40374c90b1: not referenced in any index
pack fa0d0d9889647e7fe9d9a976b828e858f653af784267e6e4f9ebc21ed0be2075: not referenced in any index
pack fb9e7d01f0a4715a5a0e88cb517574044dcdcb7afe6572411c75a604dcc66207: not referenced in any index
pack fc1d243dfae52c7235fbb54fac7b44bb2a66d19425460bbe66ba9a5e7ddec0ae: not referenced in any index
pack fc29f36b1ebd34af7b46cfd0239f7340c87c94200c1321dc68eac4a92bd6d58c: not referenced in any index
pack fc61e643341fa0ba4798042189dd25fb362358da2de1b9d17b96e09aef691ab1: not referenced in any index
pack fcaa2a2792dc49f2c8306fbb7556c718ed3cbc8cd473c6b1be80bb9f42e57614: not referenced in any index
pack fcceea2adcc87df153b85bdec42fc5c708f8ee8dfe54edaba28e8c8eae5b7dd0: not referenced in any index
pack fe78351c38005ae2f2c56c85dc08f92211b577bc45d75d2dfde3f1bc337a82c6: not referenced in any index
pack ffe4a50c97cc86c99845628462e9b7dd5b816800e25c22e4ca25f79f07a14d1f: not referenced in any index

Pack ID does not match, want 091f24d0, got ca215008
Pack ID does not match, want 0b574ea7, got ce8c859e
Pack ID does not match, want 0d321325, got 21a00940
Pack ID does not match, want 1094c879, got 9a84fea8
Pack ID does not match, want 1360058e, got 3cbe8bc8
Pack ID does not match, want 1618667c, got 43009645
Pack ID does not match, want 1b8e392a, got e3b0c442
Pack ID does not match, want 21b4f657, got bf284e29

How can I deal with these? Is restic able to detect unfinished backups and somehow reap orphaned blocks automatically? Thanks!

[2:34:07] 100.00%  36531 / 36049 items
duration: 2:34:07
Fatal: repository contains errors

fd0 · September 15, 2017, 6:28pm

The errors are roughly what I’d expect from an aborted backup. Mostly, the errors are about too much data that isn’t referenced, because the backup did not finish, so no snapshot was written.

If you haven’t finished the backup of your hard disk yet, run restic rebuild-index followed by a new backup. This will re-use already uploaded data. Then afterwards run restic prune to get rid of all the unnecessary data.

If you have already finished the backup, just run restic prune.

rumpelsepp · September 15, 2017, 9:43pm

Thanks for your answer. I think it is a bit wonky, how restic currently handles aborted backups. I have a file in data which is of length 0 (maybe cause the backup was aborted at this file). restic panics when it wants to read this particular file during a new backup:

$ ls -lah data/b8/b882c42db46714687f43e73ce23728a13f7b98bc2260c51ab918bd876313d886
-rwx------ 1 stefan stefan 0 Sep 15 22:55 data/b8/b882c42db46714687f43e73ce23728a13f7b98bc2260c51ab918bd876313d886*

panic: OpenFile: open data/b8/b882c42db46714687f43e73ce23728a13f7b98bc2260c51ab918bd876313d886: input/output error

goroutine 120244 [running]:
github.com/restic/restic/src/restic/archiver.(*Archiver).saveChunk(0xc4315d0b40, 0xc31700, 0xc420014488, 0xb73d05a1, 0x286dab, 0x7b185b7700000, 0xc454cc8000, 0x286dab, 0x4b0000, 0xc43158adc0, ...)
	/build/restic-hSTbw5/restic-0.7.1/obj-x86_64-linux-gnu/src/github.com/restic/restic/src/restic/archiver/archiver.go:165 +0x395
created by github.com/restic/restic/src/restic/archiver.(*Archiver).SaveFile
	/build/restic-hSTbw5/restic-0.7.1/obj-x86_64-linux-gnu/src/github.com/restic/restic/src/restic/archiver/archiver.go:239 +0x3d7

After removing that file I was able to finish my backup and do restic prune to get rid of the orphaned data.

borg handles this case better, as it creates checkpoints when writing backups. A checkpoint is some flag which indicates that all data written before (in the meaning of time) the checkpoint is valid. All data written after the checkpoint will be dropped as it is not guaranteed that the data was actually written to disk properly. The tradeoff is, that some data need to be copied again when resuming the backup.

Have I discovered a bug, or in other words, is such a checkpoint concept implemented in restic?

fd0 · September 16, 2017, 8:33am

Oh? That is very unusual, I’ll try to reproduce it. The questions that popped up in my mind are: Why would restic read that file during backup? What did you do exactly, where is the repo saved, how did you start restic?

Just to clarify: You observed the stack trace and then the empty file was there? Or was it the other way around? When you restart the backup, will it abort again with the same file?

There is a similar concept in restic, but I need to explain a bit of background first. When restic backup starts it reads two things from the repo: The index files (telling it which blobs of data are stored within the files in the data/ directory) and the list of snapshots. With the latter it tries to detect if the same set of files/dirs have already been saved before. If that’s the case, then it downloads the metadata to avoid reading unchanged files again. Blobs can either be parts of files or directory metadata information (as a JSON document). For details please see the design document.

In this case, there was no complete backup before, so the snapshots part is not relevant. But restic also downloads the index files, and will then start reading the files/dirs to save. If it encounters a blob that is already saved (remember, that information is contained in the index file) it won’t save this particular blob again.

So in order to resume interrupted backups, restic regularly writes intermediate index files that contains listings of all data files that have successfully been uploaded since the last index file was written. At least every five minutes such an intermediate index is saved in the repo. That’s a (very basic) checkpoint concept.

I hope my text is understandable, feel free to add further questions

rumpelsepp · September 17, 2017, 4:28pm

Since I am have not (yet) digged deep in the restic source code, I cannot evaluate the situation that happened completely. What I did was basically this:

I have a NAS that I can access via samba within my own VPN infrastructure.
The restic backup repo is located at that particular samba share and I was doing a backup of my laptop.
Because of some reason (I don’t remember, as the backup was in the background hidden somewhere in tmux) the backup did not finish.
Since the first time I wanted to resume the backup, the empty file was there and restic repeatedly crashed with the said stacktrace. I don’t know why it was trying to read the file.
The commandline of restic was nothing more than restic -r REPO backup /home/stefan, the repo, as already mentioned, was located at a samba share within my network.

Unfortunately I have already deleted the file and I have done restic prune. But I could not use the repository to store a further backup until I deleted that zero length file; I figure that it was the same file all the time.

The only additional thing to restic that might have created that zero length file is samba. We should find out what happens when you create and write to a new file and while you disconnect your network connection.

Thanks for that clarification! I see that for my usecase restic is a very cool tool that fits my needs.

Does it detect corrupted blobs during a backup? I guess it doesn’t, since the whole blob must be read to calculate the hash value. Checking the file metadata (filesisze, …) as a poor-man’s corruption check might be worth it, isn’t it?

fd0 · September 18, 2017, 8:49am

No, it doesn’t.

Hm, I’m not entirely sure what you mean by that. At the moment, the backend interface we’re using internally does not expose any other meta data than file size.

I have a hypothesis how the situation you observed may have happened:

You have the smb share of your NAS mounted somewhere
You make a backup with restic
It happily writes files, writes a pack file and then an intermediate index.
At that point, the connection to the smb server fails: The intermediate index gets written, the pack file not (but an empty file, let’s call it A, is left behind)
The next run of restic backup reads the index files including the intermediate one which describes the contents of the file A. So all the blobs that are contained in the file A (according to the index) are not written again. The backup finishes. Among others, a directory contents (metadata) is contained in the file.
On the next backup, restic detects a parent snapshot (the directory has been saved before) and tries to load the metadata from the repo, which includes reading file A. That does not work and it aborts.

I admit this is rather far fetched, but at the moment I don’t see any other way how this may have happened.

rumpelsepp · September 18, 2017, 9:02am

I lack knowledge about the internal workings of restic, so I better read the code first before making suggestions.

I will keep this in mind; when it happens again, I will try to reproduce it. Thanks for your answers so far!