`Forget` instead of `rebuild-index`, then `recover`, then `prune`?

jim-collier · September 23, 2020, 7:27pm

Update: My assumptions in the numbered list below are faulty. The first comment states how. The question is still valid though, just not as apparently “simple”, and seems more likely to simply be, “it’s not possible”.

I have a wrapper script for unattended automation. Also, I want to prune now and then. (Common stuff.)

But what may not be very common, is that for unrelated requirements beyond the scope of this discussion, my uptime for this particular host is short enough that sometimes backups don’t finish. Even with careful selection of smaller sets of files. (TL;DR: Sometimes backups don’t complete, and even smaller backup sets aren’t the answer in this case, and having to occasionally do long-running rebuild-index and recover are perfectly acceptable costs.)

My script knows when to recover. And since I can’t afford the time to frequently troubleshoot and debug repositories, based on this suggestion, I’m willing to sacrifice the extended time to rebuild-index before recover, even for a low probability of it being necessary - for the tradeoff of less manual intervention. (And in this case, no backups run anyway on the same uptime cycle as a recover, or for a forget and prune - so time is practically a non-issue.)

On top of that, are these observations/facts/potentially erroneous assumptions:

1. Although not its primary purpose, restic forget seems to begin by rebuilding the index (as restic rebuild-index does?), at least if the output “building new index for repo” is accurate and I’m interpreting it correctly.
2. restic recover should, in some edge cases (according to the link above), be preceded by restic rebuild-index. The latter is very expensive in time, but as mentioned previously, this isn’t a problem, even if it is a very low-probability necessity.

So since I need to periodically (forget and prune), AND (rebuild-index and recover), would it not make sense, in order to save time, to do this:

restic forget --keep-daily 8  # etc...also does same thing as rebuild-index
restic recover
restic prune

?

Thanks.

[Edits; Clarity, and update based on actual output.]

jim-collier · September 23, 2020, 9:37pm

Shoot, I think my logic is faulty. I think I got forget and prune backwards. The latter is the time-intensive one that also seems to rebuild the index, not the former. That throws a big spanner in my villainously brilliant scheme.

I’ll leave the question up though, in case anyone knows of a safe and effective way to combine the index-rebuilding of pruning with the same thing done before recovery, in a way that doesn’t throw away aborted backups. In other words, something like:

restic rebuild-index     ## Very very time-consuming; sometimes helps next step
restic recover           ## Recover from previous aborted backup (quick)
restic forget ...        ## Do the initial work for pruning (quick)
restic prune             ## Very very time-consuming and also rebuilds index

But without doing the index-rebuilding twice, once at the start and again at the end.

rawtaz · September 24, 2020, 2:59pm

Does the following fact help in any way?

When a backup is interrupted, restic will have uploaded the data that it had time to upload, and the next time the backup runs, it will continue to upload the rest of the data (that has changed since the last successful backup/snapshot).

My thinking is: so what if your host sometimes doesn’t have time to upload all the changes? Imagine it managed to upload 80% of the data it was supposed to back up. Next time, if the remaining 20% is still relevant to back up, and perhaps there’s also some new data to back up since the previous attempt, it will try to upload that. Assuming it’s not always too short on time but in general has time to complete backups, won’t it usually be able to complete them, with a snapshot occasionally missing from a run?

alexweiss · September 25, 2020, 5:14am

First, about rebuild-index:
After an aborted backup, some of your data is saved in data and tree pack files. Most of those packfiles are already correctly indexed in a saved index.
Yes - there might be some written pack files which are not yet contained in the index, but IMO you can just as well ignore them, as your data is not complete anyway. These few extra pack files will simply be removed by the next scheduled prune run. So, I would simply omit the rebuild-index step.

Then about recover:
In your case, this will create a snapshot that contains the data of your aborted backup run. Do you need this snapshot for anything? It cannot be used to speed up your next backup (as it is incomplete and restic automatically uses the last completed backup to find unchanged files quickly). So if you don’t use this snapshot for anything, I would also skip this step. However, it might be a good idea to run it anyway to “document” the aborted backup.

About forgetting and pruning:
No need to do this after your aborted backup. Your regularly scheduled runs will work fine to clean up any leftovers. You might have to treat snapshots generated by recover (if you run it) separately in your regular forget.

TL;DR: IMO no need to manually “clean up” aborted backups - your scheduled forget/prune will do!
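
As a rough sketch of what that could look like in a wrapper script: the backup step simply runs and is allowed to be interrupted, and all cleanup is left to the regular maintenance step. The function names, the `RESTIC` override variable, and the `--keep-daily 8` policy (borrowed from the original question) are assumptions for illustration only, not anything restic prescribes.

```shell
#!/bin/sh
# Hypothetical wrapper sketch. Function names and the RESTIC override are
# assumptions; the repository and password are expected in the environment
# (e.g. RESTIC_REPOSITORY and RESTIC_PASSWORD_FILE).
RESTIC="${RESTIC:-restic}"

# Back up the given paths. If the host powers off mid-run, no special
# follow-up is needed: the next run uploads whatever is still missing.
run_backup() {
    "$RESTIC" backup "$@"
}

# Regularly scheduled maintenance: no rebuild-index, no recover.
# Pack files left behind by aborted backups are removed by prune.
run_maintenance() {
    "$RESTIC" forget --keep-daily 8 || return 1
    "$RESTIC" prune
}
```

Setting `RESTIC=echo` before calling either function prints the commands instead of running them, which is a cheap way to dry-run the script.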

jim-collier · September 25, 2020, 3:11pm

Fantastic advice and explanation, thanks!

So related question - does restic prune indeed rebuild the index similar to how restic rebuild-index does? The reason I’m asking this now - even though I’m going to modify my scripted workflow based on your feedback - is that I manually started my script too late, before the next power cycle (which I can’t do anything about at least while it’s running). So the rebuild-index probably won’t finish before being forcefully terminated. On the next up cycle, would it be essentially the same and less redundant, to just start with a forget and prune and not do a rebuild-index?

rawtaz · September 25, 2020, 3:27pm

I have to ask. Are you backing up an orbiting space shuttle that only has communication at certain times during the day? That would be cool!

cdhowie · September 26, 2020, 1:27am

One thing to keep in mind is that, while prune rebuilds the index as the first step, it still (for some reason) uses the indexes that existed in the repository prior. This means that if a needed blob is present in the data directory but is not in an index, prune will fail when it looks for that blob.

github.com/restic/restic

Prune should either disregard repository indexes, or not rebuild indexes twice

opened 04:51PM - 29 Mar 19 UTC

closed 09:12AM - 05 Nov 20 UTC

cdhowie

Output of `restic version`
--------------------------

```
restic 0.9.4 compiled with go1.11.4 on linux/amd64
```

How did you run restic exactly?
-------------------------------

```
restic -r . prune
```

What backend/server/service did you use to store the repository?
----------------------------------------------------------------

Local, though this does not appear to matter.

Expected behavior
-----------------

Prune completes successfully.

Actual behavior
---------------

Prune rebuilds the repository indexes in-memory but then appears to ignore these indexes, and tries to load the (missing) files from disk. When it gets to the garbage-collection phase, it therefore can't find any objects in the repository and fails. For example:

```
$ restic -r . prune
enter password for repository:
repository 26b51fe2 opened successfully, password is correct
counting files in repo
building new index for repo
[0:00] 100.00% 67 / 67 packs
repository contains 67 packs (4773 blobs) with 316.143 MiB
processed 4773 blobs: 0 duplicate blobs, 0 B duplicate
load all snapshots
find data that is still in use for 1 snapshots
tree 30649e56256d8418cd8b4f0c85f6ed048f6c4d04aa31e1fd28602df9956f125a not found in repository
github.com/restic/restic/internal/repository.(*Repository).LoadTree
    /restic/internal/repository/repository.go:653
github.com/restic/restic/internal/restic.FindUsedBlobs
    /restic/internal/restic/find.go:11
main.pruneRepository
    /restic/cmd/restic/cmd_prune.go:191
main.runPrune
    /restic/cmd/restic/cmd_prune.go:85
main.glob..func18
    /restic/cmd/restic/cmd_prune.go:25
github.com/spf13/cobra.(*Command).execute
    /restic/vendor/github.com/spf13/cobra/command.go:762
github.com/spf13/cobra.(*Command).ExecuteC
    /restic/vendor/github.com/spf13/cobra/command.go:852
github.com/spf13/cobra.(*Command).Execute
    /restic/vendor/github.com/spf13/cobra/command.go:800
main.main
    /restic/cmd/restic/main.go:86
runtime.main
    /usr/local/go/src/runtime/proc.go:201
runtime.goexit
    /usr/local/go/src/runtime/asm_amd64.s:1333
```

Running `restic rebuild-index` first makes prune succeed:

```
$ restic -r . rebuild-index
enter password for repository:
repository 26b51fe2 opened successfully, password is correct
counting files in repo
[0:00] 100.00% 67 / 67 packs
finding old index files
saved new indexes as [a13538cc]
remove 0 old index files

$ restic -r . prune
enter password for repository:
repository 26b51fe2 opened successfully, password is correct
counting files in repo
building new index for repo
[0:00] 100.00% 67 / 67 packs
repository contains 67 packs (4773 blobs) with 316.143 MiB
processed 4773 blobs: 0 duplicate blobs, 0 B duplicate
load all snapshots
find data that is still in use for 1 snapshots
[0:00] 100.00% 1 / 1 snapshots
found 4773 of 4773 data blobs still in use, removing 0 blobs
will remove 0 invalid files
will delete 0 packs and rewrite 0 packs, this frees 0 B
counting files in repo
[0:00] 100.00% 67 / 67 packs
finding old index files
saved new indexes as [e13b3d29]
remove 1 old index files
done
```

I would suggest that restic should either use the indexes it generated instead of the repository indexes, or not generate the indexes that it does not appear to use.

Steps to reproduce the behavior
-------------------------------

1. Create a repository.
2. Run a backup to create a snapshot.
3. Delete all of the files in the repository `index` directory.
4. Run `restic prune`.

jim-collier · September 27, 2020, 2:43pm

It’s a kludgey hardware solution involving powering everything off once a week, including three external array chassis, to work around what is ultimately a driver problem. But after fighting this particular array through 12 years of hardware problems including multiple server chassis and HBAs, always with some identifiable but highly unlikely problem [e.g. broken backplane pin, faulty fanout cable, etc.], I’m just happy to have a solution that - at least after the workarounds - is stable and reliable. With frequent extended power outages, I can’t rely on continuous uptime anyway…

alexweiss · September 28, 2020, 5:29pm

Usually, rebuild-index followed by prune should give the same result as prune alone - except in the cases of broken repos that @cdhowie pointed out.
The rule of thumb is: Only use rebuild-index manually to repair a broken repo.
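
For an unattended script, one way to follow this rule of thumb is to run prune on its own and fall back to rebuild-index only when prune fails, which would cover the missing-index case @cdhowie describes. This is only a sketch under assumptions: the function name and the `RESTIC` variable are made up for illustration, and it treats any non-zero exit from prune as a reason to rebuild the index, which may be too broad for some setups.

```shell
#!/bin/sh
# Hypothetical helper; the name and the RESTIC override are assumptions.
RESTIC="${RESTIC:-restic}"

# Run prune; only if it fails, rebuild the index and retry exactly once.
prune_with_fallback() {
    if "$RESTIC" prune; then
        return 0
    fi
    echo "prune failed; rebuilding index and retrying once" >&2
    "$RESTIC" rebuild-index || return 1
    "$RESTIC" prune
}
```

In the normal case this costs nothing extra, and the expensive rebuild-index only runs when prune has already reported a problem.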

alexweiss · September 28, 2020, 6:06pm

BTW, I added the following issue:

I think that if that feature were included in restic, your original problem “how to handle interrupted backups” would be nicely handled by restic itself…

If you are keen, you can also try out my draft implementation: