Update: My assumptions in the numbered list below are faulty. The first comment states how. The question is still valid though, just not as apparently “simple”, and seems more likely to simply be, “it’s not possible”.
I have a wrapper script for unattended automation. Also, I want to prune now and then. (Common stuff.)
But what may not be very common, is that for unrelated requirements beyond the scope of this discussion, my uptime for this particular host is short enough that sometimes backups don’t finish. Even with careful selection of smaller sets of files. (TL;DR: Sometimes backups don’t complete, and even smaller backup sets aren’t the answer in this case, and having to occasionally do long-running rebuild-index and recover are perfectly acceptable costs.)
My script knows when to recover. And since I can’t afford the time to frequently troubleshoot and debug repositories, based on this suggestion, I’m willing to sacrifice the extended time to rebuild-index before recover, even for a low probability of it being necessary - for the tradeoff of less manual intervention. (And in this case, no backups run anyway on the same uptime cycle as a recover, or for a forget and prune - so time is practically a non-issue.)
On top of that, are these observations/facts/potentially erroneous assumptions:
Although not its primary purpose, restic forget seems to begin by rebuilding the index (as restic rebuild-index does?), at least if the output “building new index for repo” is accurate and I’m interpreting it correctly.
restic recover should, in some edge cases (according to link above), be preceded by restic rebuild-index. The latter is very expensive in time, but as mentioned previously, this isn’t a problem, even if it is a very low-probability necessity.
So since I need to periodically (forget and prune), AND (rebuild-index and recover), would it not make sense, in order to save time, to do this:
restic forget --keep-daily 8 # etc...also does same thing as rebuild-index
[Edits; Clarity, and update based on actual output.]
Shoot, I think my logic is faulty. I think I got forget and prune backwards. The latter is the time-intensive one that also seems to rebuild the index, not the former. That throws a big spanner in my villainously brilliant scheme.
I’ll leave the question up though, in case anyone knows of a safe and effective way to combine the index-rebuilding of pruning with the same thing done before recovery, in a way that doesn’t throw away aborted backups. In other words, something like:
restic rebuild-index ## Very very time-consuming; sometimes helps next step
restic recover ## Recover from previous aborted backup (quick)
restic forget ... ## Do the initial work for pruning (quick)
restic prune ## Very very time-consuming and also rebuilds index
But without doing the index-rebuilding twice, once at the start and again at the end.
When a backup is interrupted, restic will have uploaded the data that it had time to upload, and the next time the backup runs, it will continue to upload the rest of the data (that has changed since the last successful backup/snapshot).
My thinking is: so what if your host sometimes doesn’t have time to upload all the changes? Imagine it managed to upload 80% of the data it was supposed to back up. Next time, if the remaining 20% is still relevant to back up, and perhaps there’s also some additional new data since the previous backup attempt, it will try to upload that. Assuming it isn’t always too short on time but in general has time to complete backups, won’t it usually be able to complete them, with just the occasional snapshot missing from a run?
First, about rebuild-index:
After an aborted backup, some of your data is saved in data and tree pack files. Most of those packfiles are already correctly indexed in a saved index.
Yes - there might be some written pack files which are not yet contained in the index, but IMO you can just as well ignore them, as your data is not complete anyway. These few extra pack files will simply be removed by the next scheduled prune run. So, I would simply omit the rebuild-index step.
Then about recover:
In your case, this will create a snapshot that contains the data of your aborted backup run. Do you need this snapshot for anything? It cannot be used to speed up your next backup (as it is incomplete, and restic automatically uses the last completed backup to find unchanged files quickly). So if you don’t use this snapshot for anything, I would also skip this step. However, it might be a good idea to “document” the aborted backup.
About forgetting and pruning:
No need to do this after your aborted backup. Your regularly scheduled runs will work fine to clean up any left-overs. You might have to treat snapshots generated by recover (if you run it) separately in your regular forget.
TL;DR: IMO no need to manually “clean up” aborted backups - your scheduled forget/prune will do!
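A minimal sketch of what the simplified wrapper could look like under this advice. The function names, the backup path and the retention count are hypothetical placeholders, and the sketch assumes the repository location and password are already provided through the environment (e.g. RESTIC_REPOSITORY). Only `restic backup` and `restic forget --keep-daily N --prune` are actual restic invocations here.

```shell
#!/bin/sh
# Sketch only: function names and arguments are illustrative, not from the thread.
# Assumes the repository and password source are already set in the environment.

# Run on every uptime cycle. If the host powers off mid-run, no cleanup
# step follows: the next run uploads whatever is still missing.
run_backup() {
    restic backup "$@"
}

# Run only on the scheduled maintenance cycle. Pack files left behind by
# aborted backups are unreferenced and are removed by the prune.
# $1 is the number of daily snapshots to keep.
run_maintenance() {
    restic forget --keep-daily "$1" --prune
}
```

For example, `run_backup /path/to/data` on a normal cycle and `run_maintenance 8` on the maintenance cycle. There is deliberately no rebuild-index or recover step.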
So related question - does restic prune indeed rebuild the index similar to how restic rebuild-index does? The reason I’m asking this now - even though I’m going to modify my scripted workflow based on your feedback - is that I manually started my script too late, before the next power cycle (which I can’t do anything about at least while it’s running). So the rebuild-index probably won’t finish before being forcefully terminated. On the next up cycle, would it be essentially the same and less redundant, to just start with a forget and prune and not do a rebuild-index?
One thing to keep in mind is that, while prune rebuilds the index as the first step, it still (for some reason) uses the indexes that existed in the repository prior. This means that if a needed blob is present in the data directory but is not in an index, prune will fail when it looks for that blob.
It’s a kludgey hardware solution involving powering everything off once a week, including three external array chassis, to work around what is ultimately a driver problem. But after fighting this particular array through 12 years of hardware problems including multiple server chassis and HBAs, always with some identifiable but highly unlikely problem [e.g. broken backplane pin, faulty fanout cable, etc.], I’m just happy to have a solution that - at least after the workarounds - is stable and reliable. With frequent extended power outages, I can’t rely on continuous uptime anyway…
Usually, rebuild-index followed by prune should give the same result as prune alone - except in the cases of broken repos, as @cdhowie pointed out.
The rule of thumb is: Only use rebuild-index manually to repair a broken repo.
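In a wrapper script, that rule of thumb could be approximated by trying prune on its own and only falling back to rebuild-index when it fails. This is a rough sketch: the function name `prune_with_repair` is hypothetical, and treating any non-zero exit from prune as “the index may be broken” is a simplifying assumption. A real script would probably want to look at the error output before deciding to rebuild.

```shell
#!/bin/sh
# Sketch only: prune_with_repair is an illustrative name, not a restic feature.
# Assumption: any prune failure is treated as a possibly broken index.

prune_with_repair() {
    # Normal case: prune alone is enough.
    if restic prune; then
        return 0
    fi
    # Repair case: rebuild the index, then retry prune exactly once.
    echo "prune failed; rebuilding index and retrying once" >&2
    restic rebuild-index && restic prune
}
```

This keeps the expensive rebuild-index out of the normal path, so it only costs time on the cycles where something actually went wrong. (Newer restic releases name this command `repair index`; the sketch uses the name from this thread.)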