Possibility of pausing, risk of killing, and potential for resuming a forget and prune command?

mlissner · February 29, 2020, 3:24pm

I’ve been running a forget and prune command for about a week and it looks like we’re going to be doing some networking maintenance this weekend. Three questions:

1. Is there any way to pause this process and resume it later?
2. Assuming pause/resume is impossible, will killing the command cause issues at any point, or is there a safe way to do so?
3. Is there anything I can do to salvage the week of work it’s already done if I have to kill it?

Here’s the current status:

158 snapshots have been removed, running prune
counting files in repo
building new index for repo
Remove(<lock/e5c559ef2a>) returned error, retrying after 708.613426ms: Delete: b2_delete_file_version: 401:
[74:08:42] 100.00%  2337523 / 2337523 packs
repository contains 2337523 packs (6331452 blobs) with 12.437 TiB
processed 6331452 blobs: 1205 duplicate blobs, 1.821 GiB duplicate
load all snapshots
find data that is still in use for 7 snapshots
[0:09] 100.00%  7 / 7 snapshots
found 464185 of 6331452 data blobs still in use, removing 5867267 blobs
will remove 0 invalid files
will delete 2126818 packs and rewrite 48062 packs, this frees 11.457 TiB
[17:11:53] 100.00%  48062 / 48062 packs rewritten
counting files in repo
[5:55:47] 100.00%  179479 / 179479 packs
finding old index files
saved new indexes as [278a4115 ae633cc0 acdc5e71 8e0ca6f5 9eaf48d9 9f8b3d14 0f7af28c fa8c1eb8 22f23d40 69d59870 eb66fdeb 40c9efdc c998c783 e40d98c9 108311bb 49a9efbc 7487b3a3 52da13c8 edaa3377 e05f0926 b6c7c100 afc5e1ed f97e0f85 156bcee1 092995de 900deb00 fb6faabe 55e6625e 62b5e9e8 1c58e90a 20100946 5f51edc7 94de6103 7213a8fb 7ba45c62 6daa7cf1 9c1756cc 3293114e 0b32a51f 3218dd27 5f9f7692 d8852160 52b83e03 19752999 e854d9d2 956db2fa f62697cb d3dd3609 8827a9d3 06d7437f 0bd449f2 fd77c0ca 41db9baf ff14ffcc 2ff8e0d5 333d3511 359be7d8 ecbfe72c 26f31138 0ce43382]
remove 3148 old index files
[12:32:33] 3.83% 83494 / 2174880 packs deleted

Thanks for any ideas.

cdhowie · March 1, 2020, 5:59am

If on Linux, you could try pressing CTRL+Z to suspend the process and leaving the terminal session open. Later you can run fg in the same shell to resume it.

If any network activity is ongoing when the process is suspended, restic should retry it once the process resumes.
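For anyone who cannot reach the original terminal, the same suspend/resume effect can be had by sending signals from another shell. The following is only a sketch, assuming Linux with /proc available: `sleep 300` stands in for the real restic process, and `proc_state` is a hypothetical helper written for this example, not part of restic. CTRL+Z sends SIGTSTP, while the sketch uses SIGSTOP, which cannot be caught or ignored; SIGCONT is what fg sends to resume.

```shell
# Hypothetical helper: print the one-letter state of a PID from /proc (Linux only).
proc_state() {
    while read -r key val rest; do
        if [ "$key" = "State:" ]; then
            echo "$val"
            return 0
        fi
    done < "/proc/$1/status"
}

# Stand-in for the long-running prune; with a real job, use its PID instead.
sleep 300 &
pid=$!

# Suspend the process (similar in effect to CTRL+Z, but sent from any shell).
kill -STOP "$pid"
sleep 1
state_stopped=$(proc_state "$pid")   # "T" means stopped

# Resume it (the counterpart of fg).
kill -CONT "$pid"
sleep 1
state_resumed=$(proc_state "$pid")   # no longer "T" once resumed

echo "while suspended: $state_stopped, after resume: $state_resumed"

# Clean up the stand-in process.
kill "$pid"
```

With a real prune, `pgrep restic` or similar would supply the PID, and the cleanup `kill` at the end would of course be left out.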

You absolutely cannot run any operations against the repository while the prune operation is suspended. Data loss can result. Prune’s lock should still be in the repository, so you’ll be fine as long as neither you nor any of your backup scripts call restic unlock.

All restic commands are designed to leave the repository in a consistent state when killed. Usually, the worst-case scenario is the creation of some redundant data which a future prune operation will clean up.

mlissner · March 1, 2020, 6:25am

Thanks so much. This is really great information. I’ll report back about how CTRL+Z goes, though the latest from the hardware folks is that the network maintenance may be postponed.

nicnab · March 1, 2020, 10:22am

May I also suggest taking a look at tmux? I use it whenever I work via ssh and want to be sure that a broken connection doesn’t affect my processes.
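A rough sketch of that workflow, under some assumptions: the session name `prune-demo` is made up for illustration, and `sleep 60` stands in for the actual restic command. The session is started detached so the job keeps running even if the ssh connection drops.

```shell
session=prune-demo   # hypothetical session name

if command -v tmux >/dev/null 2>&1; then
    # Start a detached session running the long job (sleep is a placeholder).
    tmux new-session -d -s "$session" 'sleep 60'

    # Check from any later login that the session is still alive.
    if tmux has-session -t "$session" 2>/dev/null; then
        status="running"
    else
        status="not running"
    fi

    # To watch the job: tmux attach -t "$session"  (detach again with Ctrl-b d).

    # Clean up the demo session; skip this for a real job.
    tmux kill-session -t "$session" 2>/dev/null || true
else
    status="tmux not installed"
fi

echo "session $session: $status"
```

Note that tmux only helps if the command is started inside it from the beginning; it is not a way to rescue a process that is already running in a plain ssh session.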