Corrupted repo or error?

Hello,

I’m experiencing errors after trying to implement a forget-then-prune policy.

I have three hosts making backups on the same repo on Wasabi.

The short version of the story: I ran forget with --prune, but it failed with an internal error. I have tried several things since (detailed hereafter), but I keep running into errors.
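
For reference, the commands I ran were along these lines; the retention flags below are only placeholders for illustration, not my exact policy:

$ restic forget --keep-daily 7 --keep-weekly 5 --keep-monthly 12 --dry-run
$ restic forget --keep-daily 7 --keep-weekly 5 --keep-monthly 12 --prune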

The long version:

I first tried forget with --dry-run, and then ran it for real with --prune. The result was the following error:

.
.
.
11144 snapshots have been removed, running prune
counting files in repo
building new index for repo
[3:54:56] 100.00% 328852 / 328852 packs
repository contains 328852 packs (6814589 blobs) with 1.326 TiB
processed 6814589 blobs: 0 duplicate blobs, 0 B duplicate
load all snapshots
find data that is still in use for 3530 snapshots
tree 8c1934d697c83d8abf0f30c8abaab3913f923e8bf3e90d9c3e12fea66af55dd3 not found in repository
github.com/restic/restic/internal/repository.(*Repository).LoadTree
internal/repository/repository.go:709
github.com/restic/restic/internal/restic.FindUsedBlobs
internal/restic/find.go:11
github.com/restic/restic/internal/restic.FindUsedBlobs
internal/restic/find.go:31
github.com/restic/restic/internal/restic.FindUsedBlobs
internal/restic/find.go:31
github.com/restic/restic/internal/restic.FindUsedBlobs
internal/restic/find.go:31
github.com/restic/restic/internal/restic.FindUsedBlobs
internal/restic/find.go:31
github.com/restic/restic/internal/restic.FindUsedBlobs
internal/restic/find.go:31
main.pruneRepository
cmd/restic/cmd_prune.go:191
main.runForget
cmd/restic/cmd_forget.go:215
main.glob…func10
cmd/restic/cmd_forget.go:22
github.com/spf13/cobra.(*Command).execute
vendor/github.com/spf13/cobra/command.go:762
github.com/spf13/cobra.(*Command).ExecuteC
vendor/github.com/spf13/cobra/command.go:852
github.com/spf13/cobra.(*Command).Execute
vendor/github.com/spf13/cobra/command.go:800
main.main
cmd/restic/main.go:86
runtime.main
/usr/local/Cellar/go/1.12.7/libexec/src/runtime/proc.go:200
runtime.goexit
/usr/local/Cellar/go/1.12.7/libexec/src/runtime/asm_amd64.s:1337

I must confess that the first time I tried, restic told me the repo was locked, so I ran restic unlock.

For my second attempt, since restic snapshots reported only 3535 snapshots, I assumed the forget had completed and ran only prune.

Exact same result, except for the number of packs and blobs.

$ restic prune
repository 2c128c6e opened successfully, password is correct
counting files in repo
building new index for repo
[2:20:58] 100.00% 328999 / 328999 packs
repository contains 328999 packs (6826272 blobs) with 1.327 TiB
processed 6826272 blobs: 0 duplicate blobs, 0 B duplicate
load all snapshots
find data that is still in use for 3535 snapshots
tree 8c1934d697c83d8abf0f30c8abaab3913f923e8bf3e90d9c3e12fea66af55dd3 not found in repository
.
. (same lines as above)
.

Then I thought I should run rebuild-index, as something could have changed in the repo (because the other hosts do their backups on a regular basis).

$ restic rebuild-index
repository 2c128c6e opened successfully, password is correct
counting files in repo
[4:09:33] 100.00% 329146 / 329135 packs
finding old index files
saved new indexes as [53aba978 fb92cdad 8902ebd6 795f969d 9395cb59 8fc3d052 5661d0bb 313c03f1 2d417db0 97dd1a9c 00dd982d 08d90e9b fadd461a f9d193ad 93b3bac0 63b18c8f d1f12f9f a955c082 40db86b1 cc01b780 2389312e 7389961c 32a95083 6208aaa5 39587658 05b8d869 08770d4a f6befa1d f2e5f94e 87b0fd2c 6801181a 539dddb2 958c9a93 4f461273 54931bee d55eefda 203f2802 619ed7ce 5e2e6038 5de54cfd 68a59b71 e575865b ceac7e42 ccd95271 46ec1caf 16f1ac47 fa07b03d 688f3264 55f707bd 0e6befe7 b0894998 fd624c7e 9099fc3e 9a49e691 1c3ac5a3 1c60b115 2660ad53 60546483 84a96361 08631019 9b5916f1 1cd5f0e5 6852f900 38455df1 8c24361a 269975cf 4b427f9c ecb51f3c 946d1e4a 3e3cd240 c8b13221 a1cf21cb 7839c51f ec16c59c cbcdcb30 60f3ee64 47f68fd8 26a4056f d1f41529 ccc974d2 e6dfcbae 3c3e4c6c b2405281 59e996b1 c0ad2a5a 2518a66f ce03f221 7a27110e 30366121 d17a3d57 f571c7bd d0a00993 87a6a466 428c00ad 6db77f14 9b2e30c4 fd39f095 3a26c265 fe79352f 32f4a51a 823bc4bc f5b7e8f2 b642e45b 5ce04b8e 071e9f66 b1b18899 cf970f29 d80d363a 37a8c4ee ddd791a1]
remove 144 old index files

Success for the rebuild-index. Time to prune.

$ restic prune
repository 2c128c6e opened successfully, password is correct
Fatal: unable to create lock in backend: repository is already locked by PID 32292 on MacBookNK by root (UID 0, GID 80)
lock was created at 2019-08-16 04:49:04 (19h47m15.267159s ago)
storage ID 12119b1c

$ restic unlock
repository 2c128c6e opened successfully, password is correct
successfully removed locks

$ restic prune
repository 2c128c6e opened successfully, password is correct
counting files in repo
building new index for repo
pack file cannot be listed 9a298794fce6e71d2e1674e59a2d4fa410b4b3a2434535c1141b1b27894858c4: The difference between the request time and the current time is too large.
pack file cannot be listed d3835abd126563e8baf93c1e4b71ec6b541ba0ecaccc7227524b112163d60712: The difference between the request time and the current time is too large.
pack file cannot be listed d2ddb6e706b3ea25f06f25360da74666972b517a3d4acde352ec8c88e2fae605: The difference between the request time and the current time is too large.
pack file cannot be listed b1b9e7ac866b739eb230bf5b500e73eb1c297c8c49953c2e641d9a82154abe3c: The difference between the request time and the current time is too large.
pack file cannot be listed f816ab7be75a0a308a8370138b2a0653f43b22c9067d2dff398b225e651e0112: The difference between the request time and the current time is too large.
Save(<lock/aa78282df5>) returned error, retrying after 629.765242ms: client.PutObject: 503 Service Unavailable
Save(<lock/53a24fd69e>) returned error, retrying after 549.145245ms: client.PutObject: S3 service is temporarily not available. Please try again later.
pack file cannot be listed f82056aca3edbd4a9213fb9531b0005cde983f538a056ad66eb97665ed3dfc32: S3 service is temporarily not available. Please try again later.
[4:59:41] 100.00% 329163 / 329163 packs
repository contains 329157 packs (6830437 blobs) with 1.327 TiB
processed 6830437 blobs: 0 duplicate blobs, 0 B duplicate
load all snapshots
Save(<lock/53a24fd69e>) returned error, retrying after 993.847211ms: client.PutObject: S3 service is temporarily not available. Please try again later.
List(snapshot) returned error, retrying after 716.033122ms: S3 service is temporarily not available. Please try again later.
Save(<lock/53a24fd69e>) returned error, retrying after 565.032498ms: client.PutObject: S3 service is temporarily not available. Please try again later.
List(snapshot) returned error, retrying after 827.170457ms: S3 service is temporarily not available. Please try again later.
Save(<lock/53a24fd69e>) returned error, retrying after 1.338257943s: client.PutObject: S3 service is temporarily not available. Please try again later.
List(snapshot) returned error, retrying after 592.052811ms: S3 service is temporarily not available. Please try again later.
Save(<lock/53a24fd69e>) returned error, retrying after 1.823281674s: client.PutObject: S3 service is temporarily not available. Please try again later.
Save(<lock/53a24fd69e>) returned error, retrying after 3.358779062s: client.PutObject: Put https://s3.wasabisys.com/moutiers-backup/locks/53a24fd69ef7e79802722aa2bb4835e83e6cbd4d6acac99e48a116cee9ddf449: dial tcp 38.27.106.14:443: connect: connection refused
List(snapshot) returned error, retrying after 1.515055597s: Get https://s3.wasabisys.com/moutiers-backup/?delimiter=%2F&max-keys=1000&prefix=snapshots%2F: dial tcp 38.27.106.16:443: connect: connection refused
Save(<lock/53a24fd69e>) returned error, retrying after 4.50738763s: client.PutObject: Put https://s3.wasabisys.com/moutiers-backup/locks/53a24fd69ef7e79802722aa2bb4835e83e6cbd4d6acac99e48a116cee9ddf449: dial tcp 38.27.106.21:443: connect: connection refused
List(snapshot) returned error, retrying after 1.902652441s: Get https://s3.wasabisys.com/moutiers-backup/?delimiter=%2F&max-keys=1000&prefix=snapshots%2F: dial tcp 38.27.106.19:443: connect: connection refused
List(snapshot) returned error, retrying after 2.789236966s: Get https://s3.wasabisys.com/moutiers-backup/?delimiter=%2F&max-keys=1000&prefix=snapshots%2F: dial tcp 38.27.106.21:443: connect: connection refused
Save(<lock/53a24fd69e>) returned error, retrying after 5.819736323s: client.PutObject: Put https://s3.wasabisys.com/moutiers-backup/locks/53a24fd69ef7e79802722aa2bb4835e83e6cbd4d6acac99e48a116cee9ddf449: dial tcp 38.27.106.16:443: connect: connection refused
Save(<lock/53a24fd69e>) returned error, retrying after 13.571988978s: client.PutObject: Put https://s3.wasabisys.com/moutiers-backup/locks/53a24fd69ef7e79802722aa2bb4835e83e6cbd4d6acac99e48a116cee9ddf449: dial tcp 38.27.106.21:443: connect: connection refused
List(snapshot) returned error, retrying after 4.112046883s: Get https://s3.wasabisys.com/moutiers-backup/?delimiter=%2F&max-keys=1000&prefix=snapshots%2F: dial tcp 38.27.106.19:443: connect: connection refused
List(snapshot) returned error, retrying after 12.437916593s: Get https://s3.wasabisys.com/moutiers-backup/?delimiter=%2F&max-keys=1000&prefix=snapshots%2F: dial tcp 38.27.106.15:443: connect: connection refused
unable to refresh lock: client.PutObject: Put https://s3.wasabisys.com/moutiers-backup/locks/53a24fd69ef7e79802722aa2bb4835e83e6cbd4d6acac99e48a116cee9ddf449: dial tcp 38.27.106.13:443: connect: connection refused
List(snapshot) returned error, retrying after 11.334192319s: Get https://s3.wasabisys.com/moutiers-backup/?delimiter=%2F&max-keys=1000&prefix=snapshots%2F: dial tcp 38.27.106.14:443: connect: connection refused
Save(<lock/7bce7b5460>) returned error, retrying after 323.666253ms: client.PutObject: Put https://s3.wasabisys.com/moutiers-backup/locks/7bce7b54606f31ed018af8d15b230a8f690f115df39d7772b8ed72858be57b99: dial tcp 38.27.106.13:443: connect: connection refused
Save(<lock/7bce7b5460>) returned error, retrying after 1.024445339s: client.PutObject: Put https://s3.wasabisys.com/moutiers-backup/locks/7bce7b54606f31ed018af8d15b230a8f690f115df39d7772b8ed72858be57b99: dial tcp 38.27.106.21:443: connect: connection refused
.
.
.
unable to refresh lock: client.PutObject: Put https://s3.wasabisys.com/moutiers-backup/locks/7bce7b54606f31ed018af8d15b230a8f690f115df39d7772b8ed72858be57b99: dial tcp 38.27.106.21:443: connect: connection refused
Save(<lock/f86c9abd8e>) returned error, retrying after 328.319146ms: client.PutObject: Put https://s3.wasabisys.com/moutiers-backup/locks/f86c9abd8e5404ffe627a27b2f5f29357bcfab9a277aae89cbde711a6ea83693: dial tcp 38.27.106.15:443: connect: connection refused
.
.
.
Save(<lock/f86c9abd8e>) returned error, retrying after 7.634295504s: client.PutObject: Put https://s3.wasabisys.com/moutiers-backup/locks/f86c9abd8e5404ffe627a27b2f5f29357bcfab9a277aae89cbde711a6ea83693: dial tcp 38.27.106.16:443: connect: connection refused
unable to refresh lock: client.PutObject: Put https://s3.wasabisys.com/moutiers-backup/locks/f86c9abd8e5404ffe627a27b2f5f29357bcfab9a277aae89cbde711a6ea83693: dial tcp 38.27.106.14:443: connect: connection refused
Remove(<lock/aa78282df5>) returned error, retrying after 566.074862ms: client.RemoveObject: Delete https://s3.wasabisys.com/moutiers-backup/locks/aa78282df553d228aaa7337917ff7cfbabf8976bf1b930af8c5003f666b8dd3f: dial tcp 38.27.106.21:443: connect: connection refused
.
.
.
Remove(<lock/aa78282df5>) returned error, retrying after 13.652996146s: client.RemoveObject: Delete https://s3.wasabisys.com/moutiers-backup/locks/aa78282df553d228aaa7337917ff7cfbabf8976bf1b930af8c5003f666b8dd3f: dial tcp 38.27.106.21:443: connect: connection refused
Get https://s3.wasabisys.com/moutiers-backup/?delimiter=%2F&max-keys=1000&prefix=snapshots%2F: dial tcp 38.27.106.15:443: connect: connection refused
Save(<lock/03cdf3dd81>) returned error, retrying after 341.299466ms: client.PutObject: Put https://s3.wasabisys.com/moutiers-backup/locks/03cdf3dd81419fe5d7cddc4fc0a0bec374c29f7d3f8d96111cb9848461ead1b2: dial tcp 38.27.106.16:443: connect: connection refused
.
.
.
Save(<lock/03cdf3dd81>) returned error, retrying after 11.762339209s: client.PutObject: Put https://s3.wasabisys.com/moutiers-backup/locks/03cdf3dd81419fe5d7cddc4fc0a0bec374c29f7d3f8d96111cb9848461ead1b2: dial tcp 38.27.106.15:443: connect: connection refused
unable to refresh lock: client.PutObject: Put https://s3.wasabisys.com/moutiers-backup/locks/03cdf3dd81419fe5d7cddc4fc0a0bec374c29f7d3f8d96111cb9848461ead1b2: dial tcp 38.27.106.21:443: connect: connection refused
Save(<lock/25ee8d0aba>) returned error, retrying after 679.73749ms: client.PutObject: Put https://s3.wasabisys.com/moutiers-backup/locks/25ee8d0aba336239ec7b999288fcc7cf37cd2938dcdceffbdb820c1234e1ada8: dial tcp 38.27.106.16:443: connect: connection refused
Save(<lock/25ee8d0aba>) returned error, retrying after 844.837082ms: client.PutObject: Put https://s3.wasabisys.com/moutiers-backup/locks/25ee8d0aba336239ec7b999288fcc7cf37cd2938dcdceffbdb820c1234e1ada8: dial tcp 38.27.106.19:443: connect: connection refused
Save(<lock/25ee8d0aba>) returned error, retrying after 612.979784ms: client.PutObject: Put https://s3.wasabisys.com/moutiers-backup/locks/25ee8d0aba336239ec7b999288fcc7cf37cd2938dcdceffbdb820c1234e1ada8: dial tcp 38.27.106.13:443: connect: connection refused

I thought this was caused by a bad restic install or upgrade, so I reinstalled it (using brew on macOS).

$ restic prune
repository 2c128c6e opened successfully, password is correct
counting files in repo
building new index for repo
[3:23:21] 100.00% 329357 / 329357 packs
repository contains 329357 packs (6833616 blobs) with 1.328 TiB
processed 6833616 blobs: 2 duplicate blobs, 1.835 KiB duplicate
load all snapshots
find data that is still in use for 3561 snapshots
tree 6f512c9829ffe2ba0541463a5b53c8c8863c8166c41fc97f821f4d2ecff76220 not found in repository
.
. (same lines as above)
.
vendor/github.com/spf13/cobra/command.go:762
github.com/spf13/cobra.(*Command).ExecuteC
vendor/github.com/spf13/cobra/command.go:852
github.com/spf13/cobra.(*Command).Execute
vendor/github.com/spf13/cobra/command.go:800
main.main
cmd/restic/main.go:86
runtime.main
/usr/local/Cellar/go/1.12.9/libexec/src/runtime/proc.go:200
runtime.goexit
/usr/local/Cellar/go/1.12.9/libexec/src/runtime/asm_amd64.s:1337

Still the same error, even though the missing tree is not the same one.
I thought the important line here was

tree 6f512c9829ffe2ba0541463a5b53c8c8863c8166c41fc97f821f4d2ecff76220 not found in repository

I found the snapshot it belongs to and forgot it, but it was no use; I ended up with a new kind of error:

$ restic prune

repository 2c128c6e opened successfully, password is correct
counting files in repo
building new index for repo
Load(<data/30947bed3c>, 591, 5228919) returned error, retrying after 365.915021ms: We encountered an internal error. Please retry the operation again later.
Load(<data/30947bed3c>, 591, 5228919) returned error, retrying after 852.483161ms: We encountered an internal error. Please retry the operation again later.
Load(<data/30947bed3c>, 591, 5228919) returned error, retrying after 828.368408ms: We encountered an internal error. Please retry the operation again later.
Load(<data/30947bed3c>, 591, 5228919) returned error, retrying after 875.32856ms: We encountered an internal error. Please retry the operation again later.
pack file cannot be listed 30947bed3cc85e8d0044a2dee0a7b1ff5e70903be13f90fee34a9d7e38002fc5: We encountered an internal error. Please retry the operation again later.
Load(<data/443772530a>, 591, 1130991) returned error, retrying after 340.795423ms: We encountered an internal error. Please retry the operation again later.
Load(<data/456c0865eb>, 591, 4349288) returned error, retrying after 407.680991ms: We encountered an internal error. Please retry the operation again later.
Load(<data/443772530a>, 591, 1130991) returned error, retrying after 567.905668ms: We encountered an internal error. Please retry the operation again later.
Load(<data/456c0865eb>, 591, 4349288) returned error, retrying after 1.049528599s: We encountered an internal error. Please retry the operation again later.
Load(<data/443772530a>, 591, 1130991) returned error, retrying after 811.123708ms: We encountered an internal error. Please retry the operation again later.
Load(<data/456c0865eb>, 591, 4349288) returned error, retrying after 1.349762785s: We encountered an internal error. Please retry the operation again later.
Load(<data/443772530a>, 591, 1130991) returned error, retrying after 2.522148245s: We encountered an internal error. Please retry the operation again later.
Load(<data/456c0865eb>, 591, 4349288) returned error, retrying after 1.948015446s: We encountered an internal error. Please retry the operation again later.
pack file cannot be listed 443772530a7159da23db79544124abb217daba7cc70ed4cf02486a7a0f0c9c8a: We encountered an internal error. Please retry the operation again later.
pack file cannot be listed 456c0865ebd993f0edc37d9196d4cc9628c7f8097ee4cf2f5634f1e31d1c60b4: We encountered an internal error. Please retry the operation again later.
Load(<data/63e113b208>, 591, 6597362) returned error, retrying after 641.736276ms: We encountered an internal error. Please retry the operation again later.
Load(<data/63e113b208>, 591, 6597362) returned error, retrying after 1.095321373s: We encountered an internal error. Please retry the operation again later.
Load(<data/63e113b208>, 591, 6597362) returned error, retrying after 910.675768ms: We encountered an internal error. Please retry the operation again later.
Load(<data/63e113b208>, 591, 6597362) returned error, retrying after 1.544194112s: We encountered an internal error. Please retry the operation again later.
Load(<data/711d50e6c7>, 591, 194570) returned error, retrying after 705.254172ms: We encountered an internal error. Please retry the operation again later.
pack file cannot be listed 63e113b2084c2bbb7f2458e911f8aaaab484b78f4935d84b0341c273eecf4cad: We encountered an internal error. Please retry the operation again later.
Load(<data/711d50e6c7>, 591, 194570) returned error, retrying after 598.323655ms: We encountered an internal error. Please retry the operation again later.
.
Load(<data/711d50e6c7>, 591, 194570) returned error, retrying after 1.757765324s: We encountered an internal error. Please retry the operation again later.
Load(<data/7b44c2ceea>, 591, 1150020) returned error, retrying after 384.282825ms: We encountered an internal error. Please retry the operation again later.
.
.
.
Load(<data/7b44c2ceea>, 591, 1150020) returned error, retrying after 1.720143672s: We encountered an internal error. Please retry the operation again later.
pack file cannot be listed 7b44c2ceeac1edfd624def884c4e731d264c50db8e11ffb59c24d9002319bc50: We encountered an internal error. Please retry the operation again later.
Load(<data/8e2e1c64da>, 591, 6607109) returned error, retrying after 516.850733ms: We encountered an internal error. Please retry the operation again later.
.
Load(<data/8e2e1c64da>, 591, 6607109) returned error, retrying after 811.451062ms: We encountered an internal error. Please retry the operation again later.
Load(<data/999d75fb07>, 591, 110694) returned error, retrying after 546.451702ms: We encountered an internal error. Please retry the operation again later.
Load(<data/99e72035f3>, 591, 4314397) returned error, retrying after 577.429652ms: We encountered an internal error. Please retry the operation again later.
Load(<data/8e2e1c64da>, 591, 6607109) returned error, retrying after 1.632131812s: We encountered an internal error. Please retry the operation again later.
Load(<data/999d75fb07>, 591, 110694) returned error, retrying after 589.514858ms: We encountered an internal error. Please retry the operation again later.
Load(<data/99e72035f3>, 591, 4314397) returned error, retrying after 887.074907ms: We encountered an internal error. Please retry the operation again later.
pack file cannot be listed 8e2e1c64dade31a9f983117ac3ab388a7357c233696ff6fdd74f33aee317df2f: We encountered an internal error. Please retry the operation again later.
Load(<data/999d75fb07>, 591, 110694) returned error, retrying after 1.637203463s: We encountered an internal error. Please retry the operation again later.
.
Load(<data/999d75fb07>, 591, 110694) returned error, retrying after 1.427447548s: We encountered an internal error. Please retry the operation again later.
Load(<data/99e72035f3>, 591, 4314397) returned error, retrying after 1.478285024s: We encountered an internal error. Please retry the operation again later.
pack file cannot be listed 999d75fb0774ec275ecddda364b1b07489cdfd6c69fe51cb4106655da1455558: We encountered an internal error. Please retry the operation again later.
pack file cannot be listed 99e72035f3aa78629406016c421af67f1d6c9e617f5b013456ff34aaabc26563: We encountered an internal error. Please retry the operation again later.
Load(<data/a89e26a216>, 591, 6006327) returned error, retrying after 546.097183ms: We encountered an internal error. Please retry the operation again later.
.
Load(<data/b0b0ec5ec5>, 591, 4753367) returned error, retrying after 705.309796ms: We encountered an internal error. Please retry the operation again later.
Load(<data/a89e26a216>, 591, 6006327) returned error, retrying after 913.469198ms: We encountered an internal error. Please retry the operation again later.
Load(<data/b0b0ec5ec5>, 591, 4753367) returned error, retrying after 496.945737ms: We encountered an internal error. Please retry the operation again later.
Load(<data/a89e26a216>, 591, 6006327) returned error, retrying after 1.970590135s: We encountered an internal error. Please retry the operation again later.
Load(<data/b0b0ec5ec5>, 591, 4753367) returned error, retrying after 1.115659881s: We encountered an internal error. Please retry the operation again later.
pack file cannot be listed a89e26a21630bdaabdc0e7b91ad340a93b6d8a3e4c6b99361920d013db099b2d: We encountered an internal error. Please retry the operation again later.
Load(<data/b0b0ec5ec5>, 591, 4753367) returned error, retrying after 1.116190312s: We encountered an internal error. Please retry the operation again later.
pack file cannot be listed b0b0ec5ec550a9eafb6ffc2771838070cb7e217187d78f9acbb555355fa2d2c0: We encountered an internal error. Please retry the operation again later.
Load(<data/bcfc57b59b>, 591, 4194301) returned error, retrying after 321.669734ms: We encountered an internal error. Please retry the operation again later.
.
Load(<data/bcfc57b59b>, 591, 4194301) returned error, retrying after 563.521672ms: We encountered an internal error. Please retry the operation again later.
Load(<data/c3da206bcc>, 591, 4208495) returned error, retrying after 513.256679ms: We encountered an internal error. Please retry the operation again later.
Load(<data/bcfc57b59b>, 591, 4194301) returned error, retrying after 2.035651671s: We encountered an internal error. Please retry the operation again later.
Load(<data/c3da206bcc>, 591, 4208495) returned error, retrying after 1.059857018s: We encountered an internal error. Please retry the operation again later.
pack file cannot be listed bcfc57b59b13ac60cfffd5e3884e50250fcf296bb15338e7c6d698d0661cfb3e: We encountered an internal error. Please retry the operation again later.
Load(<data/c3da206bcc>, 591, 4208495) returned error, retrying after 1.017903255s: We encountered an internal error. Please retry the operation again later.
Load(<data/ca9b78908e>, 591, 4250865) returned error, retrying after 650.905071ms: We encountered an internal error. Please retry the operation again later.
Load(<data/c3da206bcc>, 591, 4208495) returned error, retrying after 2.315261713s: We encountered an internal error. Please retry the operation again later.
Load(<data/ca9b78908e>, 591, 4250865) returned error, retrying after 446.695428ms: We encountered an internal error. Please retry the operation again later.
Load(<data/ce5df1795e>, 591, 2391991) returned error, retrying after 597.774193ms: We encountered an internal error. Please retry the operation again later.
pack file cannot be listed c3da206bcc376456f6715018973d566ba09bbc485c84d0aac281a2784a6903fd: We encountered an internal error. Please retry the operation again later.
Load(<data/ca9b78908e>, 591, 4250865) returned error, retrying after 826.745247ms: We encountered an internal error. Please retry the operation again later.
Load(<data/ce5df1795e>, 591, 2391991) returned error, retrying after 698.682548ms: We encountered an internal error. Please retry the operation again later.
Load(<data/d1a6b23d97>, 591, 4473004) returned error, retrying after 318.830517ms: We encountered an internal error. Please retry the operation again later.
Load(<data/ca9b78908e>, 591, 4250865) returned error, retrying after 2.530283976s: We encountered an internal error. Please retry the operation again later.
Load(<data/ce5df1795e>, 591, 2391991) returned error, retrying after 989.543237ms: We encountered an internal error. Please retry the operation again later.
Load(<data/d1a6b23d97>, 591, 4473004) returned error, retrying after 394.426401ms: We encountered an internal error. Please retry the operation again later.
pack file cannot be listed ca9b78908e93e324036b2413e45974b5b5abae3f444fa6d569113f60acf31c7c: We encountered an internal error. Please retry the operation again later.
Load(<data/ce5df1795e>, 591, 2391991) returned error, retrying after 1.35343545s: We encountered an internal error. Please retry the operation again later.
Load(<data/d1a6b23d97>, 591, 4473004) returned error, retrying after 1.404569513s: We encountered an internal error. Please retry the operation again later.
pack file cannot be listed ce5df1795e12fa10eaa44ef365da8e0444c106272eca81c699f5a0b36e80d41d: We encountered an internal error. Please retry the operation again later.
Load(<data/d1a6b23d97>, 591, 4473004) returned error, retrying after 1.007296947s: We encountered an internal error. Please retry the operation again later.
pack file cannot be listed d1a6b23d97a1bd782a1c4470e2498dcece599fb8614dee415d12e9c2d6bb48e2: We encountered an internal error. Please retry the operation again later.
Load(<data/f917f0decf>, 591, 798703) returned error, retrying after 734.540472ms: We encountered an internal error. Please retry the operation again later.
Load(<data/f917f0decf>, 591, 798703) returned error, retrying after 941.457138ms: We encountered an internal error. Please retry the operation again later.
Load(<data/f917f0decf>, 591, 798703) returned error, retrying after 1.046425036s: We encountered an internal error. Please retry the operation again later.
Load(<data/f917f0decf>, 591, 798703) returned error, retrying after 1.447923306s: We encountered an internal error. Please retry the operation again later.
pack file cannot be listed f917f0decf8eda7feb40e07353addde4bb598d5d3b748153944b9343bba25f94: We encountered an internal error. Please retry the operation again later.
[4:03:17] 100.00% 329622 / 329622 packs
repository contains 329605 packs (6845303 blobs) with 1.329 TiB
processed 6845303 blobs: 1003 duplicate blobs, 73.934 MiB duplicate
load all snapshots
find data that is still in use for 3582 snapshots
[5:18] 100.00% 3582 / 3582 snapshots
found 4242858 of 6845303 data blobs still in use, removing 2602445 blobs
will remove 0 invalid files
will delete 102935 packs and rewrite 42845 packs, this frees 496.906 GiB
Load(<data/7444022e50>, 0, 0) returned error, retrying after 625.880497ms: We encountered an internal error. Please retry the operation again later.
hash does not match id: want 7444022e50cbd9aecfe528dddf6576e30a2076f7967a32faec83bb345df7e42d, got b8dfb78be6887f0a541bf0c70de1b929001e486914d5273d9bc2fa23d7ed4f80
github.com/restic/restic/internal/repository.Repack
internal/repository/repack.go:34
main.pruneRepository
.
. (same lines as above)
.
runtime.goexit
/usr/local/Cellar/go/1.12.9/libexec/src/runtime/asm_amd64.s:1337

I thought the local cache (in ~/Library/Caches/restic) was corrupted, so I deleted it.

$ restic rebuild-index
repository 2c128c6e opened successfully, password is correct
counting files in repo
[2:58:41] 100.00% 331991 / 331976 packs
finding old index files
saved new indexes as [e9f1a78f 1eabffc1 0a6d5359 39772f04 6d51b16f a46afd67 dfbd6a93 e72ff1a3 85b7aa38 0c1ff4c6 bf1de3f5 c9316b0d 842448a9 13bc747f 4b4684e5 372b5333 1c563b21 a4e385b4 748806fc 04c18181 03d2dd90 431a71a5 38ce8eba 23f1ba2a 303acd25 3142530f 18dfe80e fc01d988 6cba6e41 d1e5c224 6045ee3c ca00a1bb 75c5ccb9 bec7fa63 97f317be cff77efa 8bd90d98 cde1aed7 645ac19d da1491d7 4d05e008 9bc06db3 345f56c6 317ba248 af863404 3c1542c6 ececd22c 720df2d9 fdb1134f a3350027 ceccb56d fded5f7a 2077af21 d5efafb9 0e942eaa 3b04fd9a 92298b4f 6fd182b7 5b2dd732 0daf3427 160929f8 3f0afd3c 56a9152e d001ba0d ee675a82 e2045af1 e740a261 47379d15 791431de b0cc8963 030c3c14 ec7f34a6 0137b2f5 40c5cfe9 5fafdd71 d9edba84 993b38b9 93cfb9ee b8871b64 0bb0902b a062d52e e8930e9f 8cc61fa2 d89c176c e0393cf4 9fd3a4e2 367509d0 ce568e1f 18823894 66bd6784 dd31556f e11391ec a1ff625b 207307a9 a07c85b7 953c96c7 ae6c1f25 91e681d2 376dd001 40de33ae 74c9aaf3 b1544907 def6e887 6e1cc586 171deb95 6a0a0e18 0f562ae6 a1ae5947 2a4f5af6 fa8693a6 7a765b35]
remove 179 old index files

$ restic prune
repository 2c128c6e opened successfully, password is correct
counting files in repo
building new index for repo
Load(<data/4010bf21f2>, 591, 7431270) returned error, retrying after 381.190953ms: La clé spécifiée n’existe pas. [= The specified key does not exist.]
[7:08:03] 100.00% 332018 / 332018 packs
repository contains 332018 packs (6989389 blobs) with 1.339 TiB
processed 6989389 blobs: 140048 duplicate blobs, 9.771 GiB duplicate
load all snapshots
find data that is still in use for 3603 snapshots
tree a3443afbd562f367968d66439ba7143d4c4072d3295b4ab04bac0005f9a2b226 not found in repository
github.com/restic/restic/internal/repository.(*Repository).LoadTree
.
. (same lines as above)
.
runtime.goexit
/usr/local/Cellar/go/1.12.9/libexec/src/runtime/asm_amd64.s:1337

The last attempt was:

$ restic prune
repository 2c128c6e opened successfully, password is correct
counting files in repo
building new index for repo
Load(<data/60dfdd0d1e>, 591, 4226993) returned error, retrying after 585.763275ms: We encountered an internal error. Please retry the operation again later.
Load(<data/60e6b4a076>, 591, 4342779) returned error, retrying after 497.340218ms: We encountered an internal error. Please retry the operation again later.
Load(<data/610a1f7dcc>, 591, 4717291) returned error, retrying after 525.307881ms: We encountered an internal error. Please retry the operation again later.
[3:27:40] 70.42% 233999 / 332278 packs
unexpected EOF

I’m stuck. I’m left with 1.3 TiB at Wasabi, part of which is of no use now that my forget was successful. My aim was to slim down my repo.

The “internal error” and “connection refused” errors likely are caused by issues internal to Wasabi and this isn’t something restic can be expected to fix; however, it does retry following a failure, as you can see in the logs. As long as the attempted operation eventually succeeds, everything is fine.

As for the missing tree, I’m not sure why it went missing, but you can search for the snapshots that reference it. Forgetting those snapshots will fix the error, but of course that also means you’re throwing away some backups. Whether you want to forget them is up to you.

restic find --tree a3443afbd562f367968d66439ba7143d4c4072d3295b4ab04bac0005f9a2b226
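
If you do decide to forget them, something like the following should work; the IDs here are just placeholders for whatever the find command reports:

restic forget <snapshot-id-1> <snapshot-id-2>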

Thanks for the quick reply.

I was wondering whether all these troubles might come from the fact that the other hosts are doing their backups while the prune is running.

Wasn’t it a bad idea to unlock the locked repo?

As the other hosts are laptops and their users are not aware a backup is running, a user can close the laptop lid during a backup. Would that affect restic’s operations?

Yes, that absolutely can cause this kind of problem, and it is exactly why pruning requires an exclusive lock. Backup hosts deduplicate against the index, which is not updated until the end of the prune operation. This tree object probably wasn’t referenced by any snapshots that existed when you started pruning, so the prune operation discarded it. However, a backup client apparently tried to add it, found it already in the index, and so deduplicated it instead of uploading it again.

The end result is that prune tossed an object that a new snapshot references.

Yeah, that’s a bad idea. :confused:

You may be able to fix the damage by re-running the backups. If one of them still wants that tree to exist (e.g. that directory has not changed), it will re-add it now that the index no longer contains it. That will effectively repair the repository by reintroducing the missing tree object.
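
Concretely, on the host that produced the affected snapshots, run its usual backup again; the path below is just a placeholder for whatever that host normally backs up:

restic backup /the/paths/that/host/normally/backs/up

Afterwards you can see whether the repository is consistent again, for example by running restic check or by retrying the prune.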

It might, depending on what restic does when the system comes out of sleep. If it resumes the backup, that could result in damage if you pruned the repository after the interrupted backup started and before it finished.

If restic aborts the backup then no harm would be done.

@fd0 If there is a way for restic to detect that the system was suspended in some way (admittedly this may not be possible in all scenarios) then the backup should probably only be resumed after checking that all of the same index files that existed at the start of the backup still exist in the repository. Otherwise we can conclude that the repository was pruned and we should either abort or restart the backup.

Hm, good point. I thought about this and I think there’s a way: we could have a thread continuously check the time every few seconds, and abort the backup if the time jumps too much (e.g. by more than 5 minutes). This should reliably detect suspended systems and processes, but it would require work to actually build this check.
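
For what it’s worth, a minimal sketch of that idea in Go could look like the following. This is not restic code, just an illustration; the name watchForSuspend and the thresholds are made up, and how the monotonic clock behaves across a suspend is platform-dependent:

package main

import (
	"context"
	"fmt"
	"time"
)

// watchForSuspend returns a channel that is closed when the wall clock
// jumps by more than maxJump between two consecutive ticks, which
// suggests the process (or the whole system) was suspended.
func watchForSuspend(ctx context.Context, interval, maxJump time.Duration) <-chan struct{} {
	suspended := make(chan struct{})
	go func() {
		// Round(0) strips the monotonic clock reading so that Sub below
		// compares wall-clock times and can see a jump caused by sleep.
		last := time.Now().Round(0)
		ticker := time.NewTicker(interval)
		defer ticker.Stop()
		for {
			select {
			case <-ctx.Done():
				return
			case <-ticker.C:
				now := time.Now().Round(0)
				if now.Sub(last) > interval+maxJump {
					close(suspended)
					return
				}
				last = now
			}
		}
	}()
	return suspended
}

func main() {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	suspended := watchForSuspend(ctx, 5*time.Second, 5*time.Minute)

	// Stand-in for the actual backup work: abort as soon as a suspend
	// is detected, otherwise finish normally after a while.
	select {
	case <-suspended:
		fmt.Println("suspend detected, aborting backup")
	case <-time.After(time.Minute):
		fmt.Println("backup finished without detecting a suspend")
	}
}

The real backup loop would of course select on such a channel alongside its normal work and refuse to finalize the snapshot once it fires.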

I think we should abort a backup if a suspend is detected because other instances of restic rely on all lock files being refreshed regularly, so if locks become stale (because the process is suspended and cannot refresh them) they may be removed or ignored by the other instances.

It’d be great if somebody could file a feature request issue so we can track this, thanks! :slight_smile:

Shouldn’t my prune prevent any other backup from running?

If unlocking is a bad idea, shouldn’t the end user be unable to unlock an exclusive lock? Or at least get a warning explaining what could happen, plus a confirmation prompt?

About the suspend feature request: my English and my understanding of restic are too limited to file it, but I’m glad my questions led to it.

Pruning acquires an exclusive lock on the repository, yes. This requires that no other lock exists.

It sounds like you forcibly unlocked the repository. This bypasses the safety of the locking mechanism, assumes that you know what you are doing, and can damage the repository.

Possibly. I think it’s general knowledge that removing locks owned by other processes in any kind of locking system can result in damage if that other process is still running.

unlock is designed to remove locks that aren’t in use anymore. Using it to remove in-use locks can lead to repository damage.
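
If in doubt, you can inspect the existing locks before removing anything, for example (the ID is whatever the list command reports):

restic list locks
restic cat lock <ID>

The lock contents show which host, user and PID created it, when, and whether it is exclusive, which helps to judge whether it is really stale.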

I thought restic unlock removes only stale locks and should always be safe to use?

Yes you are correct:

$ restic unlock --help

The “unlock” command removes stale locks that have been created by other restic processes.

Usage:
restic unlock [flags]

Flags:
  -h, --help         help for unlock
      --remove-all   remove all locks, even non-stale ones

OK, so I launched a prune when no locks were involved.

$ restic -v -v prune
repository 2c128c6e opened successfully, password is correct
counting files in repo
building new index for repo
[6:12:45] 100.00% 332642 / 332642 packs
repository contains 332642 packs (7001245 blobs) with 1.341 TiB
processed 7001245 blobs: 140279 duplicate blobs, 9.789 GiB duplicate
load all snapshots
find data that is still in use for 3708 snapshots
[5:39] 100.00% 3708 / 3708 snapshots
found 4258306 of 7001245 data blobs still in use, removing 2742939 blobs
will remove 0 invalid files
will delete 102903 packs and rewrite 45140 packs, this frees 506.637 GiB
[12:07:36] 19.52% 8812 / 45140 packs rewritten

It is still running after 12 hours (I have a 1 Gbit/s / 500 Mbit/s connection).

It seems to me it will take about 20 hours to finish.
I was wondering: why rewrite packs instead of only deleting the ones that are no longer in use?

This depends on whether it’s run with --remove-all. And in some cases I’ve found that it appears to remove locks for still-running processes.

Packs contain multiple objects. If some of the objects in a pack are still in use but some are not, the only way to remove the objects that aren’t in use is to download the objects that are in use, put them in a new pack, then remove the old pack.
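
As a rough back-of-the-envelope figure based on the numbers in your output (1.341 TiB across 332642 packs gives an average pack size of roughly 4.2 MiB): rewriting 45140 packs means downloading on the order of 45140 × 4.2 MiB ≈ 185 GiB and re-uploading whatever part of that is still in use, which is why the rewrite phase dominates the total prune time.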

I just did a simple unlock, no --remove-all. So that shouldn’t have removed the exclusive lock, and I should have been locked out of my prune, no?

Just to let you know:

$ restic -v -v prune
repository 2c128c6e opened successfully, password is correct
counting files in repo
building new index for repo
[6:12:45] 100.00% 332642 / 332642 packs
repository contains 332642 packs (7001245 blobs) with 1.341 TiB
processed 7001245 blobs: 140279 duplicate blobs, 9.789 GiB duplicate
load all snapshots
find data that is still in use for 3708 snapshots
[5:39] 100.00% 3708 / 3708 snapshots
found 4258306 of 7001245 data blobs still in use, removing 2742939 blobs
will remove 0 invalid files
will delete 102903 packs and rewrite 45140 packs, this frees 506.637 GiB
Save(<data/8c3736e9b3>) returned error, retrying after 680.137568ms: client.PutObject: We encountered an internal error. Please retry the operation again later.
Save(<data/d8c875a8fa>) returned error, retrying after 326.295316ms: client.PutObject: We encountered an internal error. Please retry the operation again later.
Save(<data/e79f605d68>) returned error, retrying after 348.997931ms: client.PutObject: We encountered an internal error. Please retry the operation again later.
[41:52:41] 80.27% 36234 / 45140 packs rewritten

As you can see, it takes:
more than 6 hours to rebuild the index,
more than 5 hours to find the data that is still in use, and (if everything runs fine) it will take about 50 hours to rewrite the packs,
so about 60 hours in total to prune.
Does that mean that every time I run a prune it will take around 60 hours?

There’s an open discussion that mentions the possibility that restic unlock could remove non-stale but long-lived locks due to a bug. The issue is unconfirmed; maybe you hit this issue?

The time to prune is roughly linear with the number of objects in the repository, so maybe. Note that this prune removed about a third of the objects in the repository, so if you re-ran prune immediately I’d expect the time to be closer to 40 hours.
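
Rough arithmetic from your own output: the prune removed 2742939 of 7001245 blobs, roughly 39%, so a follow-up prune over the remaining ~4.26 million blobs could be expected to take about 0.6 × 60 h ≈ 36 hours, assuming the per-object cost stays roughly the same.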

After more than 40 hours…

.
.
[5:39] 100.00% 3708 / 3708 snapshots
found 4258306 of 7001245 data blobs still in use, removing 2742939 blobs
will remove 0 invalid files
will delete 102903 packs and rewrite 45140 packs, this frees 506.637 GiB
Save(<data/8c3736e9b3>) returned error, retrying after 680.137568ms: client.PutObject: We encountered an internal error. Please retry the operation again later.
Save(<data/d8c875a8fa>) returned error, retrying after 326.295316ms: client.PutObject: We encountered an internal error. Please retry the operation again later.
Save(<data/e79f605d68>) returned error, retrying after 348.997931ms: client.PutObject: We encountered an internal error. Please retry the operation again later.
hash does not match id: want 7c69b2611593839d7bf19fa3a4ed5daf38a09cfa4ff30f49a1a6dc96e31d73cf, got ef1732f92088ae139fbae476536f994bcf495d32c6e64470e7e90dcb6efd1632
github.com/restic/restic/internal/repository.Repack
internal/repository/repack.go:34
main.pruneRepository
cmd/restic/cmd_prune.go:278
main.runPrune
cmd/restic/cmd_prune.go:85
main.glob…func18
cmd/restic/cmd_prune.go:25
github.com/spf13/cobra.(*Command).execute
vendor/github.com/spf13/cobra/command.go:762
github.com/spf13/cobra.(*Command).ExecuteC
vendor/github.com/spf13/cobra/command.go:852
github.com/spf13/cobra.(*Command).Execute
vendor/github.com/spf13/cobra/command.go:800
main.main
cmd/restic/main.go:86
runtime.main
/usr/local/Cellar/go/1.12.9/libexec/src/runtime/proc.go:200
runtime.goexit
/usr/local/Cellar/go/1.12.9/libexec/src/runtime/asm_amd64.s:1337

What does that mean?
Is my repo completely dead? Or can the prune be done partially?
What’s the next step to repair it?
Thanks for your help.

The error message doesn’t indicate exactly what kind of object is damaged (pack, blob, or tree), but it does indicate that the hash of that data doesn’t match what was expected. This means that some data in the repository is corrupt.
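
One way to narrow it down (a suggestion on my part, not something the error output gives you directly) is to let restic verify the repository itself; note that --read-data downloads and checks every pack, so on a repository of this size it will take a long time:

restic check
restic check --read-data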

@fd0 Should this message show what kind of object was being hashed so that find could be used to figure out where the bad data is being used?

@fd0 Since, I assume, there is no point in retrying without removing the corrupted data, I think it would be good to have a clue about what to remove, no?