Recovering from full storage

Hello,

So I have been running restic with rclone since 2020. My cloud storage just ran out of space, and when I investigated, the culprit was one of my restic backups.

Looking at the snapshots in the repository, I found that it had 108 snapshots instead of the ~23 I was expecting. (I run backups every day, followed by restic -r rclone:backup_place --password-file secrets forget --keep-daily 7 --keep-weekly 4 --keep-monthly 12 --prune.) That's a problem in itself, but not at the top of my priority list at the moment.
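For context, my daily routine boils down to roughly the following (the backup source path here is a placeholder, the rest is as in my cron job):

# nightly backup (source path is a placeholder)
restic -r rclone:backup_place --password-file secrets backup /path/to/data
# apply the retention policy and prune in the same run
restic -r rclone:backup_place --password-file secrets forget \
    --keep-daily 7 --keep-weekly 4 --keep-monthly 12 --prune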

List of snapshots
# restic -r rclone:backup_place --password-file secrets snapshots -c
repository 937d3623 opened (version 1)
ID        Time                 Host                Tags
---------------------------------------------------------
5064a50e  2020-02-17 06:58:29  host
277ae795  2020-02-20 13:42:42  host
8051fb6b  2020-02-22 02:00:02  host
f35e6b21  2020-02-29 02:00:02  host
571c37d2  2020-03-31 02:00:01  host
77aa384f  2020-04-30 02:00:01  host
f4b490e1  2020-05-03 02:00:01  host
d139a08f  2020-05-10 02:00:02  host
b02e2ef0  2020-05-17 02:00:01  host
baf82a16  2020-05-18 02:00:01  host
ba26f522  2020-05-19 02:00:01  host
c9cc8131  2020-05-20 02:00:01  host
0c399f09  2020-05-21 02:00:01  host
c7e91c82  2020-05-22 02:00:01  host
3d740f37  2020-05-23 02:00:01  host
2b2546fe  2020-05-23 13:46:08  host
a812e0dc  2020-05-23 19:41:01  host
b0bf4a46  2020-05-24 11:01:26  host
b1940e13  2020-05-31 02:00:01  host
a6b1563f  2020-06-02 02:00:01  host
46da7773  2020-06-03 02:00:01  host
c37da64d  2020-06-04 02:00:01  host
e2f4209f  2020-06-05 02:00:01  host
a9389097  2020-06-06 02:00:01  host
bda01d2d  2020-06-07 02:00:01  host
1a3942b4  2020-06-08 02:00:01  host
dc28cf73  2020-06-08 19:15:36  host-1
1441f60c  2020-07-26 02:00:01  host-1
b40e1427  2020-07-31 02:00:01  host-1
b3251552  2020-08-02 02:00:02  host-1
417d8ec0  2020-08-09 02:00:01  host-1
a347ee21  2020-08-10 02:00:02  host-1
58dc2a89  2020-08-11 02:00:01  host-1
aa8db2bf  2020-08-12 02:00:01  host-1
f9e7f1a8  2020-08-13 02:00:01  host-1
d5d3c1e1  2020-08-14 02:00:01  host-1
03845ee6  2020-08-15 02:00:01  host-1
3f9ec694  2020-08-16 02:00:01  host-1
b0ad66b0  2020-08-31 02:00:01  host-1
5a742655  2020-09-27 02:00:01  host-1
f6d527aa  2020-09-30 02:00:01  host-1
52e24d7f  2020-10-04 02:00:01  host-1
bd08b444  2020-10-08 02:00:02  host-1
bd8ad871  2020-10-09 02:00:01  host-1
3b82dadf  2020-10-10 02:00:01  host-1
c4db262e  2020-10-11 02:00:02  host-1
26dc5f01  2020-10-12 02:00:02  host-1
94e282b5  2020-10-13 02:00:02  host-1
0c5055c4  2020-10-14 02:00:01  host-1
14ba5b42  2020-10-15 02:00:01  host-2
8c990e44  2020-10-16 02:00:01  host-2
f24ce9a0  2020-10-18 02:00:02  host-2
d7655464  2020-10-22 02:00:02  host-2
bfa1a511  2020-10-23 02:00:01  host-2
2b6e979f  2020-10-24 02:00:01  host-2
8ace9f58  2020-10-25 02:00:01  host-2
7a5f5ce8  2020-10-26 02:00:02  host-2
42138638  2020-10-27 02:00:01  host-2
03afd310  2020-10-28 02:00:02  host-2
4002ad05  2020-10-31 02:00:01  host-2
d77868af  2020-11-30 02:00:01  host-2
a83fcfe5  2020-12-31 02:00:01  host-2
05125346  2021-01-03 02:00:01  host-2
97395050  2021-01-10 02:00:01  host-2
d787513d  2021-01-17 02:00:01  host-2
8e22cf19  2021-01-18 02:00:01  host-2
5024af70  2021-01-19 02:00:01  host-2
165adea1  2021-01-20 02:00:01  host-2
4cd08e8f  2021-01-21 02:00:01  host-2
4d24107b  2021-01-22 02:00:01  host-2
9d5c42d4  2021-01-23 02:00:02  host-2
a60cb254  2021-01-24 02:00:01  host-2
88291c51  2021-01-25 02:00:02  host-2
3bcf7f87  2021-01-26 02:00:01  host-2
b16a7a6c  2021-01-27 02:00:01  host-2
4fd7a3d5  2021-01-28 02:00:01  host-2
c7f555cb  2021-01-29 02:00:02  host-2
e9ed4952  2021-01-30 02:00:01  host-2
435928ef  2021-01-31 02:00:02  host-2
190afbcd  2021-02-07 02:00:02  host-2
4d4b1f8b  2021-02-08 02:00:01  host-2
0b6fc5e6  2021-02-09 02:00:01  host-2
bd37c95c  2021-02-10 02:00:01  host-2
1dbfba8b  2021-02-11 02:00:01  host-2
63a4eab6  2021-02-12 02:00:01  host-2
f57622d9  2021-02-13 02:00:01  host-2
26f4be4e  2022-04-12 02:00:01  host-2
3196c702  2022-05-31 02:00:01  host-2
dc59340d  2022-06-30 02:00:02  host-2
a72064e4  2022-07-31 02:00:02  host-2
c20740df  2022-08-31 02:00:02  host-2
913485de  2022-09-30 02:00:01  host-2
46189ae3  2022-10-31 02:00:01  host-2
29bb5576  2022-11-29 02:00:01  host-2
868e0491  2022-12-31 02:00:01  host-2
d8945885  2023-01-31 02:00:01  host-2
db1348b4  2023-02-28 02:00:02  host-2
62fbe7c5  2023-03-12 02:00:01  host-2
c58366d2  2023-03-19 02:00:01  host-2
b0bfc913  2023-03-21 02:00:01  host-2
ec48177e  2023-03-22 02:00:01  host-2
0c96aecd  2023-03-23 02:00:02  host-2
191898f7  2023-03-24 02:00:02  host-2
b4e9360f  2023-03-25 02:00:01  host-2
1890f22d  2023-03-26 02:00:01  host-2
5adc0b88  2023-03-27 02:00:01  host-2
---------------------------------------------------------
106 snapshots

After clearing some space on the cloud storage, I was able to get restic to manually forget the excess snapshots. However, when I run restic -r rclone:backup_place --password-file secrets prune, it now errors:

prune error
# restic -r rclone:backup_place --password-file secrets prune
repository 937d3623 opened (version 1)
loading indexes...
loading all snapshots...
finding data that is still in use for 20 snapshots
Load(, 391, 18805) returned error, retrying after 720.254544ms: EOF
Load(, 391, 18805) returned error, retrying after 873.42004ms: EOF
Load(, 391, 18805) returned error, retrying after 1.054928461s: EOF
Load(, 391, 18805) returned error, retrying after 1.560325776s: EOF
Load(, 391, 18805) returned error, retrying after 3.004145903s: EOF
Load(, 391, 18805) returned error, retrying after 2.147653057s: EOF
Load(, 391, 18805) returned error, retrying after 3.739082318s: EOF
Load(, 391, 18805) returned error, retrying after 5.099891944s: EOF
Load(, 391, 18805) returned error, retrying after 10.263247495s: EOF
Load(, 391, 18805) returned error, retrying after 19.514091959s: EOF
[1:14] 100.00%  20 / 20 snapshots
EOF
ReadFull()
github.com/restic/restic/internal/backend.ReadAt
  /restic/internal/backend/readerat.go:39
github.com/restic/restic/internal/repository.(*Repository).LoadBlob
  /restic/internal/repository/repository.go:298
github.com/restic/restic/internal/restic.LoadTree
  /restic/internal/restic/tree.go:113
github.com/restic/restic/internal/restic.loadTreeWorker
  /restic/internal/restic/tree_stream.go:36
github.com/restic/restic/internal/restic.StreamTrees.func1
  /restic/internal/restic/tree_stream.go:176
golang.org/x/sync/errgroup.(*Group).Go.func1
  /home/build/go/pkg/mod/golang.org/x/sync@v0.1.0/errgroup/errgroup.go:75
runtime.goexit
  /usr/local/go/src/runtime/asm_amd64.s:1594

Running restic -r rclone:backup_place --password-file secrets check reports lots of non-referenced packs, some incorrectly sized packs, and the same error when it starts checking the snapshots.

end of check output
704 additional files were found in the repo, which likely contain duplicate data.
This is non-critical, you can run `restic prune` to correct this.
check snapshots, trees and blobs
Load(, 391, 18805) returned error, retrying after 720.254544ms: EOF
Load(, 391, 18805) returned error, retrying after 873.42004ms: EOF
Load(, 391, 18805) returned error, retrying after 1.054928461s: EOF
Load(, 391, 18805) returned error, retrying after 1.560325776s: EOF
Load(, 391, 18805) returned error, retrying after 3.004145903s: EOF
Load(, 391, 18805) returned error, retrying after 2.147653057s: EOF
Load(, 391, 18805) returned error, retrying after 3.739082318s: EOF
Load(, 391, 18805) returned error, retrying after 5.099891944s: EOF
Load(, 391, 18805) returned error, retrying after 10.263247495s: EOF
Load(, 391, 18805) returned error, retrying after 19.514091959s: EOF
error for tree fad85238:
[3:11] 100.00%  20 / 20 snapshots
  ReadFull(): EOF
Fatal: repository contains errors

Any ideas on how to fix this and recover the backups?

Thanks.

What version of restic are you running? I am fuzzy on the details but I seem to remember something like this with an older version.

Also, if you're completely out of space… that can make this process difficult. If there's any non-Restic data you can remove to give Restic some wiggle room, that would be best. If not, there's prune --max-repack-size 0, which may fix it. Failing that, there's also the prune --unsafe-recover-no-free-space switch, but it's really a last resort.
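In command form, that's roughly the following (repo flags copied from your post; only a sketch):

# only deletes packs that are completely unused, so nothing is repacked
restic -r rclone:backup_place --password-file secrets prune --max-repack-size 0
# absolute last resort if the storage is 100% full; the flag expects a
# confirmation value, so read the docs on recovering from "no free space" first
# restic -r rclone:backup_place --password-file secrets prune --unsafe-recover-no-free-space <value>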

If you can free up a few gigabytes of (non-Restic) space first, I would download the latest version, run rebuild-index, then prune --max-repack-size 0, then do a check. If everything is okay, continue pruning however you'd like. If it's not okay, you might try the Repair PR (or repair from Rustic, a Restic spin-off). But if you CAN'T free up space first… that's a whole other ballgame. You might try prune --unsafe-recover-no-free-space at that point, but if it were me, I'd probably just download the whole repo, fix it locally, then Rclone it back up (using the --delete-before switch and/or just deleting some cloud data first). I had to do that once. It wasn't fun and took nearly a month (8 TiB repo), but in the end I fixed everything with the steps above.
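Something like this, in other words (remote names and local paths are placeholders; adapt to your setup):

# with a few GB of free space on the cloud side:
restic -r rclone:backup_place --password-file secrets rebuild-index
restic -r rclone:backup_place --password-file secrets prune --max-repack-size 0
restic -r rclone:backup_place --password-file secrets check

# or pull the whole repo down, fix it locally, and push it back:
rclone sync remote:backup_place /mnt/big-disk/repo-copy
restic -r /mnt/big-disk/repo-copy --password-file secrets rebuild-index
restic -r /mnt/big-disk/repo-copy --password-file secrets prune
restic -r /mnt/big-disk/repo-copy --password-file secrets check
rclone sync --delete-before /mnt/big-disk/repo-copy remote:backup_place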

If it comes to downloading the whole thing, and your cloud provider supports it… you miiiight be able to “roll back” to a good state, and download THAT instead. Then clean it up, and reupload.

Ps. One thing I like to do on fixed-space cloud storage is to upload a 25GB file just as a space holder, so I can delete it in emergencies like this.
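Something along these lines works as the space holder (file name and size are just examples):

# a 25 GB file of zeros; avoid sparse files, since some providers may not
# count those against the quota
dd if=/dev/zero of=placeholder.bin bs=1M count=25600
rclone copy placeholder.bin remote:backup_place/
# in an emergency: rclone deletefile remote:backup_place/placeholder.bin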

I am running 0.15.1. Thanks. I tried getting rid of the latest snapshots (now down to 19 snapshots, well before I ran out of space), but no joy. I'll give rebuild-index a try. Just waiting for the repo to fully download. I have downloaded most of the files, but check is still reporting the error about the single file. Fingers crossed. Will let you know how I get on.

Make sure you actually have room to write the new index, else that’s not worth your time (and will likely make things worse). If it’s completely full and there’s no way for you to free up (or buy more) space, about your only option is to download the whole thing, repair it on a larger drive, then reupload it.

Well, you could try the --unsafe-recover-no-free-space… but I’d want to do the above first if I could. Consider it your last resort.

I ran rebuild-index, but it's not looking any better. Now I'm getting the following errors:

...
error for tree 91a99699:
  tree 91a996990035f0e3ec492d629b2386cfb975497504b3552aa8e8b0f6736224f2: file "xxxxx.db" blob 075971dc8642c3a29cbb0078636ba28e28ead599545f1326d44ee10911d19c7e not found in index
  tree 91a996990035f0e3ec492d629b2386cfb975497504b3552aa8e8b0f6736224f2: file "xxxxx.db" blob 62152fd571eccf984a67886088ee9192a59bce368e32fce2db8b04b833e2eefe not found in index
  tree 91a996990035f0e3ec492d629b2386cfb975497504b3552aa8e8b0f6736224f2: file "xxxxx.db" blob 14e139fe2e4b15a69aae149fa7b1198cdc8f165ef563a227c24a4937a719f2e7 not found in index
error for tree fad85238:
  id fad85238dde6eaa4cf45fd930d5a2fa52616f89ccd9b69de4d7d82969d30783b not found in repository
error for tree 84b33428:
  tree 84b334286b106ffb00232054d39d8dbc73563c66ca68ed887510241a978d42df: file "xxxxx.log" blob 21c035222df82da102301fdc6e8870dfd481a86571d118f15ffb6bf1d1343bed not found in index
  tree 84b334286b106ffb00232054d39d8dbc73563c66ca68ed887510241a978d42df: file "xxxxx.db" blob 1de647167b8f448faf4e8de0da6edc0d1246013c4575c23c5cc7f50cec11e348 not found in index
  tree 84b334286b106ffb00232054d39d8dbc73563c66ca68ed887510241a978d42df: file "xxxxx.db" blob 066028c0e6c9af5d6998c937932879bd14c573e33c77b6de5e9d1b7e122e2e33 not found in index
  tree 84b334286b106ffb00232054d39d8dbc73563c66ca68ed887510241a978d42df: file "xxxxx.db" blob c2e12791d368fc78c8ae53e71cba8463bb8e8bddc05d728521097f84d4e0fca8 not found in index
  tree 84b334286b106ffb00232054d39d8dbc73563c66ca68ed887510241a978d42df: file "xxxxx.db" blob d45a08079d2ec798d2554761cca530f9a849317d8a6875fc210ac195313d264c not found in index

I'm not too bothered about losing the two files mentioned in the errors, but restic doesn't seem able to look past the id fad85238dde6eaa4cf45fd930d5a2fa52616f89ccd9b69de4d7d82969d30783b not found in repository error. When I run prune, it errors out with:


repository 937d3623 opened (version 1)
loading indexes...
loading all snapshots...
finding data that is still in use for 16 snapshots
[0:00] 0.00%  0 / 16 snapshots
id fad85238dde6eaa4cf45fd930d5a2fa52616f89ccd9b69de4d7d82969d30783b not found in repository
github.com/restic/restic/internal/repository.(*Repository).LoadBlob
  /restic/internal/repository/repository.go:274
github.com/restic/restic/internal/restic.LoadTree
  /restic/internal/restic/tree.go:113
github.com/restic/restic/internal/restic.loadTreeWorker
  /restic/internal/restic/tree_stream.go:36
github.com/restic/restic/internal/restic.StreamTrees.func1
  /restic/internal/restic/tree_stream.go:176
golang.org/x/sync/errgroup.(*Group).Go.func1
  /home/build/go/pkg/mod/golang.org/x/sync@v0.1.0/errgroup/errgroup.go:75
runtime.goexit
  /usr/local/go/src/runtime/asm_amd64.s:1594

Sigh. I have another backup that had the same problem of not forgetting snapshots. I upgraded to the latest version and ran my forget command. It removed most of the extra snapshots but still left 31 extra ones. However, it complained about something not being quite right and told me to run the rebuild-index command.
I then tried manually removing the remaining excess snapshots and got this:

found 1 old cache directories in /root/.cache/restic, run `restic cache --cleanup` to remove them
[0:02] 100.00%  32 / 32 files deleted
32 snapshots have been removed, running prune
loading indexes...
loading all snapshots...
finding data that is still in use for 22 snapshots
[0:01] 100.00%  22 / 22 snapshots
searching used packs...
{<data/07dd2a5b> <data/2fb245a6> <data/46beffb2>} not found in the index

Integrity check failed: Data seems to be missing.
Will not start prune to prevent (additional) data loss!
Please report this error (along with the output of the 'prune' run) at
https://github.com/restic/restic/issues/new/choose
Fatal: index is not complete

And now check returns:

/usr/bin/restic -r rclone:store:share --password-file supersecret check
using temporary cache in /tmp/restic-check-cache-487549782
repository 8442ac8a opened (version 1)
created new cache in /tmp/restic-check-cache-487549782
create exclusive lock for repository
load indexes
check all packs
pack aab0dda246f75f9621329f09ac264b16dd88e8e5ac5a7cbaf5d9acf4b696e54a: not referenced in any index
1 additional files were found in the repo, which likely contain duplicate data.
This is non-critical, you can run `restic prune` to correct this.
check snapshots, trees and blobs
Load(<data/c277bbcf27>, 0, 0) returned error, retrying after 720.254544ms: <data/c277bbcf27> does not exist
Load(<data/c277bbcf27>, 0, 0) operation successful after 1 retries
Load(<data/31f2ab2c54>, 0, 0) returned error, retrying after 582.280027ms: <data/31f2ab2c54> does not exist
Load(<data/31f2ab2c54>, 0, 0) returned error, retrying after 703.28564ms: <data/31f2ab2c54> does not exist
Load(<data/31f2ab2c54>, 0, 0) returned error, retrying after 1.040217184s: <data/31f2ab2c54> does not exist
Load(<data/31f2ab2c54>, 0, 0) returned error, retrying after 2.002763936s: <data/31f2ab2c54> does not exist
Load(<data/31f2ab2c54>, 0, 0) returned error, retrying after 1.431768704s: <data/31f2ab2c54> does not exist
...
Load(<data/31f2ab2c54>, 393, 0) operation successful after 10 retries
[2:22] 100.00%  22 / 22 snapshots
Fatal: repository contains errors

So another busted repo. Going to restart both backups.

This time you might consider placing a 10-25GB dummy file out there, just to guard against this happening again. Good luck!


The fact that a pack failed to load in the first place seems odd, even if you've hit storage limits. I'd contact the provider and ask them about this.

FYI, the reason your forget runs are not removing as many snapshots as you expect is that forget groups snapshots by host and paths by default. Since you have snapshots from several different hosts, your policy is applied to each host separately. You can use --group-by paths to ignore the host when grouping.
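For example, with the policy from your first post, something like this would apply it once across all hosts (flags copied from your earlier command):

restic -r rclone:backup_place --password-file secrets forget \
    --group-by paths \
    --keep-daily 7 --keep-weekly 4 --keep-monthly 12 --prune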


Something that didn't occur to me at the time: I did notice transfer errors during some backups. I assumed they were transient and that the retries succeeded. From memory, they were 500s from the storage provider (for completeness: pCloud via rclone).

Looking at another backup, I have the same issue, but just with one missing data file and one snapshot.

The snapshot is easily removed with ... forget <snapshot_id>.

For the data file, check spits out the short IDs of the trees affected by the missing file:

error for tree f4f79f6f:
  ReadFull(<data/486e3c483e>): <data/486e3c483e> does not exist

I was able to find the snapshot using ... find --tree f4f79f6f, which returned the full hash and the path of the tree.

I was hoping it would be a simple case of cat-ing the tree blob to find the problem file, but that turned up nothing: the tree blob didn't mention the bad hash, and cat-ing all the files in the tree returned no errors.

I also tried to find the missing file's hash directly, but neither find --blob nor find --pack returned anything.

Am I missing something?

My plan is to find the file that is associated with the bad data file and delete it from the snapshots.
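For reference, these are the commands I've been poking at (repository and password flags omitted; the rewrite at the end is just an idea for carrying out that plan, not something I've tried yet):

restic find --tree f4f79f6f        # which snapshot/path references the damaged tree
restic cat blob <tree_id>          # dump a tree blob as JSON to inspect its entries
restic find --pack 486e3c483e      # look for the missing file as a pack
restic find --blob 486e3c483e      # ...or as a blob
# possibly: restic rewrite --exclude <offending_file> <snapshot_id>
# to drop the file from the affected snapshots once it's identified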

Ohhhhh… fellow pCloud user here. I have a lifetime account.

I have NOT had good experiences using Restic with pCloud (via Rclone). Search for “pCloud” on this forum and you'll see. pCloud doesn't handle partially uploaded files correctly. On, say, B2, if an upload gets interrupted, the partial data is stored but the file doesn't show up in the filesystem. pCloud, on the other hand, stores the partial file as if it were the full file. So when Restic tries to “resume” after a failed backup, it sees that the file is there and says, oh, we have that chunk, let's skip it. Except… it's only a partial chunk. And now you have a broken backup.

I brought this up with pCloud support but… they were unwilling to do anything about it. We aren’t their use case. So… be very careful trusting backups on pCloud with Restic (and other tools). :confused:

ON TOP OF THAT… sometimes files will just appear on the website, yet not in the local client. Or vice versa. There are a lot of bugs in pCloud. It is not surprising to me at all that you have missing packs.

I like pCloud and I still use it, but not with Restic or any other sort of “database”-style program. It's good for large archives where you can verify the SHA-1 once and then just let them sit. Things that change often… break often. Especially prune jobs, in our case.


It's actually worse. Restic is very strict about upload problems: failed uploads are retried until they succeed, and if an upload still fails after 10 retries, the whole backup operation is aborted. Thus, to end up in a broken state, the storage backend has to confirm that the upload was successful, but afterwards lose the uploaded data anyway…
