Corrupted AWS repository: file "does not exist"

Hello,

I’ve recently discovered an issue with my repository. I had a local repository (now gone) and its copy in AWS (made using rclone sync).

I wanted to rebuild the local repo from the local data plus the missing data from AWS, but I was told that was practically impossible, so I really need my AWS repo… and it doesn’t work. Listing commands work fine (snapshots, ls in a restic-mounted directory, etc.), but as soon as I try to read a file I get I/O errors. A simple restic check is enough to report errors with the data.

I’ve checked in AWS and I’m able to manually download the files that restic cannot get.

For example, restic check complains about

  • Load(<data/7338d69102>, 0, 0) failed: <data/7338d69102> does not exist
    but I can manually download the file 7338d69102a396e4244f819f084f083f097239871d8f4178f25606a350af8c36 from AWS and its contents look fine:
# cat 7338d69102a396e4244f819f084f083f097239871d8f4178f25606a350af8c36 | sha256sum
7338d69102a396e4244f819f084f083f097239871d8f4178f25606a350af8c36  -
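A quick way to extend this check to every flat file (assuming the flat objects from data/ are downloaded to a local data/ directory; this loop is my own sketch, not a restic command):

# each restic pack file is named after the SHA-256 of its (encrypted) contents,
# so a name/hash mismatch would indicate a truly corrupted file
for f in data/*; do
  [ -f "$f" ] || continue             # skip the data/xx subdirectories
  name=$(basename "$f")
  sum=$(sha256sum "$f" | awk '{print $1}')
  [ "$name" = "$sum" ] || echo "MISMATCH: $f"
done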

If you need help with a problem, please always include (in your post below this comment section):

• The output of restic version.

restic 0.17.2 compiled with go1.23.2 on linux/amd64

• The complete commands that you ran (leading up to the problem or to reproduce the problem).

Running restic check is enough to reproduce the error.

• Any environment variables relevant to those commands (including their values, of course).

I’ve set only RESTIC_REPOSITORY, RESTIC_PASSWORD_FILE, AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, RESTIC_CACHE_DIR.

• The complete output of those commands (except any repeated output when obvious it’s not needed for debugging).

# restic check
using temporary cache in /home/restic/.cache/restic-check-cache-1192674016
create exclusive lock for repository
repository 559ce728 opened (version 2, compression level auto)
created new cache in /home/restic/.cache/restic-check-cache-1192674016
load indexes
[0:32] 100.00% 951 / 951 index files loaded
check all packs
check snapshots, trees and blobs
Load(<data/7338d69102>, 0, 0) failed: <data/7338d69102> does not exist
Load(<data/48c1bba5ca>, 0, 0) failed: <data/48c1bba5ca> does not exist
Load(<data/341771415b>, 0, 0) failed: <data/341771415b> does not exist
Load(<data/a07011f9c0>, 0, 0) failed: <data/a07011f9c0> does not exist
Load(<data/66708d5db7>, 0, 0) failed: <data/66708d5db7> does not exist
Load(<data/7338d69102>, 0, 0) failed: <data/7338d69102> does not exist
error for tree 936ff20f:
ReadFull(<data/7338d69102>): <data/7338d69102> does not exist
Load(<data/48c1bba5ca>, 0, 0) failed: <data/48c1bba5ca> does not exist
error for tree 9f063e45:
ReadFull(<data/48c1bba5ca>): <data/48c1bba5ca> does not exist
Load(<data/341771415b>, 0, 0) failed: <data/341771415b> does not exist
error for tree 7898accf:
ReadFull(<data/341771415b>): <data/341771415b> does not exist
Load(<data/0ea432fa93>, 0, 0) failed: <data/0ea432fa93> does not exist
Load(<data/a07011f9c0>, 0, 0) failed: <data/a07011f9c0> does not exist
error for tree 12fd3257:
ReadFull(<data/a07011f9c0>): <data/a07011f9c0> does not exist
Load(<data/3be4e9761a>, 0, 0) failed: <data/3be4e9761a> does not exist
Load(<data/66708d5db7>, 0, 0) failed: <data/66708d5db7> does not exist
error for tree 563e0a2f:
ReadFull(<data/66708d5db7>): <data/66708d5db7> does not exist
Load(<data/b8cb9f8057>, 0, 0) failed: <data/b8cb9f8057> does not exist
Load(<data/5a360b4b1e>, 0, 0) failed: <data/5a360b4b1e> does not exist
Load(<data/b1da15d04b>, 0, 0) failed: <data/b1da15d04b> does not exist
Load(<data/4399a3b67f>, 0, 0) failed: <data/4399a3b67f> does not exist
Load(<data/0ea432fa93>, 0, 0) failed: <data/0ea432fa93> does not exist
— output truncated —
[1:35] 100.00% 240 / 240 snapshots

I’m more than willing to perform tests and manual procedures in order to recover this repository, but I’m stuck at the “does not exist” error, since the file is there and it is available.

Please also note that the file 7338d69102a396e4244f819f084f083f097239871d8f4178f25606a350af8c36 is in the data directory itself, not in the data/73/ subdirectory.

Thanks,
radel

I just realized restic now hates the files directly in the data “folder”: even though the error says it is looking for <data/7338d69102>, it actually looks for <data/73/7338d69102>.

I thought this was backward compatible, but I guess you cannot live in the past forever. :smile:
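For reference, the current default layout expects each pack under a subdirectory named after its first two hex characters, while the legacy s3 layout stores everything flat in data/. A quick way to check both locations with rclone (remote and bucket names below are placeholders for my setup):

# placeholder remote/bucket; adjust to your RESTIC_REPOSITORY
rclone lsf aws:my-restic-bucket/data/73/ | grep '^7338d691'   # default layout location
rclone lsf aws:my-restic-bucket/data/ | grep '^7338d691'      # legacy flat location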

Which restic version did you create this repository with? :eyes:

Oh, I do not remember, but I think 0.3 or something like that.

This repo has seen quite a few changes which had some side effects, but I never lost a single bit: every time I was able to regain access to my data. Restic is SUPER!


See restic/changelog/0.17.0_2024-07-26/issue-4602 at master · restic/restic · GitHub. The legacy s3 layout was deprecated in restic 0.17.0. The migration steps probably work, although you might want to try them with a small repository first.

The deprecation in restic 0.17.0 should have resulted in the following error message:

detected legacy S3 layout. Use `RESTIC_FEATURES=deprecate-s3-legacy-layout=false restic migrate s3_layout` to migrate your repository

What I don’t understand is why it didn’t work in your case. The code changes for the deprecation don’t change the logic, but only return an error if the legacy layout was detected. So basically the repository would also have been unreadable by restic 0.16.x.

Does the repository contain a key or a keys directory? Did you maybe mix data files from different repository layouts?
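For example, listing the top-level directories of the bucket shows which one is present (remote and bucket names are placeholders):

# placeholder remote/bucket names
rclone lsd aws:my-restic-bucket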

I have to admit I didn’t read the changelog before upgrading: I only started looking into the changes after restic started complaining. I remember doing a rollback and some “mv” commands, and in the end I discovered the migration procedure, so I may well have messed things up while fiddling with the directories.

I’m not sure, but I suppose I manually renamed the snapshot and key folders; after that I tried the repo migration, and I think the data directory format wasn’t migrated because I had already migrated the key directory. Is that possible? After that I didn’t have to read from the data packs, and I didn’t notice the issue until the first restic mount…

Due to some issues I no longer have access to that repo’s copy (I had to reconstruct it as explained here), but looking into the logs I found an rclone sync after the repo upgrade: a bunch of snapshots were deleted from the snapshot folder and uploaded as new into the snapshots folder. The same happened to the key file, moved from the key folder to the keys folder.

Nothing changed in the data directory that day, so from then on new files went into the data/xx folders while the old files remained directly in the data folder. I thought it was OK, but I never had the chance to properly test it: less than two months after the repo migration I had a drive failure and discovered the issue while trying some restores.

2024/10/29 23:46:23 INFO : keys/fb9991e76ecce086d23b070340efec89f1fad8f5dea917d27eb833b9624db0e1: Copied (new)
2024/10/29 23:46:23 INFO : snapshots/00425fb261d0e824b9d15b087e100042c5c5e3e712d614dec73e7dc451ca554a: Copied (new)
2024/10/29 23:46:23 INFO : snapshots/00bc41042060d6d1596dd9a62735b1d2f944fd4f02faa133287a735395d390ce: Copied (new)
2024/10/29 23:46:23 INFO : snapshots/01c8e9522f058f90682e284f0f8fa58b0f8aa2c8dca074e5666035f73a5cca44: Copied (new)
2024/10/29 23:46:23 INFO : snapshots/0216bfac231e1d609663699b62b9277aaa8313d28b937be9159c79cea178d41b: Copied (new)
2024/10/29 23:46:23 INFO : snapshots/0332dc8c0e0b3d33e1807b7927edac76d6f9e1a4ed4e654489d3d8a65ee54df4: Copied (new)
2024/10/29 23:46:23 INFO : snapshots/0409a2cde3010d1ed3faac4953a113be756e7ff2aeff1565331c59dfe57fca76: Copied (new)
– output truncated –
2024/10/29 23:53:05 INFO : snapshot/6d1622504f91b4c90d7c0a333610b962cd0bb6e3134723de916acf2193de4a83: Deleted
2024/10/29 23:53:05 INFO : snapshot/e2f71c9cec6df86eb7d7b534b03c40f291a84d4737e0cba3b17079b9629b0865: Deleted
2024/10/29 23:53:05 INFO : snapshot/86f4e932aab0e7515889149bd4524979c9eb6f09f523e0631e787edcc6c1e62d: Deleted
2024/10/29 23:53:05 INFO : key/fb9991e76ecce086d23b070340efec89f1fad8f5dea917d27eb833b9624db0e1: Deleted
2024/10/29 23:53:05 INFO : snapshot/4f22debffbaaaefc49af8a32bfdadf34fc9d7cdd4976bf12ff07700a46c3bcdc: Deleted
2024/10/29 23:53:05 INFO : snapshot/1d42ce1e9ba39751f07edc3e3f12e355d3bf34e611afaf305d0d34db71d8e359: Deleted
2024/10/29 23:53:05 INFO : snapshot/f69981d8981f78f5c95bed41cbdabe8fc3821c0e5affffcda9f4278391799299: Deleted

Lesson learned:

  1. Check the data: I now run a daily restic check that reads 1/30 of the repo’s data, so the whole repository gets verified roughly once a month (see the sketch after this list).
  2. Read before you upgrade: you don’t want to discover a deprecation after the upgrade.
  3. Restic is hard to break: despite my best attempts, I was able to rebuild a working repo without losing a single bit.
  4. You cannot really have your backup data in AWS: if you actually need to recover from a disaster you’ll pay A LOT! I don’t like saying that, but roughly $100 per TB just for the data transfer is too much for me.
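A minimal sketch of that daily check, assuming the --read-data-subset=n/t form of the flag and using the day of the month to pick the slice (the wrap-around handling is my own choice):

# read a different 1/30 slice of the pack data each day
day=$(date +%d)                    # 01..31
n=$(( (10#$day - 1) % 30 + 1 ))    # map to 1..30
restic check --read-data-subset="$n/30"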

I also have a suggestion: I would like the restic upgrade command to show an error before applying an update that introduces a deprecation, a repo format change, or anything else that needs to be addressed immediately. When ready, the user could pass a flag or set an env variable to let the upgrade proceed.


Yes, that is the actual problem. The key/keys folder is used to detect which layout variant is in use. The s3legacy layout migration renames the key files only after the data files, precisely so that your scenario can’t occur. So you effectively broke the repository by renaming files; in this state neither old nor new restic versions would be able to access the data files.

That is, the issue here is actually unrelated to the s3legacy layout deprecation; it is rather a case of a broken repository.
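At the file level, the ordering the migration relies on looks roughly like this (an illustrative sketch only; on a real repository the work is done by restic migrate s3_layout, not by hand):

# illustrative only; use `restic migrate s3_layout` on a real repository
# 1. data files move into their two-character subdirectories first
for f in data/*; do
  [ -f "$f" ] || continue
  prefix=$(basename "$f" | cut -c1-2)
  mkdir -p "data/$prefix"
  mv "$f" "data/$prefix/"
done
# 2. the key files move last (key/ -> keys/), because the key/keys directory
#    name is what restic uses to detect the layout variant
mkdir -p keys
mv key/* keys/
rmdir key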

I guess it’s obvious, but still worth repeating: there are a lot of storage providers out there that charge only a few dollars per TB of traffic, include plenty of free egress traffic, and/or allow free egress as a multiple of the stored data (e.g. 1 TB stored, 3 TB free egress).

The upgrade command currently doesn’t allow for such messages, although it’s an interesting idea.

Under normal circumstances the S3 legacy layout deprecation works as follows: you upgrade to restic 0.17.x and if restic detects a repository with s3 legacy layout, then restic fails with the error message mentioned in my previous comment. So basically you’re forced to react to the migration, but have a temporary escape hatch via RESTIC_FEATURES=deprecate-s3-legacy-layout=false. For now there are no further repository format deprecations planned.
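In command form (the migrate invocation is quoted from the error message above; running a check afterwards is simply my suggestion):

RESTIC_FEATURES=deprecate-s3-legacy-layout=false restic migrate s3_layout
restic check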

Taking a look at the “Chg” entries of the release notes or the blog post won’t hurt, but that won’t help with damaged repositories.
