Id ... not found in repository

Hello,

I have a main repository and secondary repositories which are clones of the main repository (to have offsite backups). The main repository should be intact, as I ran the following commands on it:

restic prune --repack-small
restic rebuild-index --read-all-packs
restic check --read-data

On the secondary repository (to which I am copying) I ran the following commands:

restic copy --cleanup-cache
restic forget --keep-within ....
restic prune --repack-small
restic check --read-data

The command restic prune --repack-small fails:

$ restic prune --repack-small -r restic-repository                                                                                                                
enter password for repository: 
repository 725f6694 opened (version 2, compression level auto)
loading indexes...
loading all snapshots...
finding data that is still in use for 470 snapshots
[0:00] 0.21%  1 / 470 snapshots
id 2303d91f0a9066353a9cfb136ad6553f9336e4a70ff83acb61264ccdb81aeea6 not found in repository
github.com/restic/restic/internal/repository.(*Repository).LoadBlob
	github.com/restic/restic/internal/repository/repository.go:274
github.com/restic/restic/internal/restic.LoadTree
	github.com/restic/restic/internal/restic/tree.go:113
github.com/restic/restic/internal/restic.loadTreeWorker
	github.com/restic/restic/internal/restic/tree_stream.go:36
github.com/restic/restic/internal/restic.StreamTrees.func1
	github.com/restic/restic/internal/restic/tree_stream.go:176
golang.org/x/sync/errgroup.(*Group).Go.func1
	golang.org/x/sync@v0.1.0/errgroup/errgroup.go:75
runtime.goexit
	runtime/asm_amd64.s:1598

restic check yields a long list of errors. Some are of the form pack 06e31fb40331c7a1628ab01e7d20fdc07497fd12528743d2df182d6ab70755f7: not referenced in any index and pack cd0807bf54cf5e25d9c462b8613eb454714249c8556a8047398c5fd099df8103 contained in several indexes: {4608a282 877f98cd} (which are labelled as non-critical), and there are fatal ones:

error for tree 0beefbc7:
  tree 0beefbc7e481ae55f34faf4edcc8dd3543c2489f230b81e49ca048d535b1be46: file "R 77288.jpg" blob 71e3106f10d0f860140f1ed316bf739f0c6dd0a98d320c7e87d1fb44a3fe11d9 not found in index
  tree 0beefbc7e481ae55f34faf4edcc8dd3543c2489f230b81e49ca048d535b1be46: file "R 77289.jpg" blob 78a89b9c43c26728172d442263ab28e2d7efbc31e8c3705a09431f93e73c9092 not found in index
  tree 0beefbc7e481ae55f34faf4edcc8dd3543c2489f230b81e49ca048d535b1be46: file "R 77290.jpg" blob ab00fb5e890dd1b4834504299bfc6d9e6637df4bac50fd49b10ad0bcb84a3e99 not found in index
  tree 0beefbc7e481ae55f34faf4edcc8dd3543c2489f230b81e49ca048d535b1be46: file "R 77291.jpg" blob 2371e2ec775c92a5d9e9b1c57e4037ddb52f627df48457b57e7a805646a4688e not found in index
  tree 0beefbc7e481ae55f34faf4edcc8dd3543c2489f230b81e49ca048d535b1be46: file "R 77300.jpg" blob 48939f799b4731ff01ed498f52b54b49ad96b208d63c13e0cdc67a1ebc56ecf5 not found in index
  tree 0beefbc7e481ae55f34faf4edcc8dd3543c2489f230b81e49ca048d535b1be46: file "R 77301.jpg" blob 1672ff118fb6f2761c24b28408ee53aa3053b64d572b047327747815eb5dc55b not found in index
  tree 0beefbc7e481ae55f34faf4edcc8dd3543c2489f230b81e49ca048d535b1be46: file "R 77302.jpg" blob 93427453ddfe99a26aa232a714c86e86ee5f6a94e00c0133f03f19b1ecf88777 not found in index
  tree 0beefbc7e481ae55f34faf4edcc8dd3543c2489f230b81e49ca048d535b1be46: file "R 77303.jpg" blob aade03912457de2cc0ba6c45828f80983a88d5a18c9d83910676cee82821ed7b not found in index
  tree 0beefbc7e481ae55f34faf4edcc8dd3543c2489f230b81e49ca048d535b1be46: file "R 77304.jpg" blob 2e4802e93017497d7a4f0fcd41e42785986bc41bce8d99b178fd4e8af09f0a83 not found in index
  tree 0beefbc7e481ae55f34faf4edcc8dd3543c2489f230b81e49ca048d535b1be46: file "R 77305.jpg" blob d7bd6ebcc6451524fc1771a27932ed6533c281352fb409b0595ea311acb25a8c not found in index
  tree 0beefbc7e481ae55f34faf4edcc8dd3543c2489f230b81e49ca048d535b1be46: file "R 77306.jpg" blob 078b6ce8a240d7ce5fee35f06f3bfe81aa6c703a44ec9dbda9e3852250182416 not found in index
  tree 0beefbc7e481ae55f34faf4edcc8dd3543c2489f230b81e49ca048d535b1be46: file "R 77307.jpg" blob 39123b1ae17e8fc52a87a33e7b93db34104bdcf6e7f7efa6a5cb62b949a6c299 not found in index
  tree 0beefbc7e481ae55f34faf4edcc8dd3543c2489f230b81e49ca048d535b1be46: file "R 77308.jpg" blob bbc8ffb40bea1c5a6fd4cda64e726923cd6cb57f6d90bc31ed394d64c9462a3c not found in index
  tree 0beefbc7e481ae55f34faf4edcc8dd3543c2489f230b81e49ca048d535b1be46: file "R 77309.jpg" blob 7ec71c2506b30aaa4f08e8fa5524f263745467159c5b86e6e512171eaf1c178f not found in index
  tree 0beefbc7e481ae55f34faf4edcc8dd3543c2489f230b81e49ca048d535b1be46: file "R 77317.jpg" blob 47a76cae9777b5d070c482e97bfcd20bf46d1a44431d24b20dc17469719e252f not found in index

How do I fix this?

Cheers

Which restic version are you using? And where are the repositories stored? Was the repository intact before the prune failure?

The first step is to use restic rebuild-index on the secondary repo; this ensures that restic can accurately tell which data actually exists in the repository. The repair steps afterwards depend a bit on what restic check reports.
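
For example (a sketch; the repository path is a placeholder for your actual secondary repository):

restic rebuild-index -r /path/to/secondary-repository
restic check -r /path/to/secondary-repository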

In the meantime I removed the secondary repository and created a new one. Yesterday I finished copying all snapshots to it (from the main repository). restic check did not report any errors.

Today I again copied new snapshots from the main repo to the secondary and got the same error after executing:

restic copy --cleanup-cache
restic forget --keep-within 90d --keep-within-hourly 120d --keep-within-daily 150d --keep-within-weekly 1000y
restic prune --repack-small
restic check

Was the repository intact before the prune failure?

I can only tell that restic copy did not report any errors yesterday. I can insert a restic check before restic prune --repack-small in my script, but it will take at least one week for me to be able to answer your question.

EDIT: I am currently using restic 0.15.1 compiled with go1.20 on linux/amd64 on Arch Linux.

I adapted my script which copies the main repository. The adapted script copies one snapshot after the other and then performs the other operations (check, forget, prune). I did this because I had the impression that the error is caused by a particular snapshot: as said before, yesterday restic check did not report any errors, but after copying only a few snapshots today the copy failed. The important part is the for loop (the other lines are about retrieving passwords and URIs from KeePassXC).

#!/bin/zsh

safeFile="...."
mainEntry="...."

echo -n "Password for KeePassXC: "
read -s keepassPassword
echo ""

# Query for entries
# Note: Echoing the password is unsafe, because the password is leaked (i.e. visible in top).
IFS=$'\n' read -r -d '' -A entries <<< `echo "${keepassPassword}" | keepassxc-cli search --quiet "${safeFile}" "restic-offsite"`

if [[ -z "${entries}" ]] then
    echo "Wrong password or no entry with a tag 'restic-offsite'."
    exit -1
fi

# Display entries
echo ""
counter=1
for entry in $entries
do
    echo "${counter}: ${entry}"
    counter=$((counter + 1))
done

# Ask to choose one entry
echo ""
echo -n "Which entry?"
read -s entryIndex
echo ""

entry=$entries[${entryIndex}]

IFS=$'\n' read -rd '' SECONDARY_PASSWORD SECONDARY_REPOSITORY <<< `echo "${keepassPassword}" |
    keepassxc-cli show --quiet -sa password -a RESTIC_REPOSITORY "${safeFile}" "${entry}"`
IFS=$'\n' read -rd '' MAIN_PASSWORD MAIN_REPOSITORY <<< `echo "${keepassPassword}" |
    keepassxc-cli show --quiet -sa password -a RESTIC_REPOSITORY "${safeFile}" "${mainEntry}"`

keepassPassword=""

export RESTIC_REPOSITORY="${MAIN_REPOSITORY}"
export RESTIC_PASSWORD="${MAIN_PASSWORD}"

set -e
for snapshot in `restic snapshots --compact | cut -d" " -f1 | grep -x '.\{8,8\}'`
do
    export RESTIC_REPOSITORY="${SECONDARY_REPOSITORY}"
    export RESTIC_PASSWORD="${SECONDARY_PASSWORD}"
    restic check
    export RESTIC_FROM_REPOSITORY="${MAIN_REPOSITORY}"
    export RESTIC_FROM_PASSWORD="${MAIN_PASSWORD}"
    restic copy --cleanup-cache ${snapshot}
    export RESTIC_FROM_REPOSITORY=""
    export RESTIC_FROM_PASSWORD=""

    echo "execute 'restic check' before 'restic forget'"
    restic check
    restic forget --keep-within 90d --keep-within-hourly 120d --keep-within-daily 150d --keep-within-weekly 1000y
    echo "execute 'restic check' before 'restic prune'"
    restic check
    restic prune --repack-small
    restic check
done

@MichaelEischer : Do you want me to change anything in the script? Should I compile restic myself with debug symbols?

check performs a superset of the sanity checks in prune. Thus, if prune fails, check should fail in the same way.

Regarding your script: the text output of restic snapshots is not guaranteed to be stable; that guarantee only exists for the JSON output.
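
For example, a stable way to get the short snapshot IDs (assuming jq is installed to parse the JSON):

restic snapshots --json | jq -r '.[].short_id'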

It would also be much faster to only run copy within the loop and move everything else outside.
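
Something like the following sketch, reusing the variables from your script (the jq call for parsing the JSON output is an assumption):

# list the snapshot IDs in the main repository (via the stable JSON output)
export RESTIC_REPOSITORY="${MAIN_REPOSITORY}"
export RESTIC_PASSWORD="${MAIN_PASSWORD}"
snapshots=($(restic snapshots --json | jq -r '.[].short_id'))

# copy each snapshot into the secondary repository
export RESTIC_REPOSITORY="${SECONDARY_REPOSITORY}"
export RESTIC_PASSWORD="${SECONDARY_PASSWORD}"
export RESTIC_FROM_REPOSITORY="${MAIN_REPOSITORY}"
export RESTIC_FROM_PASSWORD="${MAIN_PASSWORD}"
for snapshot in $snapshots
do
    restic copy --cleanup-cache "${snapshot}"
done

# run forget/prune/check only once, after all copies are done
restic forget --keep-within 90d --keep-within-hourly 120d --keep-within-daily 150d --keep-within-weekly 1000y
restic prune --repack-small
restic check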

Which backend are you using to store your repository?

It would also be much faster to only run copy within the loop and move everything else outside.

Well, then I can just get rid of the loop completely. I added the loop to find out if it is a particular snapshot which causes problems. The script I use in production just copies all snapshots.

Which backend are you using to store your repository?

I use the local backend.

copy verifies snapshots while copying them to ensure that it doesn’t copy broken snapshots.

I forgot to mention it, but that error is somewhat unusual. With your restic version it can only occur if prune or rebuild-index crashes or is killed while rewriting the index. Could you check whether restic was maybe killed by the OOM-Killer?

But even that case shouldn’t be able to cause any missing data in the repository…
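
For example, OOM kills usually show up in the kernel log; one way to look for them (journalctl assumes systemd):

dmesg | grep -i "killed process"
journalctl -k | grep -i oom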

With your restic version it can only occur if prune or rebuild-index crashes or is killed while rewriting the index. Could you check whether restic was maybe killed by the OOM-Killer?

I’ll run the script without forget --prune to check if the error occurs again. Hopefully it does not :slight_smile:

@MichaelEischer : A few days ago I created a fresh copy of the secondary repository. restic check and even restic check --read-data completed successfully. Today I called my script again to copy the newest snapshots to the secondary repository. The relevant part from the script looks like this:

restic copy --cleanup-cache

export RESTIC_FROM_REPOSITORY=""
export RESTIC_FROM_PASSWORD=""

restic forget --keep-within 90d --keep-within-hourly 120d --keep-within-daily 300d --keep-within-weekly 1000y
#restic prune --repack-small

if [ "${check}" = true ] ; then
    if [ "${fullCheck}" = true ] ; then
        restic check --read-data
    else
        restic check
    fi
fi

As you can see, I didn’t use restic rebuild-index, restic prune, or restic forget --prune.

After a few snapshots I decided to plug the external drive into another USB port. So I pressed Ctrl+C and cancelled the script:

snapshot fd3515a6 of [/volume1] at 2023-04-14 15:30:03.182466489 +0200 CEST)
  copy started, this may take a while...
  signal interrupt received, cleaning up
could not load snapshots: context canceled
[0:23] 26.90%  39 / 145 packs copied

I then started my copy script again and all snapshots were copied, but restic check (after restic forget) failed again with Fatal: repository contains errors.

I copied the output of both runs of the script (the one aborted via Ctrl+C and the one that failed) to a log file. Here are some stats about the log:

$ ls -lh restic-error.log                                          
-rw-r--r-- 1 me me 720M 16. Apr 19:19 restic-error.log
$ cat restic-error.log | wc -l   
4250060
$ grep "error for tree " restic-error.log | wc -l   
315644
$ grep "pack .*: not referenced in any index" restic-error.log | wc -l
76068
$ grep "contained in several indexes" restic-error.log | wc -l   
0

Can you help me figure out what is going wrong?

That looks a lot like the filesystem doesn’t remember which files it stored. Which filesystem do you use?

Did you cleanly unmount the filesystem?
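
One way to see the filesystem type (the repository path is a placeholder):

findmnt -T /path/to/repository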

Did you cleanly unmount the filesystem?

I did not intentionally unmount it uncleanly. If it happened, it was unintentional.

That looks a lot like the filesystem doesn’t remember which files it stored. Which filesystem do you use?

On this external drive I used NTFS (for better compatibility with non-Linux systems).

The hard drive itself seems to be healthy. The index folder is empty.

Should I give it another go with ext4 or btrfs?

If I remember correctly, the userspace NTFS driver on Linux does not honor fsync (which requests that a file be written to disk). My guess would be that the repository corruption is related to that. You should definitely run fsck / chkdsk on that partition.
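
For NTFS on Linux, basic repairs can be done with ntfsfix from the ntfs-3g package; a full chkdsk requires Windows. The device name below is a placeholder, and the partition must be unmounted first:

sudo umount /dev/sdX1
sudo ntfsfix /dev/sdX1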

That sort of proves that the filesystem just forgot about lots of files. It is impossible to accidentally cause restic to remove all index files (well, there are a few options, but these are labeled accordingly).

ext4/btrfs would definitely be a better option to keep your data safe.
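
Reformatting the external drive would look something like this (the device name is a placeholder; this erases everything on the partition):

sudo mkfs.btrfs -f /dev/sdX1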

@MichaelEischer I tested my copy script a few times with a new btrfs filesystem instead of NTFS, and so far no further issues. So: Don’t use NTFS :wink:
