During a backup run Restic reported an error about “tree is not known”. Restic then created a snapshot that apparently did not deduplicate against previous data at all, i.e. it added all of the data to the repository again.
I’m wondering why this happened, and how I can prevent it in the future. If this happens over a weekend, I might run out of space on the backup storage before I notice the problem on Monday.
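As a stopgap I’m considering a crude free-space check after each backup run. A minimal, untested sketch (the 90% threshold and the mail alert are just placeholders; $BACKUP_REPOSITORY_PATH is the variable from my backup script below):

# warn if the filesystem holding the repository is nearly full
used_pct=$(df --output=pcent "$BACKUP_REPOSITORY_PATH" | tail -n 1 | tr -dc '0-9')
if [ "$used_pct" -ge 90 ]; then
    echo "backup storage is ${used_pct}% full" | mail -s "restic space warning" root
fi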
This was the backup command:
docker run \
--rm \
--volume "$SCRIPT_LOCATION:/var/lib/restic_config_volume:ro" \
--volume "$BACKUP_REPOSITORY_PATH:/var/lib/restic" \
--volume my-volume:/var/lib/postgresql/data:ro \
-e RESTIC_REPOSITORY=/var/lib/restic \
-e RESTIC_PASSWORD_FILE=/var/lib/restic_config_volume/restic_backup_password \
-e RESTIC_PROGRESS_FPS=0.0016666666666666668 \
mazzolino/restic:1.7.1 \
backup \
--no-cache \
--host "${HOSTNAME}" \
--one-file-system \
--exclude-caches \
--compression=max \
--tag "machine_config_id_${MACHINE_CONFIG_ID}" \
/var/lib/postgresql/data/
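(The odd-looking RESTIC_PROGRESS_FPS=0.0016666666666666668 is 1/600, i.e. one progress update every ten minutes, which matches the [10:00] interval in the log below.)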
This was the error message:
Feb 06 02:34:28 RBC-SRV script_executor.sh[3855649]: using parent snapshot 6bb05132
Feb 06 02:34:28 RBC-SRV script_executor.sh[3855649]: error: tree 188e5321a07f3a0fba5c8f2e5ec915c1ab86753ac2bac3c84e1b72e1922de75b is not known; the repository could be damaged, run `repair index` to try to repair it
Feb 06 02:44:27 RBC-SRV script_executor.sh[3855649]: [10:00] 4.27% 691 files 47.929 GiB, total 2211 files 1.097 TiB, 1 errors ETA 3:56:02
and this was the summary after backup was finished:
Feb 06 06:03:30 RBC-SRV script_executor.sh[3855649]: Files: 2211 new, 0 changed, 0 unmodified
Feb 06 06:03:30 RBC-SRV script_executor.sh[3855649]: Dirs: 30 new, 0 changed, 0 unmodified
Feb 06 06:03:30 RBC-SRV script_executor.sh[3855649]: Added to the repository: 1.097 TiB (831.236 GiB stored)
Feb 06 06:03:30 RBC-SRV script_executor.sh[3855649]:
Feb 06 06:03:30 RBC-SRV script_executor.sh[3855649]: processed 2211 files, 1.097 TiB in 3:29:03
Feb 06 06:03:30 RBC-SRV script_executor.sh[3855649]: snapshot cdb8f495 saved
Feb 06 06:03:30 RBC-SRV script_executor.sh[3855649]: Warning: at least one source file could not be read
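To double-check that the new snapshot really re-added identical content (rather than changed files), I guess I could diff the two snapshots; 6bb05132 is the parent from the log above, cdb8f495 the new snapshot:

restic --no-lock --no-cache diff 6bb05132 cdb8f495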
I then ran `restic check`:
# restic --no-lock --no-cache check
repository 7a39ad76 opened (version 2, compression level auto)
load indexes
[0:04] 100.00% 49 / 49 index files loaded
check all packs
pack 7aed4355fedaf782bd3fb055403a5a12b6b5cb346faba4e948bce6f7e50cafcf: not referenced in any index
[...]
744 additional files were found in the repo, which likely contain duplicate data.
This is non-critical, you can run `restic prune` to correct this.
check snapshots, trees and blobs
[0:37] 100.00% 9 / 9 snapshots
no errors were found
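As far as I understand, a plain `check` only verifies the repository structure; to actually read and verify the pack contents I could additionally run something like the following (a full --read-data pass would read the entire repository over the Samba share, so a subset seems more practical):

restic --no-lock --no-cache check --read-data-subset=10%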
Based on the suggestions in the forum thread “Tree could not be loaded - #2 by injerto” I searched for the tree object:
09858093d70b:/# restic --no-lock --no-cache find --show-pack-id --tree 188e5321a07f3a0fba5c8f2e5ec915c1ab86753ac2bac3c84e1b72e1922de75b
repository 7a39ad76 opened (version 2, compression level auto)
[0:03] 100.00% 54 / 54 index files loaded
Object belongs to pack 9899022a53e02206dd8cb0aa6dece433056a38b354e4de5ddbd07f9336643464
... Pack 9899022a: <Blob (tree) 188e5321, offset 19286, length 261, uncompressed length 330>
09858093d70b:/# restic --no-lock --no-cache cat pack 9899022a53e02206dd8cb0aa6dece433056a38b354e4de5ddbd07f9336643464 | wc
76 423 20198
But I don’t know what this tells me.
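In case it helps with debugging: as far as I understand, tree blobs are JSON, so dumping the blob itself (rather than the raw pack) should show its contents, assuming the blob is intact:

restic --no-lock --no-cache cat blob 188e5321a07f3a0fba5c8f2e5ec915c1ab86753ac2bac3c84e1b72e1922de75b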
The backups since then have worked fine. I have not run `prune` or `repair index` yet; those would be my next steps.
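Concretely, following the suggestions in the output above, that would presumably be:

restic --no-cache repair index
restic --no-cache prune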
This happened with Restic 0.16.2 (`restic 0.16.2 compiled with go1.21.3 on linux/amd64`) from the “mazzolino/restic” Docker image, version 1.7.1, under Ubuntu 20.04.
The Restic repository is located on a Samba share. I’m not aware of problems with this particular share, but I could imagine that a network drive might have temporary problems?
There should be no cache involved (I use `--no-cache` when doing the backup).
So, my questions are:
- is there anything I can do to prevent this problem in the future?
- do you want to debug this problem further, in case it’s an unknown bug?
- do you have suggestions for how to repair this automatically? I suppose that if I automatically run `prune` after each backup, it would throw away the duplicate data (i.e. no space would be wasted, only CPU time)? A sketch of what I mean follows below.
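A minimal, untested sketch of what I have in mind, assuming the image passes its arguments straight through to restic (as it appears to do for backup), and only pruning when the backup exits successfully:

# run the usual backup (same command as above), then prune on success
if docker run --rm \
    --volume "$SCRIPT_LOCATION:/var/lib/restic_config_volume:ro" \
    --volume "$BACKUP_REPOSITORY_PATH:/var/lib/restic" \
    --volume my-volume:/var/lib/postgresql/data:ro \
    -e RESTIC_REPOSITORY=/var/lib/restic \
    -e RESTIC_PASSWORD_FILE=/var/lib/restic_config_volume/restic_backup_password \
    mazzolino/restic:1.7.1 \
    backup --no-cache --host "${HOSTNAME}" --one-file-system --exclude-caches \
    --compression=max --tag "machine_config_id_${MACHINE_CONFIG_ID}" \
    /var/lib/postgresql/data/
then
    # reclaim space taken up by duplicate/unreferenced data
    docker run --rm \
        --volume "$SCRIPT_LOCATION:/var/lib/restic_config_volume:ro" \
        --volume "$BACKUP_REPOSITORY_PATH:/var/lib/restic" \
        -e RESTIC_REPOSITORY=/var/lib/restic \
        -e RESTIC_PASSWORD_FILE=/var/lib/restic_config_volume/restic_backup_password \
        mazzolino/restic:1.7.1 \
        prune --no-cache
else
    echo "restic backup failed; skipping prune" >&2
fi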
Btw. thanks a lot for creating Restic! I use it at home and at work, for several TBs of data by now, and it’s so nice that it just solves the backup problem.