Parent content identification bug?

Hi!

There seems to be a bug in 0.16.4: it does not recognize the content of the previous snapshot as already backed up, even though it is the same.

To test, start with a tree like this:

/tobackup/a/...
/tobackup/b/...
/tobackup/c/...

Take a backup like this:

cd /tobackup
restic backup *

Fetch the snapshot ID and back up the parent directory itself:

cd /
restic backup tobackup --parent snapshot_id

This run should just stat() the files, but it reads all content again.

You have not provided the complete commands that you ran and all of their output. Without that, it’s hard to assess the problem. Please edit your post to include this information. I think we’d then be able to tell you what’s going on.

The forum no longer lets me edit the post, so here is the explicit reproduction:

$ export RESTIC_REPOSITORY=/dev/shm/testrepo
$ export RESTIC_PASSWORD=123
$ restic init
created restic repository 162b06c346 at /dev/shm/testrepo

Please note that knowledge of your password is required to access
the repository. Losing your password means that your data is
irrecoverably lost.

$ cd /tobackup
$ mkdir {a..z}
$ dd if=/dev/random of=a/random.dat bs=100M count=1
1+0 records in
1+0 records out
104857600 bytes (105 MB, 100 MiB) copied, 0.229612 s, 457 MB/s
$ for i in {b..z}; do cp a/random.dat $i; done
$ # do the first backup:
$ restic backup *
repository 162b06c3 opened (version 2, compression level auto)
created new cache in /faststore/cache/restic
no parent snapshot found, will read all files


Files:          26 new,     0 changed,     0 unmodified
Dirs:           26 new,     0 changed,     0 unmodified
Added to the repository: 100.121 MiB (100.072 MiB stored)

processed 26 files, 2.539 GiB in 0:03
snapshot 0c35201c saved
$ # it took 3 seconds and read 26*100MB, that's OK. 
$ # Do it again with specifying the previous snapshot as parent:
$ restic backup * --parent 0c35201c
repository 162b06c3 opened (version 2, compression level auto)
using parent snapshot 0c35201c


Files:           0 new,     0 changed,    26 unmodified
Dirs:            0 new,     0 changed,    26 unmodified
Added to the repository: 0 B   (0 B   stored)

processed 26 files, 2.539 GiB in 0:00
snapshot 01e24009 saved
$ # it took 0 seconds, since it did not read any file, just stat()-ed them
$ # and found they are all the same.
$ # Now do the same, but backing up the parent directory:
$ cd /
$ restic backup tobackup --parent 0c35201c
repository 162b06c3 opened (version 2, compression level auto)
using parent snapshot 0c35201c


Files:          26 new,     0 changed,     0 unmodified
Dirs:           27 new,     0 changed,     0 unmodified
Added to the repository: 373 B (295 B stored)

processed 26 files, 2.539 GiB in 0:03
snapshot 25e90358 saved
$ # it took again 3 secs, since it read and checksummed all files again 
$ # as if they were not present in the parent 0c35201c. Why?...

The file structure within both snapshots is different as you’re using relative paths (just run restic ls on those snapshots). That’s why restic has to rescan those files.

If you run restic backup /tobackup/* and restic backup /tobackup --parent ... then the rescan should be skipped. This difference in behavior between relative and absolute paths is expected.
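
To illustrate why the paths matter, here is a toy model in plain shell. It is only a sketch and not restic's actual code: it assumes that relative targets end up rooted at the top of the snapshot tree while absolute targets keep their full path, and that a file can be matched against the parent only if it appears there under the same path. The function names `snapshot_paths` and `count_matches` are made up for this example.

```shell
#!/bin/sh
# Toy model (an assumption, not restic internals): a file is only treated as
# "unmodified" if the parent snapshot contains an entry under the same path.

# Print the path each target would get inside the snapshot tree in this model:
# relative targets are rooted at "/", absolute targets keep their full path.
snapshot_paths() {
  for target in "$@"; do
    case "$target" in
      /*) printf '%s\n' "$target" ;;
      *)  printf '/%s\n' "$target" ;;
    esac
  done
}

# Count how many paths of the current run ($2) also exist in the parent ($1).
# Both arguments are newline-separated lists; paths must not contain whitespace.
count_matches() {
  parent_list=$1
  matches=0
  for candidate in $2; do
    if printf '%s\n' "$parent_list" | grep -qFx -- "$candidate"; then
      matches=$((matches + 1))
    fi
  done
  echo "$matches"
}

# Relative paths: "cd /tobackup; restic backup *" followed by
# "cd /; restic backup tobackup --parent ..."
parent=$(snapshot_paths a/random.dat b/random.dat)
relative_run=$(snapshot_paths tobackup/a/random.dat tobackup/b/random.dat)
echo "relative: $(count_matches "$parent" "$relative_run") of 2 files matched"

# Absolute paths: "restic backup /tobackup/*" followed by
# "restic backup /tobackup --parent ..."
absolute_first=$(snapshot_paths /tobackup/a/random.dat /tobackup/b/random.dat)
absolute_second=$(snapshot_paths /tobackup/a/random.dat /tobackup/b/random.dat)
echo "absolute: $(count_matches "$absolute_first" "$absolute_second") of 2 files matched"
```

In this model the relative case matches 0 of 2 files and the absolute case matches 2 of 2, which mirrors the "26 new" versus "26 unmodified" results in the transcript. To see the real layout, compare the output of restic ls for snapshots 0c35201c and 25e90358.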

You are absolutely right.

I was misled by the restic snapshots output, which always displays absolute paths. Is that intentional? I find it quite confusing…

Yes and no. Without storing an absolute path in the snapshot metadata, restic wouldn’t be able to tell apart different folders specified using relative paths. However, storing the absolute paths is also not ideal. The GitHub issue "Introduce a label to uniquely identify 'what' is backed up" (restic/restic #4026) might improve the situation.
