Avoiding/detecting errors at source

TRPB · December 2, 2019, 4:38pm

Let’s say I’m backing up my family photos. I’m not going to look at them often and they’re unlikely to change frequently (occasionally I might update the metadata to make them easier to manage), but I do want to keep them long term.

When I take new photos I put them in the photos directory on my computer, which is backed up automatically and periodically using restic.

So far so good, I have the backup.

My question is, what if there are bad sectors on the HDD on my computer and some of the photos become corrupted?

Presumably restic sees the file has changed and takes a snapshot of it.

This isn’t an issue because restic still has the snapshot of the original version.

However, if I’m keeping, for example, a year’s worth of snapshots and I don’t notice the corruption within that year, the corrupted version of the photo is now the only version in the backup!

How can I avoid this without keeping infinite snapshots? Can restic detect/warn when this happens?

So my questions are:

Is my hypothesis here correct? Will restic copy the corrupted file into the backup? (Presumably yes, unless it only looks at mtimes to decide what’s changed?)
Is there any way of avoiding this?
Can restic detect this or do I need some further automation to look for this kind of problem?

My thinking is:

Read the mtime of the file in the snapshot
Read the mtime of the file on the disk
Checksum the file in the snapshot
Checksum the file on the disk
If the mtimes are the same but the checksum is different display a warning that something might have become corrupted

Is there any way to get restic to do this? Does it do it already?
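
For illustration, the five steps above can be sketched outside restic. This is only a rough sketch under some assumptions: it compares against a plain manifest file instead of reading from a restic snapshot, it assumes GNU coreutils (stat -c and sha256sum), and it assumes file names contain no tabs or newlines. The function names record_manifest and check_manifest are made up for this example.

```shell
#!/bin/sh
# Sketch: flag files whose checksum changed while their mtime stayed the same.
# Assumes GNU coreutils; function names and manifest layout are hypothetical.

# record_manifest DIR MANIFEST
# Writes one "mtime<TAB>sha256<TAB>path" line per regular file under DIR.
record_manifest() {
    find "$1" -type f | sort | while IFS= read -r f; do
        printf '%s\t%s\t%s\n' \
            "$(stat -c %Y "$f")" \
            "$(sha256sum < "$f" | cut -d' ' -f1)" \
            "$f"
    done > "$2"
}

# check_manifest MANIFEST
# Prints a warning for every file with an unchanged mtime but a different
# checksum. Returns 1 if any such file was found, 0 otherwise.
check_manifest() {
    tab=$(printf '\t')
    status=0
    while IFS="$tab" read -r old_mtime old_sum path; do
        [ -f "$path" ] || continue                              # removed file: not our concern here
        [ "$(stat -c %Y "$path")" = "$old_mtime" ] || continue  # mtime moved: treated as a real edit
        if [ "$(sha256sum < "$path" | cut -d' ' -f1)" != "$old_sum" ]; then
            echo "WARNING: $path changed content but not mtime"
            status=1
        fi
    done < "$1"
    return "$status"
}
```

The idea would be to run record_manifest right after a backup and check_manifest before the next one, keeping the manifest file outside the photos directory. Note that this re-reads every file, so it is much slower than a metadata-only comparison.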

cdhowie · December 2, 2019, 4:55pm

When it has a parent snapshot, restic uses a combination of multiple pieces of metadata to detect if the file should be re-hashed. If none of that metadata has changed, restic skips the file.

My suspicion is that restic would not rehash the file in this specific case since none of the metadata would have changed. The corrupt data would therefore not be added to the repository.

TRPB · December 2, 2019, 5:45pm

I’m not sure, I just did a quick test:

mkdir restictest
cd restictest
mkdir backup
mkdir source
# create a file with random contents
cat /dev/urandom  | head -c 120000 > ./source/original
# Set specific atime/mtime so it can be set to this again after modification
touch -d '2 Dec 2019 15:00:00.00' ./source/original
restic init --repo ./backup
restic -r ./backup backup ./source
# update the file contents
cat /dev/urandom  | head -c 120000 > ./source/original
# reset the modification/access time so the file looks the same
touch -d '2 Dec 2019 15:00:00.00' ./source/original
# create the second snapshot
restic -r ./backup backup ./source

output:

repository df6c50bc opened successfully, password is correct

Files:           0 new,     1 changed,     0 unmodified
Dirs:            0 new,     0 changed,     0 unmodified
Added to the repo: 117.539 KiB

processed 1 files, 117.188 KiB in 0:00
snapshot 2d569e4e saved

So it is storing the updated file.

I realize that literally every byte in the file is being changed here but the updated version is being stored in the backup despite both files being the same size and mtime.

It would be useful to know exactly what restic does to calculate whether a file should be updated or not.

cdhowie · December 2, 2019, 5:52pm

My test does not show the same result after dumping random data into the original file and resetting the mtime:

$ restic -r repo/ backup dir
enter password for repository: 
repository fe47b2e5 opened successfully, password is correct

Files:           0 new,     0 changed,     1 unmodified
Dirs:            0 new,     0 changed,     0 unmodified
Added to the repo: 0 B  

processed 1 files, 117.188 KiB in 0:00
snapshot de7ca469 saved

There is something else happening here. Can you run the test again, running ls -li ./source/original after setting the mtime, and showing the complete transcript of the test? Can you also confirm the operating system / distribution as well as the filesystem holding the source data, and the mount options for that volume?

TRPB · December 2, 2019, 6:01pm

Here’s my complete output. I’ve also included the stat output for the file prior to creating each snapshot.

[tom@desktop restictest]$ mkdir backup
[tom@desktop restictest]$ mkdir source
[tom@desktop restictest]$ cat /dev/urandom  | head -c 120000 > ./source/original
[tom@desktop restictest]$ stat ./source/original
  File: ./source/original
  Size: 120000          Blocks: 240        IO Block: 4096   regular file
Device: 10303h/66307d   Inode: 15205786    Links: 1
Access: (0644/-rw-r--r--)  Uid: ( 1000/     tom)   Gid: ( 1000/     tom)
Access: 2019-12-02 17:56:38.130355920 +0000
Modify: 2019-12-02 17:56:38.133689261 +0000
Change: 2019-12-02 17:56:38.133689261 +0000
 Birth: 2019-12-02 17:56:38.130355920 +0000
[tom@desktop restictest]$ ls -li ./source
total 120
15205786 -rw-r--r-- 1 tom tom 120000 Dec  2 17:56 original
[tom@desktop restictest]$ touch -d '2 Dec 2019 15:00:00.00' ./source/original
[tom@desktop restictest]$ stat ./source/original
  File: ./source/original
  Size: 120000          Blocks: 240        IO Block: 4096   regular file
Device: 10303h/66307d   Inode: 15205786    Links: 1
Access: (0644/-rw-r--r--)  Uid: ( 1000/     tom)   Gid: ( 1000/     tom)
Access: 2019-12-02 15:00:00.000000000 +0000
Modify: 2019-12-02 15:00:00.000000000 +0000
Change: 2019-12-02 17:56:58.430403204 +0000
 Birth: 2019-12-02 17:56:38.130355920 +0000
[tom@desktop restictest]$ ls -li ./source
total 120
15205786 -rw-r--r-- 1 tom tom 120000 Dec  2 15:00 original
[tom@desktop restictest]$ restic init --repo ./backup
enter password for new repository: 
enter password again: 
created restic repository e567738b81 at ./backup

Please note that knowledge of your password is required to access
the repository. Losing your password means that your data is
irrecoverably lost.
[tom@desktop restictest]$ restic -r ./backup backup ./source
enter password for repository: 
repository e567738b opened successfully, password is correct
created new cache in /home/tom/.cache/restic

Files:           1 new,     0 changed,     0 unmodified
Dirs:            0 new,     0 changed,     0 unmodified
Added to the repo: 117.536 KiB

processed 1 files, 117.188 KiB in 0:00
snapshot 28fed53a saved
[tom@desktop restictest]$ cat /dev/urandom  | head -c 120000 > ./source/original
[tom@desktop restictest]$ touch -d '2 Dec 2019 15:00:00.00' ./source/original
[tom@desktop restictest]$ stat ./source/original
  File: ./source/original
  Size: 120000          Blocks: 240        IO Block: 4096   regular file
Device: 10303h/66307d   Inode: 15205786    Links: 1
Access: (0644/-rw-r--r--)  Uid: ( 1000/     tom)   Gid: ( 1000/     tom)
Access: 2019-12-02 15:00:00.000000000 +0000
Modify: 2019-12-02 15:00:00.000000000 +0000
Change: 2019-12-02 17:57:55.710539318 +0000
 Birth: 2019-12-02 17:56:38.130355920 +0000
[tom@desktop restictest]$ ls -li ./source
total 120
15205786 -rw-r--r-- 1 tom tom 120000 Dec  2 15:00 original
[tom@desktop restictest]$ restic -r ./backup backup ./source
enter password for repository: 
repository e567738b opened successfully, password is correct

Files:           0 new,     1 changed,     0 unmodified
Dirs:            0 new,     0 changed,     0 unmodified
Added to the repo: 117.536 KiB

processed 1 files, 117.188 KiB in 0:00
snapshot 5520bbdd saved

I do notice in stat there is a Change entry. It doesn’t look like touch can alter this. Perhaps that’s the cause?
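
A small sketch to check that hunch in isolation, without restic involved. It assumes GNU coreutils (stat -c, where %Y is the mtime in seconds and %z is the ctime as text); the function name ctime_demo is made up for this example.

```shell
#!/bin/sh
# Sketch only: shows that restoring mtime with touch still leaves a newer ctime.
# Assumes GNU coreutils; ctime_demo is a hypothetical helper name.
ctime_demo() {
    f=$(mktemp)
    touch -d '2 Dec 2019 15:00:00' "$f"
    mtime_before=$(stat -c %Y "$f")
    ctime_before=$(stat -c %z "$f")
    sleep 1                              # make sure the clock has moved on
    echo "changed contents" > "$f"       # rewriting updates both mtime and ctime
    touch -d '2 Dec 2019 15:00:00' "$f"  # puts mtime back; ctime is set to "now"
    if [ "$(stat -c %Y "$f")" = "$mtime_before" ]; then
        echo "mtime: same"
    else
        echo "mtime: differs"
    fi
    if [ "$(stat -c %z "$f")" = "$ctime_before" ]; then
        echo "ctime: same"
    else
        echo "ctime: differs"
    fi
    rm -f "$f"
}
```

On a typical Linux filesystem, calling ctime_demo should report the mtime as the same and the ctime as different, which matches the Change lines in the stat output above.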

The filesystem is ext4 on Arch Linux. fstab entry:

#/dev/nvme0n1p3
UUID=fc6ad741-d52d-47eb-b6a6-0026f27b29f3       /               ext4            rw,relatime     0 1

I just created this in my home directory, which is on the / partition (I don’t have a separate partition for /home).

edit: probably also worth including:

$ restic version
restic 0.9.6 compiled with go1.13.4 on linux/amd64

cdhowie · December 2, 2019, 8:19pm

Aha, that’s the difference. I’m on 0.9.5.

The behavior is different because of a bugfix in 0.9.6: Excel was resetting the mtime, so restic did not notice that a file had changed. Ctime is checked in 0.9.6 but was not in 0.9.5.

$ git diff v0.9.5 v0.9.6 -- internal/archiver/archiver.go
diff --git a/internal/archiver/archiver.go b/internal/archiver/archiver.go
index b21f79e8..16dd7625 100644
--- a/internal/archiver/archiver.go
+++ b/internal/archiver/archiver.go
@@ -453,8 +460,13 @@ func fileChanged(fi os.FileInfo, node *restic.Node, ignoreInode bool) bool {
                return true
        }
 
-       // check size
+       // check status change timestamp
        extFI := fs.ExtendedStat(fi)
+       if !ignoreInode && !extFI.ChangeTime.Equal(node.ChangeTime) {
+               return true
+       }
+
+       // check size
        if uint64(fi.Size()) != node.Size || uint64(extFI.Size) != node.Size {
                return true
        }

However, keep in mind that corruption of the file contents will change neither the mtime nor the ctime, unless the metadata itself is what was corrupted. In that case the file’s inode is probably damaged, and you’ll get errors from the filesystem driver in the kernel log as well as I/O errors returned to restic.

cdhowie · December 2, 2019, 8:37pm

One possible solution would be to have the system mail you a diff of the new snapshot to the prior one after each backup. This is the basic command you’d use (modify the snapshots invocation with --host, --path, and/or --tag as required):

restic diff $(restic snapshots --json | jq -r '.[-2:] | map(.id)[]')

After every backup, look for unexpected modifications (lines starting with M).
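
To cut the diff down to just those lines before mailing it, a filter along these lines might help. It is a sketch that assumes each modified entry is printed as M, some whitespace, then the path; the function name report_modified and the sample paths in the usage note are made up.

```shell
#!/bin/sh
# report_modified: read `restic diff` output on stdin and print only the
# entries whose line starts with M (content modified).
# Exit status follows grep: 0 if at least one such line exists, 1 if none.
# Assumption: the line layout is "M", whitespace, then the path.
report_modified() {
    grep '^M[[:space:]]'
}
```

It would be used by piping the diff command above into it, for example `restic diff $(restic snapshots --json | jq -r '.[-2:] | map(.id)[]') | report_modified`. Because the exit status says whether anything matched, a cron job could send mail only when a file such as /photos/2019/img_0001.jpg shows up as modified.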

TRPB · December 3, 2019, 5:23pm

Thank you. Yes, this is the behaviour I had hoped for: if the contents are corrupted, they are not copied to the backup. As you say, metadata corruption should be a lot more obvious.

I just wanted clarification that this was the case and a better understanding of how restic works behind the scenes.

This is a really nice idea just for a bit of extra peace of mind, thanks!