Possible backup bug? Or bad backend?

First off, I am running restic 0.16.0 compiled with go1.20.6 on darwin/amd64. I have a backup script that runs Restic on two repos. I backup my photos to iDrive e2 and OneDrive. The commands are identical, and run sequentially. iDrive is via the AWS built-in backend, and OneDrive is via the Rclone backend.

I recently discovered a hashing app called cshatag. In a nutshell, it computes the SHA-256 hash of a given file / directory, then embeds the hash AND modified date as an extended attribute. Later, if you want to check for bitrot, you can run it, and if the modified date has changed, it will report it and update the hash, but not report an error. BUT if the hash has changed and the modified date has NOT, it will report an error. I had previously used rhash which embeds the CRC32 in the filename, but I thought this might be more efficient since it also makes note of modification times.

So. I wanted to try it out. But I wanted a backup first, obviously. So I run my script. I then run the utility. I then run ANOTHER backup, for the sake of it. And, for some reason… they were different? I noticed the MiB backed up seemed to differ more than I would expect. So I went to investigate.

Here’s the snapshots from the iDrive e2 backup, which runs first:

7d1b1ae9  2023-08-25 08:50:43  Media  b4_cshatag
cd80c861  2023-08-25 13:01:03  Media

Here’s the snapshots from the OneDrive backup, which runs immediately after:

02325856  2023-08-25 08:55:35  Media  b4_cshatag
e56139a9  2023-08-25 13:57:56  Media

Straightforward enough. I hadn’t really used extended attributes much, so I was curious how Restic would handle it. I ran a diff on both sets. Here’s the output of iDrive e2:

comparing snapshot 7d1b1ae9 to cd80c861:
M    /Volumes/Media/Photos & Videos/_Incoming/3_Hashed/iPhone 13 Mini/2023-02-14 19-12-27 [C9DA9557].mov
M    /Volumes/Media/Photos & Videos/_Incoming/3_Hashed/iPhone 13 Mini/2023-02-14 20-04-34 [01872D07].mov
M    /Volumes/Media/Photos & Videos/_Incoming/3_Hashed/iPhone 13 Mini/2023-02-14 20-23-45 [CA2A724D].mov
M    /Volumes/Media/Photos & Videos/_Incoming/3_Hashed/iPhone 13 Mini/2023-02-15 13-24-00 [BB5E49AD].avif
M    /Volumes/Media/Photos & Videos/_Incoming/3_Hashed/iPhone 13 Mini/2023-02-15 16-09-03 [C83747CA].mov
M    /Volumes/Media/Photos & Videos/_Incoming/3_Hashed/iPhone 13 Mini/2023-02-16 08-23-06 [F9EFC965].webp
M    /Volumes/Media/Photos & Videos/_Incoming/3_Hashed/Mark’s iPhone 13 Mini/2023-05-05 08-37-15 [B6130405].heic
M    /Volumes/Media/Photos & Videos/_Incoming/3_Hashed/Mark’s iPhone 13 Mini/2023-05-05 14-46-46 [5155965E].heic
M    /Volumes/Media/Photos & Videos/_To Sort/!Images/Pictures/2010/2010-08-21 (SAHS Band)/IMG_3952 [CCD356CA].JPG
M    /Volumes/Media/Photos & Videos/_To Sort/!Images/Pictures/2010/2010-08-21 (Selfies)/IMG_0001 [2CC5C403].JPG
M    /Volumes/Media/Photos & Videos/_To Sort/!Images/Pictures/2010/2010-08-21 (Selfies)/IMG_0003 [AC7B8288].JPG
M    /Volumes/Media/Photos & Videos/_To Sort/!Images/Pictures/2010/2010-08-21 (Selfies)/IMG_0004 [A1A2645F].JPG
M    /Volumes/Media/Photos & Videos/_To Sort/!Images/Pictures/2010/2010-08-21 (Selfies)/IMG_0005 [901D19E0].JPG

Files:           0 new,     0 removed,    13 changed
Dirs:            0 new,     0 removed
Others:          0 new,     0 removed
Data Blobs:     85 new,   100 removed
Tree Blobs:   6301 new,  6301 removed
  Added:   303.307 MiB
  Removed: 289.573 MiB

And the output of OneDrive:

comparing snapshot 02325856 to e56139a9:

M    /Volumes/Media/Photos & Videos/_Incoming/3_Hashed/iPhone 13 Mini/2023-02-14 19-12-27 [C9DA9557].mov
M    /Volumes/Media/Photos & Videos/_Incoming/3_Hashed/iPhone 13 Mini/2023-02-14 20-04-34 [01872D07].mov
M    /Volumes/Media/Photos & Videos/_Incoming/3_Hashed/iPhone 13 Mini/2023-02-14 20-23-45 [CA2A724D].mov
M    /Volumes/Media/Photos & Videos/_Incoming/3_Hashed/iPhone 13 Mini/2023-02-15 13-24-00 [BB5E49AD].avif
M    /Volumes/Media/Photos & Videos/_Incoming/3_Hashed/iPhone 13 Mini/2023-02-15 16-09-03 [C83747CA].mov
M    /Volumes/Media/Photos & Videos/_Incoming/3_Hashed/iPhone 13 Mini/2023-02-16 08-23-06 [F9EFC965].webp
M    /Volumes/Media/Photos & Videos/_Incoming/3_Hashed/iPhone 13 Mini/2023-05-05 08-37-15 [B6130405].heic
M    /Volumes/Media/Photos & Videos/_Incoming/3_Hashed/iPhone 13 Mini/2023-05-05 14-46-46 [5155965E].heic

Files:           0 new,     0 removed,     8 changed
Dirs:            0 new,     0 removed
Others:          0 new,     0 removed
Data Blobs:     74 new,    88 removed
Tree Blobs:   6301 new,  6301 removed
  Added:   287.092 MiB
  Removed: 273.358 MiB

So, for some reason, the following were modified in the FIRST backup to iDrive e2, but NOT modified in the SECOND backup to OneDrive? That made no sense…

M    /Volumes/Media/Photos & Videos/_To Sort/!Images/Pictures/2010/2010-08-21 (SAHS Band)/IMG_3952 [CCD356CA].JPG
M    /Volumes/Media/Photos & Videos/_To Sort/!Images/Pictures/2010/2010-08-21 (Selfies)/IMG_0001 [2CC5C403].JPG
M    /Volumes/Media/Photos & Videos/_To Sort/!Images/Pictures/2010/2010-08-21 (Selfies)/IMG_0003 [AC7B8288].JPG
M    /Volumes/Media/Photos & Videos/_To Sort/!Images/Pictures/2010/2010-08-21 (Selfies)/IMG_0004 [A1A2645F].JPG
M    /Volumes/Media/Photos & Videos/_To Sort/!Images/Pictures/2010/2010-08-21 (Selfies)/IMG_0005 [901D19E0].JPG

So, I restored all six images from both snapshots to compare. Very quickly I saw that the FIRST snapshot from iDrive e2 was corrupt. The images didn’t load, the CRC32 didn’t match, nor did the SHA-256 match (obviously). The SECOND snapshot on OneDrive, which occurred immediately after, was perfectly fine. Then I checked the files on my actual SSD. They checked out fine as well. They load correctly, and the two hashes match.

This makes no sense to me. There were no errors reported on either backup. I never even would have noticed if I wasn’t backing up to two repos and noticed the MiB added differed. And the snapshot with the corrupted files… took place BEFORE the snapshot with the OK files.

I am going to run another snapshot on both repositories, adding the --force switch for good measure. I may run yet another one after that, after emptying Restic’s cache, as well, to see what difference that makes. I’ll report back with the diff’s once I’m done.

I’m also immediately going to make some old-fashioned duplicity + par2 backups as well :sweat_smile:

Also, I’ll spare you my selfies… but here’s the IMG_3952 [CCD356CA].JPG photo:

iDrive e2 @ 2023-08-25 13:01:03

OneDrive @ 2023-08-25 13:57:56:

Alright, it finished. Here’s the results…

Backing up to iDrive e2:

repository 24344e18 opened (version 2, compression level max)
no parent snapshot found, will read all files

Files:       119973 new,     0 changed,     0 unmodified
Dirs:         6329 new,     0 changed,     0 unmodified
Added to the repository: 1.665 MiB (502.166 KiB stored)

processed 119973 files, 902.958 GiB in 48:24
snapshot 726cbf07 saved

Backing up to OneDrive:

repository 24344e18 opened (version 2, compression level max)
no parent snapshot found, will read all files

Files:       119973 new,     0 changed,     0 unmodified
Dirs:         6329 new,     0 changed,     0 unmodified
Added to the repository: 1.590 MiB (484.269 KiB stored)

processed 119973 files, 902.958 GiB in 46:17
snapshot 031c1f83 saved

So here’s my iDrive e2 snapshots:

7d1b1ae9  2023-08-25 08:50:43  Media  b4_cshatag
cd80c861  2023-08-25 13:01:03  Media
726cbf07  2023-08-25 19:52:43  Media  bugcheck

And here’s my OneDrive snapshots:

02325856  2023-08-25 08:55:35  Media  b4_cshatag
e56139a9  2023-08-25 13:57:56  Media
031c1f83  2023-08-25 20:41:10  Media  bugcheck

Here’s the iDrive e2 diff:

restic diff cd80c861 726cbf07
repository 24344e18 opened (version 2, compression level max)
comparing snapshot cd80c861 to 726cbf07:

-    /Volumes/Media/Documents/ [E53B6B6B].DS_Store
-    /Volumes/Media/Photos & Videos/2021/2021-07-11 (Gem Show, Lavender Festival, Seaside, McMenamins)/ [8AF00FFB].DS_Store
-    /Volumes/Media/Photos & Videos/2021/2021-08-19 (Truck Wreck)/ [C4DE329D].DS_Store
M    /Volumes/Media/Photos & Videos/_To Sort/!Images/Pictures/2010/2010-08-21 (SAHS Band)/IMG_3952 [CCD356CA].JPG
M    /Volumes/Media/Photos & Videos/_To Sort/!Images/Pictures/2010/2010-08-21 (Selfies)/IMG_0001 [2CC5C403].JPG
M    /Volumes/Media/Photos & Videos/_To Sort/!Images/Pictures/2010/2010-08-21 (Selfies)/IMG_0003 [AC7B8288].JPG
M    /Volumes/Media/Photos & Videos/_To Sort/!Images/Pictures/2010/2010-08-21 (Selfies)/IMG_0004 [A1A2645F].JPG
M    /Volumes/Media/Photos & Videos/_To Sort/!Images/Pictures/2010/2010-08-21 (Selfies)/IMG_0005 [901D19E0].JPG
-    /Volumes/Media/Photos & Videos/_To Sort/!iCloud Photos/ [D46FD812].DS_Store
-    /Volumes/Media/Recordings/2022-05-13 Leave Me On Read, The Vagilantes @ Kelly's Olympian/ [15754842].DS_Store
-    /Volumes/Media/Recordings/2022-05-14 The Vagilantes, Full Stop, Tall Dark Whimsy @ Dante's/ [6D88006A].DS_Store
M    /Volumes/Media/Ω/backup.command

Files:           0 new,     6 removed,     6 changed
Dirs:            0 new,     0 removed
Others:          0 new,     0 removed
Data Blobs:     13 new,    18 removed
Tree Blobs:     41 new,    41 removed
  Added:   17.880 MiB
  Removed: 17.923 MiB

Here’s the OneDrive diff:

restic diff e56139a9 031c1f83
repository 24344e18 opened (version 2, compression level max)
comparing snapshot e56139a9 to 031c1f83:

-    /Volumes/Media/Documents/ [E53B6B6B].DS_Store
-    /Volumes/Media/Photos & Videos/2021/2021-07-11 (Gem Show, Lavender Festival, Seaside, McMenamins)/ [8AF00FFB].DS_Store
-    /Volumes/Media/Photos & Videos/2021/2021-08-19 (Truck Wreck)/ [C4DE329D].DS_Store
-    /Volumes/Media/Photos & Videos/_To Sort/!iCloud Photos/ [D46FD812].DS_Store
-    /Volumes/Media/Recordings/2022-05-13 Leave Me On Read, The Vagilantes @ Kelly's Olympian/ [15754842].DS_Store
-    /Volumes/Media/Recordings/2022-05-14 The Vagilantes, Full Stop, Tall Dark Whimsy @ Dante's/ [6D88006A].DS_Store
M    /Volumes/Media/Ω/backup.command

Files:           0 new,     6 removed,     1 changed
Dirs:            0 new,     0 removed
Others:          0 new,     0 removed
Data Blobs:      1 new,     7 removed
Tree Blobs:     38 new,    38 removed
  Added:   1.590 MiB
  Removed: 1.633 MiB

I should note that I noticed rhash had erroneously modified a few .DS_Store files, and so I deleted them. That’s the only modification I made in between snapshots, besides changing the script to force a full read. And, as you see… these files were modified, again:

M    /Volumes/Media/Photos & Videos/_To Sort/!Images/Pictures/2010/2010-08-21 (SAHS Band)/IMG_3952 [CCD356CA].JPG
M    /Volumes/Media/Photos & Videos/_To Sort/!Images/Pictures/2010/2010-08-21 (Selfies)/IMG_0001 [2CC5C403].JPG
M    /Volumes/Media/Photos & Videos/_To Sort/!Images/Pictures/2010/2010-08-21 (Selfies)/IMG_0003 [AC7B8288].JPG
M    /Volumes/Media/Photos & Videos/_To Sort/!Images/Pictures/2010/2010-08-21 (Selfies)/IMG_0004 [A1A2645F].JPG
M    /Volumes/Media/Photos & Videos/_To Sort/!Images/Pictures/2010/2010-08-21 (Selfies)/IMG_0005 [901D19E0].JPG

I then ran rhash against the embedded CRC codes and…

RHASH (iDrive e2) 7d1b1ae9 2023-08-25 08:50:43 Media b4_cshatag
/Users/akrabu/mnt/1/hosts/Media/2023-08-25T08:50:43-07:00/Volumes/Media/Photos & Videos/_To Sort/!Images/Pictures/2010/2010-08-21 (SAHS Band)/IMG_3952 [CCD356CA].JPG OK 
/Users/akrabu/mnt/1/hosts/Media/2023-08-25T08:50:43-07:00/Volumes/Media/Photos & Videos/_To Sort/!Images/Pictures/2010/2010-08-21 (Selfies)/IMG_0001 [2CC5C403].JPG OK 
/Users/akrabu/mnt/1/hosts/Media/2023-08-25T08:50:43-07:00/Volumes/Media/Photos & Videos/_To Sort/!Images/Pictures/2010/2010-08-21 (Selfies)/IMG_0003 [AC7B8288].JPG OK 
/Users/akrabu/mnt/1/hosts/Media/2023-08-25T08:50:43-07:00/Volumes/Media/Photos & Videos/_To Sort/!Images/Pictures/2010/2010-08-21 (Selfies)/IMG_0004 [A1A2645F].JPG OK 
/Users/akrabu/mnt/1/hosts/Media/2023-08-25T08:50:43-07:00/Volumes/Media/Photos & Videos/_To Sort/!Images/Pictures/2010/2010-08-21 (Selfies)/IMG_0005 [901D19E0].JPG OK 

RHASH (iDrive e2) cd80c861 2023-08-25 13:01:03 Media
/Users/akrabu/mnt/1/hosts/Media/2023-08-25T13:01:03-07:00/Volumes/Media/Photos & Videos/_To Sort/!Images/Pictures/2010/2010-08-21 (SAHS Band)/IMG_3952 [CCD356CA].JPG ERR
/Users/akrabu/mnt/1/hosts/Media/2023-08-25T13:01:03-07:00/Volumes/Media/Photos & Videos/_To Sort/!Images/Pictures/2010/2010-08-21 (Selfies)/IMG_0001 [2CC5C403].JPG ERR
/Users/akrabu/mnt/1/hosts/Media/2023-08-25T13:01:03-07:00/Volumes/Media/Photos & Videos/_To Sort/!Images/Pictures/2010/2010-08-21 (Selfies)/IMG_0003 [AC7B8288].JPG ERR
/Users/akrabu/mnt/1/hosts/Media/2023-08-25T13:01:03-07:00/Volumes/Media/Photos & Videos/_To Sort/!Images/Pictures/2010/2010-08-21 (Selfies)/IMG_0004 [A1A2645F].JPG ERR
/Users/akrabu/mnt/1/hosts/Media/2023-08-25T13:01:03-07:00/Volumes/Media/Photos & Videos/_To Sort/!Images/Pictures/2010/2010-08-21 (Selfies)/IMG_0005 [901D19E0].JPG ERR

RHASH (iDrive e2) 726cbf07 2023-08-25 19:52:43 Media bugcheck
/Users/akrabu/mnt/1/hosts/Media/2023-08-25T19:52:43-07:00/Volumes/Media/Photos & Videos/_To Sort/!Images/Pictures/2010/2010-08-21 (SAHS Band)/IMG_3952 [CCD356CA].JPG OK 
/Users/akrabu/mnt/1/hosts/Media/2023-08-25T19:52:43-07:00/Volumes/Media/Photos & Videos/_To Sort/!Images/Pictures/2010/2010-08-21 (Selfies)/IMG_0001 [2CC5C403].JPG OK 
/Users/akrabu/mnt/1/hosts/Media/2023-08-25T19:52:43-07:00/Volumes/Media/Photos & Videos/_To Sort/!Images/Pictures/2010/2010-08-21 (Selfies)/IMG_0003 [AC7B8288].JPG OK 
/Users/akrabu/mnt/1/hosts/Media/2023-08-25T19:52:43-07:00/Volumes/Media/Photos & Videos/_To Sort/!Images/Pictures/2010/2010-08-21 (Selfies)/IMG_0004 [A1A2645F].JPG OK 
/Users/akrabu/mnt/1/hosts/Media/2023-08-25T19:52:43-07:00/Volumes/Media/Photos & Videos/_To Sort/!Images/Pictures/2010/2010-08-21 (Selfies)/IMG_0005 [901D19E0].JPG OK 

So… it added them back. Snapshot cd80c861 added corrupt copies, and the the next snapshot put them back how they were prior to the corrupt snapshot.

I also ran another CRC32 and SHA-256 check on the actual SSD - the entire thing, not just these files. Everything checks out fine - including the timestamps. I don’t think it’s my hardware. And honestly? I was hoping it WAS this SSD. That’s much less disturbing to me than Restic silently saving a corrupted snapshot. The only reason I ever even noticed this was because I was saving to two repositories simultaneously, noticed the size discrepancy, and had hashes to check against… :frowning:

Now that I’m thinking about it some more, I probably should have tried a regular snapshot (w/parent) to see if it would have added it back in that scenario, but I used --force to see if anything else had changed because I was feeling paranoid.

I’m going to run a memory tester overnight, to rule that out. Let me know if there’s any other testing I can do to help track this down!

I would run:

restic check --read-data

against iDrive repo.

From your post it is rather clear that some files in your first backup to iDrive are corrupted.

If check runs successfully it would suggest that corruption was on restic/client side. If it fails that it is remote issue where data was corrupted.

1 Like

Yep, that’s fair. Will probably take a few days. I’ll get that started now. :+1:

Memory test checked out, at least.

Full check just succeeded. Not a single error. The “corrupted” snapshot is still there. So yeah, it’s a silent fail :confused:

It is clear then that repository is not corrupted and contains data which was originally saved there.

Now why corrupted content was saved it is different story. But at least we can rule out possibility that iDrive corrupted data at rest.

The repo isn’t corrupted… but it still silently saved invalid data. Those files never changed. And the next snapshot put them back exactly as they were in the snapshot before those two. It makes me wonder how many other snapshots just have randomly malformed files that are then “updated” to their correct versions on the next snapshot. But… what about malformed files in THAT snapshot?

I’ve checked the SMART data and verified the hashes several times now. The files and SSD check out just fine. I ran a RAM test overnight, that checked out fine. So why did Restic silently save corrupted versions of those files??

I’m not saying it’s definitely Restic, but I’m also running out of ideas for alternatives to test. I would much, much rather it be my hardware (or the backend) than Restic, believe me. And I know Restic is the “canary in the coal mine” for hardware issues. But so far, my machine is doing just fine…

There is one more thing to check. Exact nature and structure of corrupted data. As you have both good and corrupted files you could check what exactly is corrupted. Is it one bit flop? maybe some continues amount of data missing? etc. It can also give some clues where corruption happened.

2 Likes

Ahhhh, that’s a great idea!

First off, they’re exactly the same file size, to the byte. So something’s flipped.

They’re all JPGs, so a regular diff won’t be much good. Let me see what I can come up with…

1 Like

Not totally sure but how should restic know if you intendedly changed a file and then changed it back again in the next snapshot - or if it’s being handed a file that has been changed by some faulty hard- or software? “Silently saved” in this context probably means it did show the changed file but you just didn’t notice, correct?

This might actually be useful. I used binvis.io - which is pretty cool, for the record.

https://v.usetapes.com/nP3DFUbHs4

So the corruption is different in each file - it’s not the same corruption happening to each file. That may or may not be helpful, but it’s at least interesting haha

It’s possible, but I ran a repeated hash check on those six files over the course of a few hours, then another repeated hash check over ALL the files for 24 hours… and not once did a file not match it’s hash. I know that’s not foolproof evidence, but something fishy is going on.

No, Restic wouldn’t know the difference between me feeding it data I changed myself, and data being changed by a faulty drive or interface. That’s true.

But it’s odd that all six files “changed back” in the next snapshot - and subsequent hash checks and snapshots have turned up empty-handed.

I can’t reproduce this, and the only answer at this point is to always do simultaneous backups to two or more repos and monitor the output going forward for any discrepancies, doing a diff should I notice any, and then comparing the file contents on files I know shouldn’t have changed.

All I know is, the SSD appears healthy. There’s no reallocated sectors. I’ve ran numerous tests, and it seems perfectly healthy. Maybe I got zapped by some cosmic rays at just the wrong time. I’ve just got a MacBook - not like I have ECC RAM or anything. But I’d be lying if I said I wasn’t a bit spooked…

My bet would be on a corrupted Restic cache file that got cleared out afterward. But that’s just a guess. :man_shrugging:t2:

1 Like

It’s a creepy situation for sure. Maybe a function like “–verify” could be added to restic that compares checksums of backed up file and original immediately after backing it up?

Ah but so many of us backup live systems… things are bound to change. It might be a useful function with the –use-fs-snapshot switch on Windows, at least?

I’m going to have to go through snapshot by snapshot, looking at diffs, and seeing if anything looks out of the ordinary now. Most of my pictures from earlier than 2020 or so shouldn’t have been touched by anything other than Spotlight, Rhash, or Chashtag - which, other than the filename or extended attributes, shouldn’t have actually modified anything. I’m just really worried there’s going to be randomly corrupt data scattershot through all my snapshots and I won’t know it until I need something…

Absolutely, that is not a pleasant thought. I think I could live with a few lines of “WARNING: checksums for .cache/xyz differ” - better than with your situation now.

1 Like

Yeah. I’m pretty unnerved, if I’m being honest. I really hope it’s my hardware, not Restic. That probably sounds weird, but I can replace an SSD more easily than I can replace Restic as my primary backup solution. Also I just… trust Restic, and this community, so much. But it’s going to be hard to convince folks that it’s NOT my hardware… and likely even harder to track down why this happened, and fix it.

So yeah, for me, I’d prefer it be my hardware. Which is why I’m running test after test on it, hoping to see it fail, so I know that, yep, it was just Restic doing what it does best… hardware testing lol

But so far… that is not the case… :frowning:

1 Like

Totally with you here. I trust restic and a few of my customers do too (although they don’t know they do :rofl:). I guess I should do some tests as well.

1 Like

That cannot cause this type of corruption. restore always calculates the sha256 hash of each restored block and only restores it if it is correct. A corrupted cache file would simply fail that check.

Similarly, the backend cannot cause this type of data corruption (otherwise a backend could just arbitrarily modify backups without anyone noticing).

The only explanation is that this data corruption happened during the backup. Essentially the only option is that restic ended up reading corrupt data for some reason (the data corruption must occur between requesting the file content from the OS and hashing it after chunking the file content) or that restic mixed up the contents from different files (I’ve no idea how the latter could happen, in particular it would require some other valid blob ids to mix up with). Data corruption during compression or encryption would result in an error during restore or check.

What’s very strange is also that the corrupt data seems to be clustered (most files are in the same folder). That suggests some kind of common cause, but I don’t have the slightest clue what that could be.

Quick update fwiw: it took more than a week on my old 2 core i5 Debian machine but I rhashed all 72,303 files (1.3 TB, lots of fpv videos) on the live file system and on the latest restic snapshot. Happy to report an empty diff.

4 Likes