Randomly Needs to Rescan All Data

lrrrac · December 17, 2020, 3:58pm

I have been using restic for over a year on 44TB of data and it works great, except that it randomly needs to rescan all of my data. Usually a scan with no new data takes 5 minutes and shows no file paths as it runs. Twice now the program has decided to rescan every file. It doesn’t upload anything, but it rereads every file, which takes something like 10 days. Is there a reason this happens? Is there something I can do to fix it? I have tried forcing parents going back weeks, but it still needs to read every file. Thanks!

fd0 · December 17, 2020, 6:53pm

Welcome back!

Unfortunately you don’t show us how you run restic, so I have to guess. This could be one of several reasons:

The source path changed, check the command line
The hostname changed, check the hostname printed in restic snapshots
The source file system has changed, this sometimes happens when removable media is mounted, so the device ID changed, check what stat file prints for some file. There’s an issue (or PR) somewhere which improves that.
The inodes for the files change, this sometimes happens with fuse-based file systems. You can try the --ignore-inode flag for restic backup.

It’d be helpful if you can tell us:

What’s your source file system?
How did you call restic exactly?
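
The first two checks in the list above (source path and hostname) can be made by comparing the JSON that restic cat snapshot prints for two snapshots. A minimal sketch, assuming only the "hostname" and "paths" fields of that JSON; the function name snapshot_identity_diff is made up for illustration and is not part of restic:

```python
import json


def snapshot_identity_diff(older_json, newer_json):
    """Return {field: (older, newer)} for the identity fields that differ."""
    older = json.loads(older_json)
    newer = json.loads(newer_json)
    diff = {}
    for field in ("hostname", "paths"):
        if older.get(field) != newer.get(field):
            diff[field] = (older.get(field), newer.get(field))
    return diff


# Trimmed-down samples in the shape printed by `restic cat snapshot`.
older = '{"hostname": "homeserver", "paths": ["/media"]}'
newer = '{"hostname": "homeserver", "paths": ["/media"]}'
print(snapshot_identity_diff(older, newer))  # empty dict when nothing differs
```

An empty result rules out those two causes; it says nothing about device IDs or inodes, which live in the tree objects rather than in the snapshot.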

lrrrac · December 17, 2020, 7:07pm

Sorry for the lack of information. Here is the command I run:

/./home/lrrrac/restic/restic/restic -p /home/lrrrac/scripts/RESTIC_PASSWORD_FILE -r rclone:Drive:HomeServer backup /media --tag media --verbose --exclude "*/Documents/Drives/snapraid*" --exclude "*/lost+found/**" --exclude "*/Downloads/incomplete/**"

The backend is Google Drive (I know it’s hated and it sucks and everything, but it works great for me). The source file system is EXT4 on internal HDDs mounted at startup through an HBA card. The path, hostname, and source files haven’t changed. I use fuse, but I have restic looking at the direct mounts, not the fuse pool. Also, I have restic backing up 5 drives and all of them get rescanned.

cdhowie · December 17, 2020, 7:52pm

If you know of a particular snapshot where this happened, can you run restic cat snapshot $SNAPSHOTID and post the output here?

Can you think of anything else that correlates to restic rescanning everything? For example, does this always happen on the first backup following a reboot?

lrrrac · December 17, 2020, 8:01pm

{
  "time": "2020-12-16T04:00:01.484940082-05:00",
  "parent": "3f999436d9465215f0199e6663f5bba9f2a505c6e1401c1923da68273d031c1e",
  "tree": "ad2f1c9fc3b907e972c7920920c943c8f48fa845ca50325521f66f00b41f0a5e",
  "paths": [
    "/media"
  ],
  "hostname": "homeserver",
  "username": "root",
  "excludes": [
    "*/Documents/Drives/snapraid*",
    "*/lost+found/**",
    "*/Downloads/incomplete/*"
  ],
  "tags": [
    "media"
  ]
}

This is the last snapshot before it decided it needed to read all files.

It may be rebooting, but I reboot so infrequently that I can’t recall whether this was the first backup since a reboot. I do know I have rebooted without it causing this issue. I’ve previously had the issue with a single drive, but later found I was having separate issues with that drive and it was changing its sd* letter. That is not happening this time.

cdhowie · December 17, 2020, 8:06pm

Ok, so there is a parent snapshot, which is good. I suspect that the device ID is changing, which could happen when the source filesystem (/media) is unmounted and then mounted again. To verify this, we would need to look at the tree objects for both snapshots. You can see in this snapshot what the tree ID is (ad2f1c...) as well as the ID of the parent snapshot (3f9994...).

Can you post the tree from this snapshot (restic cat blob ad2f1c9fc3b907e972c7920920c943c8f48fa845ca50325521f66f00b41f0a5e) as well as the tree from the parent (restic cat snapshot 3f999436d9465215f0199e6663f5bba9f2a505c6e1401c1923da68273d031c1e, look for the “tree” line, and restic cat blob $TREEID)?

If this contains sensitive information you can private message the output to me to keep it out of the public forums.
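
The lookup described above (snapshot, then its tree, then the parent snapshot and its tree) can be sketched as a small helper that reads the snapshot JSON and prints the next commands to run. This is only an illustration: tree_lookup_commands is a made-up name, and it assumes nothing beyond the "tree" and "parent" fields shown in the snapshot output.

```python
import json


def tree_lookup_commands(snapshot_json):
    """List the restic commands needed to inspect a snapshot's tree and parent."""
    snap = json.loads(snapshot_json)
    commands = ["restic cat blob " + snap["tree"]]
    parent = snap.get("parent")
    if parent:  # the very first snapshot of a path has no parent
        commands.append("restic cat snapshot " + parent)
    return commands


# Sample input: the "tree" and "parent" fields of a snapshot.
snapshot = json.dumps({
    "parent": "3f999436d9465215f0199e6663f5bba9f2a505c6e1401c1923da68273d031c1e",
    "tree": "ad2f1c9fc3b907e972c7920920c943c8f48fa845ca50325521f66f00b41f0a5e",
})
for command in tree_lookup_commands(snapshot):
    print(command)
```

Running the helper again on the parent snapshot’s JSON yields the cat blob command for the parent’s tree.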

lrrrac · December 17, 2020, 8:09pm

I’m grabbing the info now, but both of these should be the same, as I have not run another backup yet; it takes something like 10 days to fully run when this happens. So the info will be from 2 backups before whatever happened. Will the info still help?

lrrrac · December 17, 2020, 8:12pm

rcommand "cat blob ad2f1c9fc3b907e972c7920920c943c8f48fa845ca50325521f66f00b41f0a5e"
repository 64ff7f7c opened successfully, password is correct
{"nodes":[{"name":"media","type":"dir","mode":2147484141,"mtime":"2020-11-19T18:21:44.302163653-05:00","atime":"2020-11-19T18:21:44.302163653-05:00","ctime":"2020-11-19T18:21:44.302163653-05:00","uid":0,"gid":0,"user":"root","group":"root","inode":53608449,"device_id":66306,"content":null,"subtree":"e872f33ea8a91bd69123d474bc720f000e79afbad57f0e8172a919b0d2fee40a"}]}

rcommand "cat blob 6e327c9d13082ee4be62018cd4f76c3635882818ec26e455c5fc4c05b564eaf8"
repository 64ff7f7c opened successfully, password is correct
{"nodes":[{"name":"media","type":"dir","mode":2147484141,"mtime":"2020-11-19T18:21:44.302163653-05:00","atime":"2020-11-19T18:21:44.302163653-05:00","ctime":"2020-11-19T18:21:44.302163653-05:00","uid":0,"gid":0,"user":"root","group":"root","inode":53608449,"device_id":66306,"content":null,"subtree":"9ea5a9a34982329cb5d931e9f1badce1d93fbbc4bc45e1118704f5328629c77b"}]}

I don’t see what would be private about this, so here you go.

cdhowie · December 17, 2020, 8:14pm

What we’re looking for is a tree comparison between two snapshots, where the more recent snapshot had to rescan everything from the prior snapshot. So if the most recent snapshot was not created by rescanning everything, then the comparison won’t help. The current backup would need to complete first, and then we could do the comparison.

Based on the output you just sent, the inode and device numbers match so the second snapshot in that example should not have needed a full rescan.
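
The by-eye comparison of two tree objects can also be scripted. A rough sketch, assuming the JSON layout printed by restic cat blob for a tree (a "nodes" list whose entries carry "name", "inode" and "device_id"); node_id_changes is a hypothetical helper, not a restic command:

```python
import json


def node_id_changes(old_tree_json, new_tree_json, fields=("device_id", "inode")):
    """Return (name, field, old_value, new_value) for entries whose IDs changed."""
    old_nodes = {n["name"]: n for n in json.loads(old_tree_json)["nodes"]}
    changes = []
    for node in json.loads(new_tree_json)["nodes"]:
        old = old_nodes.get(node["name"])
        if old is None:
            continue  # entry is new in this tree, nothing to compare against
        for field in fields:
            if old.get(field) != node.get(field):
                changes.append((node["name"], field, old.get(field), node.get(field)))
    return changes


# Trimmed-down versions of the two trees pasted above.
old_tree = '{"nodes":[{"name":"media","inode":53608449,"device_id":66306}]}'
new_tree = '{"nodes":[{"name":"media","inode":53608449,"device_id":66306}]}'
print(node_id_changes(old_tree, new_tree))  # empty list: IDs are identical
```

The "subtree" field is deliberately left out of the comparison, since it changes whenever anything below the directory changes.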

Can you paste the current output of this command?

stat -c '%d %i' /media

lrrrac · December 17, 2020, 8:15pm

Here is the output. 66306 53608449

lrrrac · December 17, 2020, 8:17pm

Also, I have my drives mounted at /media/disk_name/

cdhowie · December 17, 2020, 8:17pm

Hmm, that matches what is in the trees you pasted so that’s not it, unless you have other filesystems mounted below /media which did change IDs. Is there anything else mounted under /media?

Can you verify with restic snapshots -c that the hostname did not change? I’m assuming you already verified this, but I’m running out of ideas…

cdhowie · December 17, 2020, 8:18pm

Aha, okay, we are getting warmer. I’m guessing these disk device IDs are changing. So we need to dig a bit deeper into the tree structure.

Can you paste the output of these commands?

restic cat blob 9ea5a9a34982329cb5d931e9f1badce1d93fbbc4bc45e1118704f5328629c77b

stat -c '%d %i %n' /media/*
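
The same check can be done in one step by comparing the tree JSON against the live file system. A minimal sketch, assuming that the "device_id" and "inode" values in the tree correspond to what stat reports as %d and %i (st_dev and st_ino in Python); stat_mismatches is a made-up helper name:

```python
import json
import os


def stat_mismatches(directory, tree_json):
    """Compare each tree node with the entry of the same name under directory.

    Returns tuples of (name, field, value_in_tree, value_on_disk).
    """
    mismatches = []
    for node in json.loads(tree_json)["nodes"]:
        path = os.path.join(directory, node["name"])
        try:
            st = os.lstat(path)  # lstat: do not follow symlinks
        except FileNotFoundError:
            mismatches.append((node["name"], "missing", None, None))
            continue
        if st.st_dev != node["device_id"]:
            mismatches.append((node["name"], "device_id", node["device_id"], st.st_dev))
        if st.st_ino != node["inode"]:
            mismatches.append((node["name"], "inode", node["inode"], st.st_ino))
    return mismatches
```

Calling stat_mismatches("/media", tree_json) with the output of the cat blob command above would list every disk whose device ID or inode no longer matches the snapshot, with names attached.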

lrrrac · December 17, 2020, 8:21pm

stat -c '%d %i' /media/*
2097 2
2065 2
2081 2
66306 53609035
2129 2
2113 2
2049 2

rcommand "cat blob 9ea5a9a34982329cb5d931e9f1badce1d93fbbc4bc45e1118704f5328629c77b"
repository 64ff7f7c opened successfully, password is correct
{"nodes":[{"name":"Ilulu","type":"dir","mode":2147484141,"mtime":"2020-11-10T11:43:35.001418114-05:00","atime":"2020-11-10T11:43:35.001418114-05:00","ctime":"2020-11-10T11:43:35.001418114-05:00","uid":0,"gid":0,"user":"root","group":"root","inode":2,"device_id":2097,"content":null,"subtree":"25a8e92aeeb4599ad5ff8267764babba7f4748bc91aedad36ced34c81af988a3"},{"name":"Kanna","type":"dir","mode":2147484141,"mtime":"2020-11-10T11:43:35.005418089-05:00","atime":"2020-11-10T11:43:35.005418089-05:00","ctime":"2020-11-10T11:43:35.005418089-05:00","uid":0,"gid":0,"user":"root","group":"root","inode":2,"device_id":2065,"content":null,"subtree":"b671aff5166de7c8c15d4c4443897d2cbda578f026be8c2f5db72bc255101b61"},{"name":"Lucoa","type":"dir","mode":2147484141,"mtime":"2020-11-10T11:43:35.073417664-05:00","atime":"2020-11-10T11:43:35.073417664-05:00","ctime":"2020-11-10T11:43:35.073417664-05:00","uid":0,"gid":0,"user":"root","group":"root","inode":2,"device_id":2081,"content":null,"subtree":"9fa8f65623917e87a8adadad602c2a31297456d114a60e5f6e51846bfa8e5b42"},{"name":"Prushka","type":"dir","mode":2147484141,"mtime":"2020-11-19T18:21:44.302163653-05:00","atime":"2020-11-19T18:21:44.302163653-05:00","ctime":"2020-11-19T18:21:44.302163653-05:00","uid":0,"gid":0,"user":"root","group":"root","inode":53609035,"device_id":66306,"content":null,"subtree":"ac08ce34ba4f8123618661bef2425f7028ffb9ac740578a3ee88684d2523fee8"},{"name":"Ryuko","type":"dir","mode":2147484141,"mtime":"2020-11-13T12:29:23.524644377-05:00","atime":"2020-11-13T12:29:23.524644377-05:00","ctime":"2020-11-13T12:29:23.524644377-05:00","uid":0,"gid":0,"user":"root","group":"root","inode":2,"device_id":2129,"content":null,"subtree":"bc90fd9b39b57cbaf0caa5796810ed5c536855579a2bcc0b85ed6c32d4bee4c3"},{"name":"Senko","type":"dir","mode":2147484141,"mtime":"2020-12-08T11:24:58.677099641-05:00","atime":"2020-12-08T11:24:58.677099641-05:00","ctime":"2020-12-08T11:24:58.677099641-05:00","uid":0,"gid":0,"user":"root","group":"root",
"inode":2,"device_id":2113,"content":null,"subtree":"2eac87c50c8f580166f49525fa6e409ec72b05199de34e9d54e98d846a8b5383"},{"name":"Tohru","type":"dir","mode":2147484141,"mtime":"2020-11-16T01:58:38.430307106-05:00","atime":"2020-11-16T01:58:38.430307106-05:00","ctime":"2020-11-16T01:58:38.430307106-05:00","uid":0,"gid":0,"user":"root","group":"root","inode":2,"device_id":2049,"content":null,"subtree":"a0a2a1a4631940ac9f7a954a461f72aff17e3e494adcf0cf3de6145999aa10fb"}]}

Thanks for walking me through this. They all are the same.

cdhowie · December 17, 2020, 8:22pm

You forgot the %n format code, I need this to match up the numbers to directory names.

lrrrac · December 17, 2020, 8:24pm

I checked again, they’re the same.

cdhowie · December 17, 2020, 8:27pm

Hmm, the next thing to check would be whether the mtime recorded in the restic tree matches the mtime on disk. This might be easier to check with restic ls -l $SNAPSHOTID, as you get all of the output in a nice table. Look for discrepancies between what’s in the last snapshot and what ls -l reports for files under /media.
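
If the table output is too coarse, the mtime comparison can be done on the raw tree JSON instead. The sketch below parses timestamps in the format seen in the tree output (for example 2020-11-19T18:21:44.302163653-05:00) and compares them with the file’s mtime in nanoseconds. Comparing at nanosecond resolution is an assumption made here for illustration, not a statement about how restic itself compares times; both function names are hypothetical.

```python
import os
import re
from datetime import datetime

_TIMESTAMP = re.compile(
    r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})(?:\.(\d+))?(Z|[+-]\d{2}:\d{2})$"
)


def restic_time_to_ns(timestamp):
    """Convert a timestamp string from the tree JSON to nanoseconds since the epoch."""
    match = _TIMESTAMP.match(timestamp)
    if match is None:
        raise ValueError("unrecognised timestamp: " + timestamp)
    base, fraction, zone = match.groups()
    if zone == "Z":
        zone = "+00:00"
    # Parse whole seconds with the standard library, keep the fraction exact.
    whole_seconds = int(datetime.fromisoformat(base + zone).timestamp())
    nanos = int((fraction or "").ljust(9, "0")[:9])
    return whole_seconds * 1_000_000_000 + nanos


def mtime_matches(path, node):
    """True if the file's mtime equals the "mtime" recorded in the tree node."""
    return os.lstat(path).st_mtime_ns == restic_time_to_ns(node["mtime"])
```

A file for which mtime_matches returns False would be a candidate explanation for why it is being read again.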

lrrrac · December 17, 2020, 8:38pm

I checked a file that I saw it reading again, and the size, permissions, ownership, and mod time are all the same.

cdhowie · December 17, 2020, 8:48pm

If you run your backup command again with -v, do you see a line starting with “using parent snapshot” before it begins scanning the files?
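
For a backup that runs from a script or cron job, the same check can be applied to the captured output. A small sketch, assuming the line reads “using parent snapshot” followed by a hex ID; the sample text and the ID 0123abcd are invented for illustration, and parent_snapshot_id is a hypothetical helper:

```python
import re

_PARENT_LINE = re.compile(r"^using parent snapshot ([0-9a-f]+)\s*$", re.MULTILINE)


def parent_snapshot_id(backup_output):
    """Return the parent snapshot ID named in verbose backup output, or None."""
    match = _PARENT_LINE.search(backup_output)
    return match.group(1) if match else None


# Invented sample output; the snapshot ID is a placeholder.
sample = """open repository
lock repository
load index files
using parent snapshot 0123abcd
start scan on [/media]
"""
print(parent_snapshot_id(sample))
```

A result of None would mean no parent line was found in the output, which would by itself explain a full reread.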

lrrrac · December 17, 2020, 8:55pm

./scripts/rmedia
open repository
repository 64ff7f7c opened successfully, password is correct
lock repository
load index files
using parent snapshot 3ef88e37
start scan on [/media]
start backup on [/media]
[1:22] 866 files 11.338 GiB, total 1140 files 977.140 GiB, 0 errors
/media/Ilulu/Downloads/completed/FILE
/media/Ilulu/Downloads/completed/FILE

Here is the output. 3ef88e37 is the correct last backup.