When/why do files become “new”?

Restic is great, and working well for me.

Once again, I’m just seeking to understand something.

Yesterday at 4:24pm:

using parent snapshot 0bbd85ad

Files:         163 new,   186 changed, 1114934 unmodified
Dirs:           25 new,   195 changed, 154515 unmodified
Added to the repository: 216.023 MiB (56.247 MiB stored)

processed 1115283 files, 122.762 GiB in 1:02
snapshot 241362a1 saved

The next backup, two hours later:

using parent snapshot 4c40a105

Files:       691722 new,     2 changed, 424100 unmodified
Dirs:        109479 new,    17 changed, 45320 unmodified
Added to the repository: 484.498 MiB (141.625 MiB stored)

processed 1115824 files, 123.518 GiB in 4:31
snapshot 13109085 saved

The number of new files and folders has remained high since (currently 797055 and 133552, respectively). The amount of data added to the repository is reasonable, at about 66 MiB.
I’m wondering how these files and folders became considered new, and why they remain so.
Again, everything works fine, just seeking to understand.

Thanks!

Did you run restic diff already?

No, I haven’t. Those are the outputs that are being emailed to me when restic runs.

It looks a bit odd to me as well. Restic is able to find a parent (“previous”) snapshot of the same data, so it should recognize that the files are already there. However, restic is very strict and will, if in doubt, consider a file as new and re-read its content (deduplication takes care of the rest; it’ll just take longer and report more files as “new”).

What’s the source of the data? Do you maybe back up a directory mounted from a remote source?

One of the things restic checks to see if the file is the same as the previous one is the inode number. For most remote file systems (especially things such as sshfs), the inode is not static and will be different the next time the file system is mounted. You can try with the --ignore-inode and --ignore-ctime options (for the backup command), which tell restic to be much less strict.
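To make that check concrete, here is a simplified sketch in Python of the kind of metadata comparison described above. It is a model for illustration only, not restic's actual code: `FileMeta` and `presumed_unchanged` are hypothetical names, the sample values are made up (apart from the file size), and the two keyword arguments merely mimic the effect of `--ignore-inode` and `--ignore-ctime`. The exact behaviour of those options may differ between restic versions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileMeta:
    """Metadata recorded for a file (hypothetical, simplified)."""
    size: int
    mtime: float
    ctime: float
    inode: int


def presumed_unchanged(prev, cur, ignore_inode=False, ignore_ctime=False):
    """Return True if the file content can be presumed unchanged.

    Size and mtime must always match; the inode and ctime checks
    can be switched off, which makes the comparison less strict.
    """
    if prev.size != cur.size or prev.mtime != cur.mtime:
        return False
    if not ignore_inode and prev.inode != cur.inode:
        return False
    if not ignore_ctime and prev.ctime != cur.ctime:
        return False
    return True


# Same file before and after a remount that hands out a new inode number.
before = FileMeta(size=1313, mtime=1000.0, ctime=2000.0, inode=1001)
after_remount = FileMeta(size=1313, mtime=1000.0, ctime=2000.0, inode=2002)

strict = presumed_unchanged(before, after_remount)
relaxed = presumed_unchanged(before, after_remount, ignore_inode=True)
print(strict, relaxed)
```

With the strict comparison the file has to be re-read even though nothing but the inode number differs; with the inode check disabled it is presumed unchanged.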

Please report back if you try it!

There are also a few more details in the documentation on how the change detection works.

I’ve done some more sleuthing.

Inodes change every backup

I picked a file that hasn’t changed in years from the TeX package:

  File: usr/share/man/man1/texmfstart.1.gz
  Size: 1313      	Blocks: 3          IO Block: 512    regular file
Device: 48h/72d	Inode: 10513798589495246425  Links: 1
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2020-03-30 17:48:24.000000000 -0400
Modify: 2020-03-30 17:48:24.000000000 -0400
Change: 2023-02-23 08:57:48.276616255 -0500
 Birth: -

Its inode changes between/during every backup. Apparently this isn't the issue.

Inode of file for 2023-03-14:
Device: 48h/72d	Inode: 7332003661669939424  Links: 1
Device: 48h/72d	Inode: 10733747488517407657  Links: 1
Device: 48h/72d	Inode: 4116805244364871059  Links: 1
Device: 48h/72d	Inode: 8444514163394085700  Links: 1
Device: 48h/72d	Inode: 13142612888247565005  Links: 1
Device: 48h/72d	Inode: 17731653774848215882  Links: 1
Device: 48h/72d	Inode: 15723406623955317933  Links: 1
Device: 48h/72d	Inode: 53665616766580011  Links: 1
Device: 48h/72d	Inode: 13013334016280768227  Links: 1
Device: 48h/72d	Inode: 10513798589495246425  Links: 1
Device: 48h/72d	Inode: 16649499435083292052  Links: 1
Device: 48h/72d	Inode: 17315614882769877611  Links: 1
Device: 48h/72d	Inode: 10623724754153206201  Links: 1
Device: 48h/72d	Inode: 1184883235696100237  Links: 1
Device: 48h/72d	Inode: 2992461868047737472  Links: 1

Edit: these are the inodes of the backup files, so of course they are different.

The backup parent snapshot continuity is broken

I think this is the real issue.
Here is text from several backups on 2023-03-14, the times are from the emails (this window scrolls):

12:24
using parent snapshot aae895c8

Files:         176 new,   181 changed, 1114873 unmodified
Dirs:           45 new,   289 changed, 154405 unmodified
Added to the repository: 925.894 MiB (583.757 MiB stored)

processed 1115230 files, 122.741 GiB in 1:40
snapshot 7b99c3e6 saved

14:23
using parent snapshot 7b99c3e6

Files:         144 new,   209 changed, 1114942 unmodified
Dirs:           29 new,   204 changed, 154505 unmodified
Added to the repository: 215.537 MiB (50.491 MiB stored)

processed 1115295 files, 122.750 GiB in 0:49
snapshot 0bbd85ad saved

16:24
using parent snapshot 0bbd85ad

Files:         163 new,   186 changed, 1114934 unmodified
Dirs:           25 new,   195 changed, 154515 unmodified
Added to the repository: 216.023 MiB (56.247 MiB stored)

processed 1115283 files, 122.762 GiB in 1:02
snapshot 241362a1 saved

<<< This is where the parent chain becomes broken, and each parent is different after >>>

18:27
using parent snapshot 4c40a105

Files:       691722 new,     2 changed, 424100 unmodified
Dirs:        109479 new,    17 changed, 45320 unmodified
Added to the repository: 484.498 MiB (141.625 MiB stored)

processed 1115824 files, 123.518 GiB in 4:31
snapshot 13109085 saved

20:27
using parent snapshot 2a654fa5

Files:       691757 new,     2 changed, 416317 unmodified
Dirs:        109489 new,    17 changed, 45063 unmodified
Added to the repository: 289.374 MiB (77.758 MiB stored)

processed 1108076 files, 123.467 GiB in 4:22
snapshot 4ae1ec7e saved

22:27
using parent snapshot 4283cf52

Files:       691759 new,     0 changed, 416319 unmodified
Dirs:        109489 new,    15 changed, 45065 unmodified
Added to the repository: 72.778 MiB (8.572 MiB stored)

processed 1108078 files, 123.467 GiB in 4:22
snapshot 7be99e51 saved

E.g. when things are working correctly:
aae895c8 → 7b99c3e6 → 7b99c3e6 → 0bbd85ad → 0bbd85ad → 241362a1
and after things go sideways:
4c40a105 → 13109085 XX 2a654fa5 → 4ae1ec7e XX 4283cf52 → 7be99e51

Other info:

OS: Debian GNU/Linux 11 (bullseye) x86_64 
Host: OptiPlex 7040 
Kernel: 5.10.0-21-amd64 
Uptime: 5 days, 20 hours, 34 mins 
Packages: 2870 (dpkg) 
Shell: zsh 5.8 
Resolution: 3840x2160 
DE: GNOME 3.38.6 
WM: Mutter 
WM Theme: Adwaita 
Theme: Adwaita [GTK2/3] 
Icons: Adwaita [GTK2/3] 
Terminal: gnome-terminal 
CPU: Intel i7-6700 (8) @ 4.000GHz 
GPU: Intel HD Graphics 530 
GPU: NVIDIA GeForce GT 1030 
Memory: 8747MiB / 48074MiB 
> lsblk -o NAME,PATH,FSTYPE,MOUNTPOINT
NAME        PATH           FSTYPE MOUNTPOINT
sda         /dev/sda              
└─sda1      /dev/sda1      ext4   /media/john/Backup
sdb         /dev/sdb              
└─sdb1      /dev/sdb1      btrfs  /run/timeshift/backup
sr0         /dev/sr0              
nvme1n1     /dev/nvme1n1          
└─nvme1n1p1 /dev/nvme1n1p1 btrfs  /home
nvme0n1     /dev/nvme0n1          
├─nvme0n1p1 /dev/nvme0n1p1 vfat   /boot/efi
├─nvme0n1p2 /dev/nvme0n1p2 btrfs  /
├─nvme0n1p3 /dev/nvme0n1p3 swap   [SWAP]
├─nvme0n1p4 /dev/nvme0n1p4 btrfs  /var
├─nvme0n1p5 /dev/nvme0n1p5 btrfs  /tmp
├─nvme0n1p6 /dev/nvme0n1p6 btrfs  /usr/local
└─nvme0n1p7 /dev/nvme0n1p7 btrfs  /opt

I also run Timeshift and Backintime.

Edit: here are some diff results that echo what the email text says.

root in ~ took 31s 
❯ alias resticstorage
resticstorage='restic --repository-file /etc/restic.storage --password-file /etc/restic.passwd --cache-dir /var/cache/restic'

root in ~ took 27s 
❯ resticstorage diff aae895c8 7b99c3e6 | wc -l
615

root in ~ took 27s 
❯ resticstorage diff 241362a1 4c40a105 | wc -l
820871

root in ~ took 30s 
❯ resticstorage diff 4c40a105 13109085 | wc -l
820473

❯ resticstorage diff 13109085 2a654fa5 
<million files>
Files:         919 new, 699505 removed,    15 changed
Dirs:          919 new, 109737 removed
Others:          1 new, 17436 removed
Data Blobs:   7032 new, 413347 removed
Tree Blobs:    802 new, 98756 removed
  Added:   9.177 GiB
  Removed: 53.093 GiB

root in ~ took 30s 
❯ resticstorage diff 13109085 2a654fa5 | wc -l
828542

texmfstart* is not one of the million files listed as changed.

Restic picks the latest snapshot with the exact same set of paths and the same host name as the parent. I guess that restic snapshots --host hostname --path path1 [--path path2] [...] might reveal what’s happening. Are there maybe two backup tasks for the same paths but with different excludes?
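The selection rule described above can be sketched in a few lines of Python. This is only a model under stated assumptions, not restic's implementation: `Snapshot` and `find_parent` are hypothetical names, the snapshot IDs, host name and timestamps are invented, and excludes are deliberately left out of the matching because the rule mentions only paths and host name.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Snapshot:
    """Minimal stand-in for a snapshot record (hypothetical)."""
    id: str
    time: int            # larger means more recent
    hostname: str
    paths: frozenset     # the paths passed to the backup command


def find_parent(snapshots, hostname, paths) -> Optional[Snapshot]:
    """Pick the latest snapshot with the same host and exactly the same paths."""
    wanted = frozenset(paths)
    candidates = [
        s for s in snapshots
        if s.hostname == hostname and s.paths == wanted
    ]
    return max(candidates, key=lambda s: s.time, default=None)


# Two jobs back up the same path from the same host, possibly with
# different excludes. Nothing in the matching tells them apart.
snapshots = [
    Snapshot("a1", time=1, hostname="desk", paths=frozenset(["/"])),  # job A
    Snapshot("b1", time=2, hostname="desk", paths=frozenset(["/"])),  # job B
]

parent_for_next_job_a = find_parent(snapshots, "desk", ["/"])
print(parent_for_next_job_a.id)
```

In this model the next run of job A gets job B's snapshot as its parent, simply because it is the most recent one with matching paths and host.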

Thanks for the reply!

I did restic snapshots $(hostname) (adding --path path1 didn’t list any snapshots for a few different paths I tried) and got a list of backups. They are from two different methods: one is systemd, and the other is a cron job. When the systemd backup locked up, the number of changed files reported by the cron job went back to what one would expect: a few dozen to a few hundred.

I thought this might have to do with each backup updating the atime for the files, but according to the docs, restic does not save the atime unless asked to save it. Perhaps the atime is still being checked by restic and detected as “changed” or “new”?

The two backup methods do have different excludes. But since (as you say) “restic picks the latest snapshot with the exact same set of paths and the same host name”, it would seem that each method would only use its own snapshots. From the discontiguous snapshot IDs listed in my previous post, this doesn’t seem to be the case. That is, regardless of the other backup method, each should show A->B, B->C, C->D snapshots, even if the other method is also backing up to X-> between the A->B and B->C backups.

It also seems from this that restic is still somehow considering atime when determining if a file is new or changed.

Maybe the simplest thing is to eliminate one backup method and let it go. Since the systemd method hung (and will not die with kill) though, it makes me leery of doing that.

No. The change detection is exclusively based on the information included in the snapshot.

paths != excludes. “Paths” refers to the list of files and directories passed to the backup command, which is also what snapshots shows later on. If two backups use the same paths, then they should also use matching excludes to avoid the problems above.
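A small Python sketch may help show why mismatched excludes inflate the “new” count. It assumes, as a simplification, that a file counts as “new” when its path is absent from the parent snapshot, “changed” when it is present with different metadata, and “unmodified” otherwise. The `classify` function, the paths and the metadata tuples are all hypothetical.

```python
def classify(parent_tree, current_tree):
    """Count files as new / changed / unmodified relative to a parent.

    Both arguments map a path to some metadata value (here a tuple).
    """
    counts = {"new": 0, "changed": 0, "unmodified": 0}
    for path, meta in current_tree.items():
        if path not in parent_tree:
            counts["new"] += 1
        elif parent_tree[path] != meta:
            counts["changed"] += 1
        else:
            counts["unmodified"] += 1
    return counts


# What the current job sees on disk (made-up paths and metadata).
on_disk = {
    "/etc/hosts": (120, 10),
    "/usr/share/doc/a": (50, 5),
    "/usr/share/doc/b": (70, 5),
}

# Parent written by another job with the same paths that excluded /usr/share/doc.
parent_from_other_job = {"/etc/hosts": (120, 10)}

# Parent written by the same job on its previous run.
parent_from_same_job = dict(on_disk)

print(classify(parent_from_other_job, on_disk))
print(classify(parent_from_same_job, on_disk))
```

Nothing on disk changed in either case. The files only look “new” when the parent comes from the job whose excludes left them out, which matches the pattern of a huge “new” count alongside a small amount of data actually added.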