Running restic in docker always causes files to be identified as new

I have a web app that is built with docker and I’d like to use restic to back up my database. I want to run restic from inside docker to keep the list of software installed on the host as small as possible.

My current strategy is a backup script that first generates a database dump from Postgres into a folder on the host and then runs the command docker run --rm --interactive --tty=false --init -v ./backups:/data -v ./restic-pass:/pass:ro -v restic-cache:/cache --env-file ./.env restic/restic:0.9.6 --cache-dir /cache -p /pass backup /data.

This works, but restic always picks up the files to back up as new, I guess because of the way docker mounts the volumes. Restic does seem to be deduplicating the files, so it’s not adding a new full copy to the repo each time, but will constantly detecting the files as new cause me an issue going forward? I have also tried the --force and --ignore-inode switches, but these don’t seem to make a difference.

As long as restic only sends the data that was changed, it should be fine. If it scans every file it just means you have to endure some more I/O and time, but in the end the stuff that’s sent to the repository shouldn’t be different. I wouldn’t worry.

@markbeazley may I suggest you use the stat(1) utility to check what has changed in the files you don’t expect to be detected as changed.

% stat example.txt
File: example.txt
Size: 0 Blocks: 0 IO Block: 4096 regular empty file
Device: 10302h/66306d Inode: 7864860 Links: 1
Access: (0644/-rw-r--r--) Uid: ( 1000/zcalusic) Gid: ( 100/ users)
Access: 2020-05-21 15:52:53.801093454 +0200
Modify: 2020-05-21 15:52:53.801093454 +0200
Change: 2020-05-21 15:52:53.801093454 +0200
Birth: -

If any of those attributes change, restic has no other option but to reread the file and look for data changes inside it (even if there are none). At least the device/inode/size/modify_time parameters should NOT change between invocations if the file in question has not changed. Probably also change time, dunno.

Of course, check all this from inside the container, and between restic invocations. Then we can work out whether the container volume mount mechanism is what defeats restic’s regular has-file-changed logic, or whether it is something else.
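
To make that comparison less error-prone, here is a minimal sketch that reduces the stat output to the attributes listed above so two readings can be compared as strings. The function name file_fingerprint is made up for illustration, and it assumes a stat that accepts -c format strings (GNU coreutils does; other implementations may differ).

```shell
#!/bin/sh
# Print device, inode, size, mtime and ctime (epoch seconds) for one file.
# Assumption: stat supports -c with these format sequences (GNU coreutils).
file_fingerprint() {
    stat -c '%d %i %s %Y %Z' "$1"
}

# Demo on a scratch file: take two readings with no change in between.
scratch="$(mktemp)"
before="$(file_fingerprint "$scratch")"
after="$(file_fingerprint "$scratch")"

if [ "$before" = "$after" ]; then
    echo "unchanged: $after"
else
    echo "changed: $before -> $after"
fi

rm -f "$scratch"
```

For the real check you would call file_fingerprint /data/db.dump inside the container before and after a restic run and compare the two lines.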

Having written all this, there’s one more thing that could force restic to detect a file as new. Check the container hostname. If it changes between runs, restic probably decides: another host, another file?

In any case, you should be able to solve the issue, with some additional debugging… Good luck!

Right, so I’ve done an SQL dump, so my backups folder looks like this on my host:

total 1296
drwxr-xr-x  2 markb mark    4096 May 21 15:16 ./
drwxr-xr-x 25 markb mark    4096 May 20 13:10 ../
-rw-r--r--  1 markb mark 1311394 May 21 15:17 db.dump
-rw-r--r--  1 markb mark      14 May 20 11:01 .gitignore

every run of restic backup gives this output

+ bin-docker/restic backup --verbose /data
open repository
repository 887d2911 opened successfully, password is correct
lock repository
load index files
start scan on [/data]
start backup on [/data]
scan finished in 0.394s: 2 files, 1.251 MiB

Files:           2 new,     0 changed,     0 unmodified
Dirs:            0 new,     0 changed,     0 unmodified
Data Blobs:      0 new
Tree Blobs:      0 new
Added to the repo: 0 B  

processed 2 files, 1.251 MiB in 0:00
snapshot dd27e8de saved

If I run docker run --rm --interactive --tty=false --init -v /path/to/backups:/data --entrypoint "stat" restic/restic:0.9.6 /data/db.dump

I get

  File: /data/db.dump
  Size: 1311394     Blocks: 2568       IO Block: 4096   regular file
Device: 802h/2050d  Inode: 3152712     Links: 1
Access: (0644/-rw-r--r--)  Uid: ( 1000/ UNKNOWN)   Gid: ( 1000/ UNKNOWN)
Access: 2020-05-21 14:17:21.000000000
Modify: 2020-05-21 14:17:17.000000000
Change: 2020-05-21 14:17:17.000000000

Subsequent runs return the exact same values.

@zcalusic Just checked the output of restic snapshots and it seems you are right: each run has a different hostname.

--------------------------------------------------------------
d1bff105  2020-05-19 11:26:40  d398bba15085              /data
551ca15f  2020-05-19 11:27:23  57bd7c937fc5              /data
e8411a24  2020-05-19 11:29:51  41042786e8f1              /data
64d9b2ea  2020-05-19 11:30:01  6509be0da194              /data
59aed31e  2020-05-19 11:30:16  87ba54b7ff65              /data
07a52ca7  2020-05-19 11:30:25  37f48a73d511              /data
28d6592f  2020-05-19 11:31:57  ad82da2bb42d              /data
01211a46  2020-05-19 11:36:46  f85e98e11462              /data
5a59e670  2020-05-19 11:42:04  70885c547a18              /data
e3d2936a  2020-05-20 10:10:20  14fcbeba1278              /data
e73ab74e  2020-05-20 11:45:12  8c695199cc2c              /data
3ba3f8f8  2020-05-20 13:04:47  1231c99d72e5              /data

Added --host restic to my restic command and it started correctly identifying them as unchanged. Thanks for the help!
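
For anyone landing here later, this is roughly what the full command looks like with the fix applied. It is only a sketch based on the command from the first post: RESTIC_HOST and restic_backup_cmd are names invented here, and the function prints the command for review rather than running it.

```shell
#!/bin/sh
# RESTIC_HOST is a hypothetical variable; "restic" is the value used in this thread.
RESTIC_HOST="${RESTIC_HOST:-restic}"

# Print the docker command with a fixed --host, so every snapshot is recorded
# under the same host name regardless of the container ID.
restic_backup_cmd() {
    printf '%s ' docker run --rm --interactive --tty=false --init \
        -v ./backups:/data \
        -v ./restic-pass:/pass:ro \
        -v restic-cache:/cache \
        --env-file ./.env \
        restic/restic:0.9.6 \
        --cache-dir /cache -p /pass \
        backup --host "$RESTIC_HOST" /data
    printf '\n'
}

restic_backup_cmd
```

Once the printed command looks right, it can be pasted into the backup script after the database dump step.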

My guess is that restic wasn’t able to find a parent snapshot, so it had nothing to compare metadata against and did a full file scan. Once you told it the hostname, it could identify a parent snapshot. An alternative to --host would have been --parent, but configuring a proper hostname is indeed better.
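
For comparison, a rough sketch of the --parent variant. PARENT_ID and restic_parent_args are hypothetical names, and dd27e8de is simply the snapshot ID saved in the run shown earlier; in practice the ID would have to be updated after every backup, which is part of why a fixed hostname is less work.

```shell
#!/bin/sh
# PARENT_ID is a hypothetical variable; dd27e8de is the snapshot from the earlier run.
PARENT_ID="${PARENT_ID:-dd27e8de}"

# Print only the restic arguments; they go after the image name in the docker command.
restic_parent_args() {
    printf '%s ' --cache-dir /cache -p /pass backup --parent "$PARENT_ID" /data
    printf '\n'
}

restic_parent_args
```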

Just found this in the docker documentation:

a container’s hostname defaults to be the container’s ID in Docker.

So you could also set a hostname for the docker container; see Networking overview | Docker Docs.
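
A quick way to try this is docker run’s --hostname flag. The sketch below prints a command that starts the restic image with a fixed hostname and runs hostname inside it, in the same style as the stat check earlier in the thread. CONTAINER_HOSTNAME and docker_hostname_cmd are names made up for this example, and I have not checked whether restic then needs --host at all.

```shell
#!/bin/sh
# CONTAINER_HOSTNAME is a hypothetical variable; any stable name will do.
CONTAINER_HOSTNAME="${CONTAINER_HOSTNAME:-restic}"

# Print a docker command that sets the container hostname and then reports it,
# so you can confirm it no longer defaults to the container ID.
docker_hostname_cmd() {
    printf '%s ' docker run --rm --hostname "$CONTAINER_HOSTNAME" \
        --entrypoint hostname restic/restic:0.9.6
    printf '\n'
}

docker_hostname_cmd
```

If the printed command reports the same name on every run, the same --hostname flag can be added to the backup command.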