MooseFS backup - FUSE filesystem

#1

Hi,

I am trying to back up files from a network filesystem called MooseFS (www.moosefs.com), which offers a FUSE mount.
Every time I run the backup, every file is validated again, even with --ignore-inode.

I assume the problem is that some of the file attributes differ between runs, but how do I check which ones?

#2

Welcome to the forum!

First: Which version of restic are you using? Please paste the output of restic version :slight_smile:

You can do the following:

  • Mount the file system, make a backup of a single file with restic, run stat on the file, and save the output
  • Unmount and mount the file system again at exactly the same path, make another backup of the same file, and verify that the effect occurs (restic reads the file twice)
  • Run stat on the file again
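
As a concrete sketch, the steps above might look like this on the command line (mfsmount and the paths are examples, adjust to your setup):

```shell
# 1. Mount the filesystem, back up one file, and record its metadata
mfsmount /mnt/mfs
restic backup /mnt/mfs/somefile
stat /mnt/mfs/somefile > stat-before.txt

# 2. Unmount and remount at exactly the same path, then back up again;
#    watch whether restic reads the whole file a second time
umount /mnt/mfs
mfsmount /mnt/mfs
restic backup /mnt/mfs/somefile

# 3. Record the metadata again and compare
stat /mnt/mfs/somefile > stat-after.txt
diff stat-before.txt stat-after.txt
```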

Post the output of restic and the output of both stat commands. Maybe we can see what’s going on. I would have expected that --ignore-inode makes restic not re-read the file…

#3

Thank you @fd0
restic 0.9.5 compiled with go1.12.4 on linux/amd64
executed in Docker on Ubuntu 16.04

On every backup, every file is identified as new,
even without re-mounting the FUSE filesystem.
Correctly, though, nothing to transfer is identified.

stat for first docker run

  Size: 25665054        Blocks: 50128      IO Block: 65536  regular file
Device: 2bh/43d Inode: 2529689     Links: 1
Access: (0777/-rwxrwxrwx)  Uid: ( 1000/ UNKNOWN)   Gid: ( 1000/ UNKNOWN)
Access: 2019-05-06 10:44:12.000000000
Modify: 2010-08-14 22:54:57.000000000
Change: 2019-03-18 13:34:07.000000000

stat for next docker run (no re-mount)

  Size: 25665054        Blocks: 50128      IO Block: 65536  regular file
Device: 2bh/43d Inode: 2529689     Links: 1
Access: (0777/-rwxrwxrwx)  Uid: ( 1000/ UNKNOWN)   Gid: ( 1000/ UNKNOWN)
Access: 2019-05-06 10:44:12.000000000
Modify: 2010-08-14 22:54:57.000000000
Change: 2019-03-18 13:34:07.000000000

#4

I just wonder whether it is the cache persistence between Docker runs.

I do keep .cache/restic persistent; did I miss something?

#5

The output of stat looks good, restic can find out that it’s the same unmodified file, even without --ignore-inode.

Can you paste the output for restic backup for both runs? What’s the path used for calling restic? For restic to find the previous snapshot, it must be exactly the same. What does restic snapshots print? Does the host name within the docker container stay the same? You can manually override the hostname by running restic backup --host foobar.
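
For example, you could compare the hostname recorded in the snapshots with the one inside the container (foobar and /mnt/photos are just placeholders):

```shell
# Hostname as seen inside the container (restic records this in the snapshot)
hostname

# List snapshots; the Host column must match for parent-snapshot detection
restic snapshots

# Pin the hostname explicitly so the parent snapshot is always found
restic backup --host foobar /mnt/photos
```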

That does not matter; restic just rebuilds the cache if needed.

#6

OK, I have run the backup twice in the same container,
and the second time the file was properly recognised as “unchanged”.

So it's not a problem with restic itself,
but likely with the way I run Docker.

#7

But it does not look like a problem with --cache-dir:
even if I set this flag and make the cache directory persistent, every Docker run still sees a new file and no changes.

This is my command; could it be something with the rclone REST API?

docker run --rm -it -v /mnt/mfs/photos:/mnt/photos -v /home/kisiel/restic_cache:/root/.cache/restic restic/restic backup --ignore-inode --repo rest:http://<user>:<pass>@<server>:<port> "/mnt/photos/_Zdjecia/1_Albums_waiting/2010/2010-08-14 - Crolinnhe - Spa/CR2/20100814_235618_0000_0418.CR2" -vvvv --cache-dir /root/.cache/restic

#8

first docker run

open repository
enter password for repository:
repository b3ccf04a opened successfully, password is correct
lock repository
load index files
start scan on [/mnt/photos/_Zdjecia/1_Albums_waiting/2010/2010-08-14 - Crolinnhe - Spa/CR2/20100814_235618_0000_0418.CR2]
start backup on [/mnt/photos/_Zdjecia/1_Albums_waiting/2010/2010-08-14 - Crolinnhe - Spa/CR2/20100814_235618_0000_0418.CR2]
scan finished in 45.321s: 1 files, 24.476 MiB
new       /mnt/photos/_Zdjecia/1_Albums_waiting/2010/2010-08-14 - Crolinnhe - Spa/CR2/20100814_235618_0000_0418.CR2, saved in 2.572s (0 B added)
new       /mnt/photos/_Zdjecia/1_Albums_waiting/2010/2010-08-14 - Crolinnhe - Spa/CR2/, saved in 2.577s (0 B added, 0 B metadata)
new       /mnt/photos/_Zdjecia/1_Albums_waiting/2010/2010-08-14 - Crolinnhe - Spa/, saved in 2.578s (0 B added, 0 B metadata)
new       /mnt/photos/_Zdjecia/1_Albums_waiting/2010/, saved in 2.582s (0 B added, 0 B metadata)
new       /mnt/photos/_Zdjecia/1_Albums_waiting/, saved in 2.582s (0 B added, 0 B metadata)
new       /mnt/photos/_Zdjecia/, saved in 2.583s (0 B added, 0 B metadata)
new       /mnt/photos/, saved in 2.584s (0 B added, 0 B metadata)
new       /mnt/, saved in 2.584s (0 B added, 0 B metadata)

Files:           1 new,     0 changed,     0 unmodified
Dirs:            7 new,     0 changed,     0 unmodified
Data Blobs:      0 new
Tree Blobs:      1 new
Added to the repo: 346 B

processed 1 files, 24.476 MiB in 0:54
snapshot 6f261a8e saved

next docker run

open repository
enter password for repository:
repository b3ccf04a opened successfully, password is correct
lock repository
load index files
start scan on [/mnt/photos/_Zdjecia/1_Albums_waiting/2010/2010-08-14 - Crolinnhe - Spa/CR2/20100814_235618_0000_0418.CR2]
start backup on [/mnt/photos/_Zdjecia/1_Albums_waiting/2010/2010-08-14 - Crolinnhe - Spa/CR2/20100814_235618_0000_0418.CR2]
scan finished in 37.511s: 1 files, 24.476 MiB
new       /mnt/photos/_Zdjecia/1_Albums_waiting/2010/2010-08-14 - Crolinnhe - Spa/CR2/20100814_235618_0000_0418.CR2, saved in 2.790s (0 B added)
new       /mnt/photos/_Zdjecia/1_Albums_waiting/2010/2010-08-14 - Crolinnhe - Spa/CR2/, saved in 2.794s (0 B added, 0 B metadata)
new       /mnt/photos/_Zdjecia/1_Albums_waiting/2010/2010-08-14 - Crolinnhe - Spa/, saved in 2.795s (0 B added, 0 B metadata)
new       /mnt/photos/_Zdjecia/1_Albums_waiting/2010/, saved in 2.796s (0 B added, 0 B metadata)
new       /mnt/photos/_Zdjecia/1_Albums_waiting/, saved in 2.796s (0 B added, 0 B metadata)
new       /mnt/photos/_Zdjecia/, saved in 2.796s (0 B added, 0 B metadata)
new       /mnt/photos/, saved in 2.797s (0 B added, 0 B metadata)
new       /mnt/, saved in 2.797s (0 B added, 0 B metadata)

Files:           1 new,     0 changed,     0 unmodified
Dirs:            7 new,     0 changed,     0 unmodified
Data Blobs:      0 new
Tree Blobs:      1 new
Added to the repo: 346 B

processed 1 files, 24.476 MiB in 0:45
snapshot 4b37bd5f saved

#9

@fd0
It feels like between Docker runs restic loses the knowledge that a file with exactly the same properties already exists in the repository.

I was hoping that file properties are part of the repository, so every time restic sees the file without changes, it can tell it already has it.

Even if during runtime this information is stored somewhere and re-used on the next run,
I am not convinced that this is the correct behaviour.

#10

Can you please answer the remaining questions I had? What about the output of restic snapshots? Does the hostname within the docker container change in between runs?

#11

@fd0

Thank you, you are a genius!
Docker generates a unique hostname on every run;
--host solved all my problems.
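
For anyone hitting the same issue: with a fixed hostname, the Docker invocation from above becomes something like this (photos-backup is just an example name):

```shell
docker run --rm -it \
  -v /mnt/mfs/photos:/mnt/photos \
  -v /home/kisiel/restic_cache:/root/.cache/restic \
  restic/restic backup \
    --host photos-backup \
    --ignore-inode \
    --cache-dir /root/.cache/restic \
    --repo rest:http://<user>:<pass>@<server>:<port> \
    /mnt/photos
```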

#12

@fd0

Not sure if it belongs here, but I guess it's sort of related.
I am trying to back up 12 TB of data which already sits in the repository, uploaded from a different host.
restic does not try to upload anything, but the estimated runtime exceeds 400 hours,
presumably due to the time required to check every file for differences.

It looks like if I stop the backup in the middle and then restart it, it starts from the beginning;
it does not recognise previously checked files and checks them again.
This happens even if I try to divide and conquer. Say I have a folder structure:
parent

  • child1
  • child2

I can back up child1 and then child2,
but if I back up parent, it again checks all the files.

Is there a way around this?

#13

Just to clarify, the files are checked again because there is no parent snapshot that references them. The already-uploaded data should not be re-added to the repository as it would be deduplicated, but the local files must still be hashed in order to determine if they can be deduplicated.

#14

restic assumes that it’s cheap to re-read source data, which is not necessarily the case for remote file systems mounted locally. It’s not optimized for this use case. Until you have completed a backup for exactly the set of directories that you’re going to save, restic will re-read all data locally.

#15

@fd0
Understood, but 400 hours is a lot of time, and I don't think restic writes partial snapshots,
so if I interrupt the process I have to restart the whole thing.
Is there a way around it?

#16

As a hack, you could make use of --exclude (which is not considered when looking for a parent snapshot) and use it to exclude enough data to get the backup time to be reasonable, then widen the set of backed up data each time so that the new data being added at each step keeps the backup duration under a certain amount. For example, if you have four folders of equal size, you could add them one at a time like:

# Add /folder/1
restic backup --exclude=/folder/2 --exclude=/folder/3 --exclude=/folder/4 /folder

# Add /folder/2
restic backup --exclude=/folder/3 --exclude=/folder/4 /folder

# Add /folder/3
restic backup --exclude=/folder/4 /folder

# Add /folder/4
restic backup /folder

#17

@cdhowie
Thank you. That's definitely not ideal, but I might consider it if there is no other option.

#18

The good news is that you only have to do this process once. After all of the data is in a single snapshot, the fast path should ignore files whose mtime/size hasn’t changed since the last snapshot.
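
To illustrate, once a complete snapshot of the whole tree exists, subsequent runs behave roughly like this (assuming the repository environment is already configured):

```shell
restic backup /folder   # first complete run: every file is read and hashed
restic backup /folder   # later runs: files with unchanged mtime/size are skipped
touch /folder/somefile  # bump only the mtime
restic backup /folder   # somefile is re-read and re-hashed; the rest is skipped
```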
