Error "lstat - errno 524" during backup

Hello,

I’m using this script to back up my files:

SOURCE="/mnt/windows"
REPO="--repo /backup --password-file cred.txt"
mount -t cifs -o ro \\path\to\windows-computer $SOURCE
restic backup $REPO $SOURCE --cleanup-cache --ignore-inode --exclude="parent.lock" --verbose --verbose

During backup I get many errors

error: Readdirnames /mnt/windows/[...] readdirent: no such file or directory

After searching on GitHub, I found issue #2659 and added this line to my script:

export GODEBUG=asyncpreemptoff=1

Now the “readdirent” errors are gone, but errors remain on some files and folders:

error: lstat /mnt/windows/file.png: errno 524
error: lstat /mnt/windows/Profile: errno 524

If I access the folder manually, I can copy/access files and folders.

stat Profile/
  File: Profile/
  Size: 0               Blocks: 0          IO Block: 1048576 directory
Device: 2dh/45d Inode: 281474976769165  Links: 2
Access: (0755/drwxr-xr-x)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2017-06-06 11:49:41.539084800 +0200
Modify: 2017-06-06 11:49:41.539084800 +0200
Change: 2019-10-19 20:08:47.072692300 +0200
 Birth: -

stat Profile/Firefox/
  File: Profile/Firefox/
  Size: 0               Blocks: 0          IO Block: 1048576 directory
Device: 2dh/45d Inode: 281474976769166  Links: 2
Access: (0755/drwxr-xr-x)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2016-01-21 19:57:03.254402300 +0100
Modify: 2016-01-21 19:57:03.254402300 +0100
Change: 2019-10-19 20:06:55.147290500 +0200
 Birth: -

The problem is that if this error happens (e.g. on folder “Profile”) then Restic cannot find the subfolders/files below (e.g. “Firefox”) which results in a partial backup.

What does this error mean?

I investigated further:
The error is not consistent and does not happen every time. What I also found is that this error mostly happens in top folders, e.g.

/mnt/windows/a/
/mnt/windows/b/file.html
/mnt/windows/c/                       error 524
/mnt/windows/d/image.jpg
/mnt/windows/e/
/mnt/windows/f/

The problem:
The files/subfolders below /c/ are not detected and are not part of the backup.
Sometimes all folders /a/, /b/, /c/, etc. have errors.
The backup is written to snapshot ABC.

This causes another problem:
The next backup uses ABC as its parent, which has 0 files below the /c/ folder. If there is no error 524 this time and the /c/ folder works, Restic will detect all the files below it as new (because they don’t “exist” in backup ABC), which causes the backup to run for a very long time, filling the log with

new    /mnt/windows/c/file1.html           0 bytes added
new    /mnt/windows/c/folder/              0 bytes added
new    /mnt/windows/c/folder/file2.html    0 bytes added

Of course Restic deduplicates the files correctly because they exist somewhere in older snapshots.
The snapshot is now DEF. The next time restic runs, the errors 524 don’t matter and snapshot GHI is written.

So every 2 snapshots there is a long backup which reads all (in reality unchanged) files as new, and the next one created is “faulty”, which causes the one after it to do a full read again, and so on.
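
A minimal sketch of how a wrapper script could tell such a partial snapshot apart from a complete one. It assumes restic 0.10.0 or newer, where `restic backup` exits with code 3 when some source data could not be read. The function name `classify_backup_status` is made up for illustration and is not part of restic.

```shell
#!/bin/sh
# Sketch: map the exit status of `restic backup` to a label so that a wrapper
# script can tell a complete snapshot from a partial one.
# Assumption: restic 0.10.0+, where exit code 3 means that some source data
# could not be read and an incomplete snapshot was created.
classify_backup_status() {
    case "$1" in
        0) echo "complete" ;;
        3) echo "incomplete" ;;
        *) echo "failed" ;;
    esac
}

# Hypothetical usage inside the backup script:
#   restic backup $REPO "$SOURCE"; status=$?
#   if [ "$(classify_backup_status "$status")" = "incomplete" ]; then
#       echo "partial snapshot written, check the log for lstat errors" >&2
#   fi
```

With a label like this, the script can at least warn that the snapshot just written is missing folders, instead of silently letting it become the parent of the next run.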

I will try to find out more and maybe downgrade and see if this happens in older versions too.
Maybe this is all related to the known Go bugs.

I will also run a full check with --read-data and see if this detects anything.
If the repo size is 500 GB with 1000 snapshots and I use this parameter, how much data will Restic read? 500 GB or 1000x500 GB?

@uok Can you test whether restic 0.9.6 (or a restic version built with Go 1.13) also has the errno 524 problem? That errno could mean “not supported”, but it is so uncommon that Go has no string representation for it… For now it’s probably best to also add the problems you’re seeing to issue 2659; if it turns out to be unrelated, we can still split it off later on.

--read-data will read every (physical) file in the backup storage once. In your example that would be the 500GB.
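
If reading the full 500 GB in one run is too much, the work can probably be spread over several runs with the `--read-data-subset=n/t` option of `restic check`. The sketch below only prints the commands for each slice. The repository arguments are copied from the script in the first post, and the helper name `check_commands` is made up for illustration.

```shell
#!/bin/sh
# Sketch: print the `restic check` commands that verify the repository in
# N slices, so that each run reads roughly 1/N of the pack files.
# REPO_ARGS is copied from the backup script in the first post.
REPO_ARGS="--repo /backup --password-file cred.txt"

check_commands() {
    total="$1"
    i=1
    while [ "$i" -le "$total" ]; do
        echo "restic check $REPO_ARGS --read-data-subset=$i/$total"
        i=$((i + 1))
    done
}

# Print the five commands for a check split into five slices.
check_commands 5
```

Running one slice per day, for example, would cover the whole repository after five days.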

@MichaelEischer, I took GODEBUG=asyncpreemptoff=1 from #2659 which reduces some errors.

After lots of testing and googling I found #1800, which has the helpful hint of adding nouser_xattr to the cifs mount.

Both changes in my script are needed for Restic v0.11.0 - now everything runs fine! :partying_face: :relieved:
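
For reference, a sketch of what the script from the first post might look like with both changes applied. The share path `//server/share` is a hypothetical placeholder, and the `run` helper with its `DRY_RUN` switch is added here only so the commands can be inspected without mounting anything. Neither is part of the original script.

```shell
#!/bin/sh
# Sketch of the backup script with both workarounds from this thread applied.
# SHARE is a hypothetical placeholder for the Windows share.
SHARE="//server/share"
SOURCE="/mnt/windows"
REPO="--repo /backup --password-file cred.txt"
MOUNT_OPTS="ro,nouser_xattr"   # nouser_xattr: hint from #1800

# Workaround from #2659: disable asynchronous preemption in the Go runtime.
export GODEBUG=asyncpreemptoff=1

# Illustration-only helper: print the command when DRY_RUN=1 (the default
# here), otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

run mount -t cifs -o "$MOUNT_OPTS" "$SHARE" "$SOURCE"
# $REPO is left unquoted on purpose so that it splits into separate arguments.
run restic backup $REPO "$SOURCE" --cleanup-cache --ignore-inode --exclude="parent.lock" --verbose --verbose
```

Setting `DRY_RUN=0` makes the helper execute the commands instead of printing them.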

Could you test whether the current master branch also works without specifying nouser_xattr? #1800 should have been fixed by https://github.com/restic/restic/pull/3034 by now.