OpenStack backend error: `conn.ObjectOpen: Object Not Found`

restic 0.9.4 compiled with go1.11.4 on linux/amd64

I’ve been backing up my data for quite a while, but lately I’m encountering this error more and more often, and it affects multiple repos.

I understand my repo became corrupt somehow, but I’m not sure what to do about it. I could not find a way to either rebuild the index or prune the repo to get back to a functioning state.

Could you please tell me what is causing this issue, and how I can get back to a working repo?


prod@server:~$ envdir ~/.envdir/restics/my-backup-config/ restic check
  using temporary cache in /tmp/restic-check-cache-764915479
  repository c643fb48 opened successfully, password is correct
  created new cache in /tmp/restic-check-cache-764915479
  create exclusive lock for repository
  Load(<lock/476d09cbfd>, 0, 0) returned error, retrying after 537.007367ms: conn.ObjectOpen: Object Not Found
  Load(<lock/476d09cbfd>, 0, 0) returned error, retrying after 513.44093ms: conn.ObjectOpen: Object Not Found
  Load(<lock/476d09cbfd>, 0, 0) returned error, retrying after 1.264857441s: conn.ObjectOpen: Object Not Found
  Load(<lock/476d09cbfd>, 0, 0) returned error, retrying after 1.157189689s: conn.ObjectOpen: Object Not Found
    signal interrupt received, cleaning up

prod@server:~$ envdir ~/.envdir/restics/my-backup-config/ restic prune
  repository c643fb48 opened successfully, password is correct
  Load(<lock/476d09cbfd>, 0, 0) returned error, retrying after 360.086136ms: conn.ObjectOpen: Object Not Found
  Load(<lock/476d09cbfd>, 0, 0) returned error, retrying after 1.037172466s: conn.ObjectOpen: Object Not Found
  Load(<lock/476d09cbfd>, 0, 0) returned error, retrying after 1.055649236s: conn.ObjectOpen: Object Not Found
    signal interrupt received, cleaning up

prod@server:~$ envdir ~/.envdir/restics/my-backup-config/ restic snapshots
  repository c643fb48 opened successfully, password is correct
  Load(<lock/476d09cbfd>, 0, 0) returned error, retrying after 686.1203ms: conn.ObjectOpen: Object Not Found
  Load(<lock/476d09cbfd>, 0, 0) returned error, retrying after 667.591795ms: conn.ObjectOpen: Object Not Found
  Load(<lock/476d09cbfd>, 0, 0) returned error, retrying after 1.12242475s: conn.ObjectOpen: Object Not Found

prod@server:~$ envdir ~/.envdir/restics/my-backup-config/ restic rebuild-index
  repository c643fb48 opened successfully, password is correct
  Load(<lock/476d09cbfd>, 0, 0) returned error, retrying after 539.305252ms: conn.ObjectOpen: Object Not Found
  Load(<lock/476d09cbfd>, 0, 0) returned error, retrying after 457.227083ms: conn.ObjectOpen: Object Not Found
  Load(<lock/476d09cbfd>, 0, 0) returned error, retrying after 1.443959752s: conn.ObjectOpen: Object Not Found
  Load(<lock/476d09cbfd>, 0, 0) returned error, retrying after 900.166443ms: conn.ObjectOpen: Object Not Found
  Load(<lock/476d09cbfd>, 0, 0) returned error, retrying after 2.323418307s: conn.ObjectOpen: Object Not Found

The errors look like something strange is happening in the OpenStack backend (Swift I guess?). Apparently the storage server reports that a file whose name starts with lock/476d09cbfd exists, but fails to return the file when asked for it.

Please make sure that no restic run is currently active that might be the source of that lock file. Then you can try to delete all files from the repository for which the name starts with lock/476d09cbfd (the full filename contains 64 hex characters).

Thanks @MichaelEischer. I’ve filed a ticket with my backend provider as there seems to be something odd going on indeed.

I don’t know swift well enough to know if this could be a backend bug, or a restic bug though.

# 2 lock files are listed by the backend
prod@server:~$ envdir ~/.envdir/restics/app+privates/ swift list --lh restic+server-files+app+privates --prefix lock
    182 2020-03-03 23:34:28      binary/octet-stream locks/476d09cbfdf722a8c4e88b2f36062f65a6b8a1f973bf411886c91cf8aed42097
    174 2020-03-30 08:55:41      binary/octet-stream locks/d9eb1ad654157e9cc5727574daa4de5cf3cf85408c5fff19bd2843b1b19c7eb7

# 1st lock file is "not found"
prod@server:~$ envdir ~/.envdir/restics/app+privates/ swift download --output /tmp/debug/file2 restic+server-files+app+privates locks/476d09cbfdf722a8c4e88b2f36062f65a6b8a1f973bf411886c91cf8aed42097
    Object 'restic+server-files+app+privates/locks/476d09cbfdf722a8c4e88b2f36062f65a6b8a1f973bf411886c91cf8aed42097' not found

# 2nd lock file is read ok
prod@server:~$ envdir ~/.envdir/restics/app+privates/ swift download --output /tmp/debug/file3 restic+server-files+app+privates locks/d9eb1ad654157e9cc5727574daa4de5cf3cf85408c5fff19bd2843b1b19c7eb7
    locks/d9eb1ad654157e9cc5727574daa4de5cf3cf85408c5fff19bd2843b1b19c7eb7 [auth 1.115s, headers 1.362s, total 1.363s, 0.001 MB/s]

As even the normal swift tool cannot download that lock file, this looks like a bug in the storage backend. restic relies on the backend to provide reasonable semantics, and claiming that a file exists but failing to retrieve it does not fall into that category. So either the lock was actually deleted and swift just ‘forgot’ to remove the directory entry, or swift lost the lock file data. If I had to choose between the two, I’d prefer the first explanation (as it doesn’t entail outright data loss), but neither should happen. (I just took a look at the swift documentation: objects and container listings are stored on separate servers, but there’s also an auditor process that should detect discrepancies between them.)
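To find out whether other entries are affected in the same way, one could compare the container listing against what the object servers can actually return. The following is only a rough sketch: `find_phantom_objects` is a made-up helper name, it assumes the python-swiftclient `swift` CLI is already authenticated (for example via `envdir`, as in the commands above), and it assumes `swift stat` exits with a non-zero status when the object is missing.

```shell
# Sketch: print every object that the container listing reports
# but that cannot be stat'ed.
# Usage: find_phantom_objects CONTAINER PREFIX   (hypothetical helper)
find_phantom_objects() {
    local container="$1" prefix="$2" object
    # `swift list --prefix` prints one object name per line.
    swift list "$container" --prefix "$prefix" | while IFS= read -r object; do
        [ -n "$object" ] || continue
        # Assumption: `swift stat` fails (non-zero exit) for a missing object.
        # stdin is redirected so the call cannot consume the listing.
        if ! swift stat "$container" "$object" >/dev/null 2>&1 </dev/null; then
            printf '%s\n' "$object"
        fi
    done
}
```

With the container from this thread, `find_phantom_objects restic+server-files+app+privates locks/` would be expected to print only the lock entry that cannot be downloaded.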

In order to revive the repository it should be enough to have that lock file entry removed from the swift container. Afterwards restic should be able to continue from there.
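A minimal sketch of that cleanup with the `swift` CLI, under the same assumptions as the commands above (already authenticated, no restic process still holding the lock). `remove_listed_object` is a hypothetical helper name. Because a backend in this state may answer the delete with an error while still dropping the entry, the sketch ignores the delete’s exit status and checks the listing instead.

```shell
# Sketch: delete one object, then confirm that its name is gone
# from the container listing.
# Usage: remove_listed_object CONTAINER OBJECT   (hypothetical helper)
remove_listed_object() {
    local container="$1" object="$2"
    # The delete may report an error (e.g. 404) on an inconsistent backend,
    # so its exit status is deliberately ignored here.
    swift delete "$container" "$object" >/dev/null 2>&1 || true
    # -F fixed string, -x whole-line match, -q quiet
    if swift list "$container" --prefix "$object" | grep -Fxq -- "$object"; then
        echo "still listed: $object" >&2
        return 1
    fi
    echo "removed from listing: $object"
}
```

Swift container listings are eventually consistent, so a “still listed” result immediately after the delete may simply mean the listing has not caught up yet; retrying the check a little later is reasonable before drawing conclusions.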

Thanks @MichaelEischer. I’m still waiting for my provider’s response so I’ll report back once I’ve got more info.

For info, deleting the lockfile via the swift tool allowed me to get my backups going again, thanks @MichaelEischer.

My provider still hasn’t given me a clear answer as to what’s going on. I have encountered this error elsewhere (other backups): swift delete sometimes completes successfully, and sometimes returns a 404 Not Found, after which the file is removed from the listing (even though the preceding restic list command showed the file as existing).

So possibly a buggy backend, I’m changing provider.