Ciphertext verification failed - different repos, different hosts

Hello,

we are experiencing several ciphertext verification failed errors for different repositories and also different hosts.

We are using restic 0.16.4 as a sidecar container (inside Kubernetes) to backup Strapi volumes (consisting of a SQLite3 database and several static content files like images) to S3 object storage.
We have now had 3 different cases where we got this ciphertext verification failed error, and there seems to be no way to recover from it.
BTW: we also use exactly the same sidecar container for other (much larger) volume backups without any problems. Those Strapi volume backups are quite small - like 7-20 MB. Therefore I assume that we are hitting a special case with the Strapi volume somehow.

All 3 cases happened in different Kubernetes clusters and therefore on different VMs (also on different hardware) and in fact at different hosters too. Also the target S3 object storages were from 3 different locations from 2 distinct providers.
Therefore I really assume that hardware failures are rather unlikely for all those cases.

All 3 cases happened in the last 2 or 3 weeks and only on Strapi volumes even though we are backing up many more volumes to S3 using restic.

In one case we just deleted the whole backup repository (using rclone as restic does not have an option to do that). Both other cases are still in failure state and could be used for debug purposes. I also downloaded the repositories from S3 to recheck the repository locally - without any change.

Here are some outputs. I really hope you can help fixing that problem.

restic version
$ restic version
restic 0.16.4 compiled with go1.21.6 on linux/amd6
restic stats # Shows error
$ restic stats
repository b9d21369 opened (version 2, compression level auto)
[0:00] 0.00%  0 / 4 index files loaded
ciphertext verification failed
github.com/restic/restic/internal/crypto.init
	/restic/internal/crypto/crypto.go:30
runtime.doInit1
	/usr/local/go/src/runtime/proc.go:6740
runtime.doInit
	/usr/local/go/src/runtime/proc.go:6707
runtime.main
	/usr/local/go/src/runtime/proc.go:249
runtime.goexit
	/usr/local/go/src/runtime/asm_amd64.s:1650
restic check --read-data # Shows error
$ restic check --read-data
using temporary cache in /tmp/restic-check-cache-3789657315
repository b9d21369 opened (version 2, compression level auto)
created new cache in /tmp/restic-check-cache-3789657315
create exclusive lock for repository
load indexes
[0:00] 100.00%  4 / 4 index files loaded
error: error loading index a1243cf997162370e57ad2589f605271922748b20f8800b8090cfb1752b52285: ciphertext verification failed
error: error loading index e83ccb66f95576b834669f31fb7f8e69059a4e343eacba679c6941fba66df4a3: ciphertext verification failed
error: error loading index a75ea15927343280ab9a4dfdbf6615f6c1489dc89c132f334f8f866159e0ac1b: ciphertext verification failed
error: error loading index 474bc50b2c62b12fc3ac8d7127c9583a6468f6e652bab263c8b78e14896f553f: ciphertext verification failed
Fatal: LoadIndex returned errors
restic list blobs # Shows error
$ restic list blobs
repository b9d21369 opened (version 2, compression level auto)
ciphertext verification failed
github.com/restic/restic/internal/crypto.init
	/restic/internal/crypto/crypto.go:30
runtime.doInit1
	/usr/local/go/src/runtime/proc.go:6740
runtime.doInit
	/usr/local/go/src/runtime/proc.go:6707
runtime.main
	/usr/local/go/src/runtime/proc.go:249
runtime.goexit
	/usr/local/go/src/runtime/asm_amd64.s:1650
restic list index
$ restic list index
repository b9d21369 opened (version 2, compression level auto)
e83ccb66f95576b834669f31fb7f8e69059a4e343eacba679c6941fba66df4a3
a1243cf997162370e57ad2589f605271922748b20f8800b8090cfb1752b52285
474bc50b2c62b12fc3ac8d7127c9583a6468f6e652bab263c8b78e14896f553f
a75ea15927343280ab9a4dfdbf6615f6c1489dc89c132f334f8f866159e0ac1b
restic list packs
$ restic list packs
repository b9d21369 opened (version 2, compression level auto)
86b25cef771f189ca565bc19c99ed6c7c9e3691ba3758079179a600c58e06f88
0016e73b5df0434ddb2642ca4b66378b1bc9ba7d903721b023ece405dbb7047a
41607a0f83dd1d390189fe72059f4808dc633ed865b869a1c64a63925161185d
cfd2ce39624aa9a12d3656deecbf5ecc02144fc277b84e0b1e30669527e80d62
ae6b17714a70a19870c998ce0e9a28734f2ea0f471aec6832791dc8ce0ffaa69
ada0c8097a969bf9d126b39d84da8bfbf5dc428a354c31151c3e1cc86e742fce
restic list snapshots
$ restic list snapshots
repository b9d21369 opened (version 2, compression level auto)
61778ecfa0b209c6eb95a9c07a5058cf80bddb28e427a42e7f6495e377b06d61
74c189425790faf2497ddface2a041ba7fbb883223fa8c93235b8d1b3d7f5d2c
104598c8811d078906e5f1d13044e83d826df034cfb3a1c86a31ac7b317d7876
7d9cf8d381ff6651d5761f9fd8d15566d142e70d827380500b47708eaa4840c3
restic snapshots # Shows error
$ restic snapshots
repository b9d21369 opened (version 2, compression level auto)
Ignoring "74c189425790faf2497ddface2a041ba7fbb883223fa8c93235b8d1b3d7f5d2c": failed to load snapshot 74c18942: ciphertext verification failed
Ignoring "104598c8811d078906e5f1d13044e83d826df034cfb3a1c86a31ac7b317d7876": failed to load snapshot 104598c8: ciphertext verification failed
Ignoring "61778ecfa0b209c6eb95a9c07a5058cf80bddb28e427a42e7f6495e377b06d61": failed to load snapshot 61778ecf: ciphertext verification failed
Ignoring "7d9cf8d381ff6651d5761f9fd8d15566d142e70d827380500b47708eaa4840c3": failed to load snapshot 7d9cf8d3: ciphertext verification failed
ls -R of repository
# File PASSWORD contains the RESTIC_PASSWORD - just for debugging purposes
$ ls -R
.:
config  data  index  keys  locks  PASSWORD  snapshots

./data:
00  41  86  ad  ae  cf

./data/00:
0016e73b5df0434ddb2642ca4b66378b1bc9ba7d903721b023ece405dbb7047a

./data/41:
41607a0f83dd1d390189fe72059f4808dc633ed865b869a1c64a63925161185d

./data/86:
86b25cef771f189ca565bc19c99ed6c7c9e3691ba3758079179a600c58e06f88

./data/ad:
ada0c8097a969bf9d126b39d84da8bfbf5dc428a354c31151c3e1cc86e742fce

./data/ae:
ae6b17714a70a19870c998ce0e9a28734f2ea0f471aec6832791dc8ce0ffaa69

./data/cf:
cfd2ce39624aa9a12d3656deecbf5ecc02144fc277b84e0b1e30669527e80d62

./index:
474bc50b2c62b12fc3ac8d7127c9583a6468f6e652bab263c8b78e14896f553f
a1243cf997162370e57ad2589f605271922748b20f8800b8090cfb1752b52285
a75ea15927343280ab9a4dfdbf6615f6c1489dc89c132f334f8f866159e0ac1b
e83ccb66f95576b834669f31fb7f8e69059a4e343eacba679c6941fba66df4a3

./keys:
f588b797f23be312426aecfe2d13473f4e0d86ee60cde3c5d6a30cf6b6259e44

./locks:

./snapshots:
104598c8811d078906e5f1d13044e83d826df034cfb3a1c86a31ac7b317d7876
61778ecfa0b209c6eb95a9c07a5058cf80bddb28e427a42e7f6495e377b06d61
74c189425790faf2497ddface2a041ba7fbb883223fa8c93235b8d1b3d7f5d2c
7d9cf8d381ff6651d5761f9fd8d15566d142e70d827380500b47708eaa4840c3

Commands like restic forget or restic prune can’t be used in that state and therefore we are most likely unable to recover from that error without deleting that whole repository.

If you need any more input, please just ask. If really needed we could also share one of those (corrupt) repositories including the PASSWORD file and also the source data of the volume. But that should be the last resort.

Thank you very much in advance for your help.

Bernhard

Let me start by explaining how keyfiles work in restic. A keyfile in restic is encrypted with a user-specified password and contains the actual masterkey used to encrypt data in the repository. A lost or damaged keyfile can make the repository content inaccessible. ciphertext verification failed errors can occur if a file is damaged or was encrypted with a different key.

The specific pattern of ciphertext verification failed errors complains about index files, but not about the config file, which is loaded first. This means that the config file is encrypted using the “new” keyfile, whereas all other files in the repository still use the “old” key. According to the file listing only a single key exists. This situation can only arise in the following case: first the config file is deleted, then the old keyfile is deleted (this can also happen sooner or later), and finally restic init is run.

Based on your description above, it is not possible to get into this state unless those files were deleted by some external means. There’s recently been a nearly identical issue Restic repository corrupted with error: ciphertext verification failed · Issue #4848 · restic/restic · GitHub which was caused by lifecycle rules that delete the config and the key files.

As the repository apparently no longer contains the old keyfile, it is lost irrecoverably.
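
The reasoning above can be turned into a rough diagnostic. The following is only a sketch, not a restic feature: the function name check_key_mismatch is hypothetical, and it relies solely on restic cat config, restic list snapshots and restic cat snapshot. Since a mismatching key and real data corruption produce the same error, the result is a hint at best.

```shell
#!/bin/sh
# Sketch: hint at a "config readable, everything else undecryptable" state.
# RESTIC_BIN and check_key_mismatch are illustrative names, not part of restic.
RESTIC_BIN="${RESTIC_BIN:-restic}"

check_key_mismatch() {
  # Step 1: the config file is loaded first; if this fails, the problem is
  # something else (wrong password, missing config, backend error).
  if ! "$RESTIC_BIN" cat config > /dev/null 2>&1; then
    echo "config unreadable: wrong password, missing config, or backend problem"
    return 2
  fi

  # Step 2: try to decrypt every snapshot file with the current key.
  # The grep keeps only full 64-character IDs (defensive filtering).
  total=0
  failed=0
  for id in $("$RESTIC_BIN" list snapshots 2>/dev/null | grep -E '^[0-9a-f]{64}$'); do
    total=$((total + 1))
    if ! "$RESTIC_BIN" cat snapshot "$id" > /dev/null 2>&1; then
      failed=$((failed + 1))
    fi
  done

  # Step 3: config decrypts but no snapshot does.
  if [ "$total" -gt 0 ] && [ "$failed" -eq "$total" ]; then
    echo "config decrypts but none of $total snapshots do: possible key mismatch"
    return 1
  fi

  echo "$((total - failed)) of $total snapshots readable"
  return 0
}
```

Calling check_key_mismatch with RESTIC_REPOSITORY and RESTIC_PASSWORD set in the environment would return 1 for a repository in the state described above.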


Hello Michael,

thank you very much for your response. Today another Strapi repo (this time a production one) failed.
There was no deployment that could have led to a configuration change. Also, the restic sidecar container ran continuously. Therefore a change of the encryption key is impossible in that specific case.

I also checked for Lifecycle rules on all of our S3 buckets that store restic backups - none of them has a rule configured and most providers don’t even support configuring them. Therefore I think I can rule that out.

What I can’t rule out at the moment is some read error from the S3 and therefore an accidental reinitialization of the repository (most likely caused by our own code, see below).

This is the log output of the backup for the production repository that failed. The last run that still completed OK:

Pushing start metrics to prometheus-pushgateway...                                                                                          
Starting backup...                                                                                                                          
using parent snapshot e085c0df                                                                                                              
                                                                                                                                            
Files:           0 new,     1 changed,    79 unmodified                                                                                     
Dirs:            0 new,     1 changed,     3 unmodified                                                                                     
Added to the repository: 1.009 MiB (67.221 KiB stored)                                                                                      
                                                                                                                                            
processed 80 files, 14.839 MiB in 0:01                                                                                                      
snapshot 21113c37 saved                                                                                                                     
Pushing end metrics to prometheus-pushgateway...                                                                                            
INFO: BACKUP_JOB_NAMESPACE set via ServiceAccount                                                                                           
Pushing start metrics to prometheus-pushgateway... 

next run:

Pushing start metrics to prometheus-pushgateway...                                                                                             
Starting backup...                                                                                                                             
created restic repository afcb4236eb at s3:https://s3.de.cloud.ovh.net/production-.../strapi-backups             
                                                                                                                                               
Please note that knowledge of your password is required to access                                                                              
the repository. Losing your password means that your data is                                                                                   
irrecoverably lost.                                                                                                                            
Error loading snapshot 759a09bb: failed to load snapshot 759a09bb: ciphertext verification failed                                              
github.com/restic/restic/internal/restic.(*SnapshotFilter).findLatest.func1                                                                    
    /restic/internal/restic/snapshot_find.go:56                                                                                                
github.com/restic/restic/internal/restic.ForAllSnapshots.func1                                                                                 
    /restic/internal/restic/snapshot.go:94                                                                                                     
github.com/restic/restic/internal/restic.ParallelList.func2                                                                                    
    /restic/internal/restic/parallel.go:45                                                                                                     
golang.org/x/sync/errgroup.(*Group).Go.func1                                                                                                   
    /home/build/go/pkg/mod/golang.org/x/sync@v0.5.0/errgroup/errgroup.go:75                                                                    
runtime.goexit                                                                                                                                 
    /usr/local/go/src/runtime/asm_amd64.s:1650                                                                                                 
Pushing end metrics to prometheus-pushgateway...

So it seems our backup code could contain some issue. This is the backup part that backs up the data:

RESTIC_BIN=/usr/local/bin/restic

init_repo() {
  if ! "$RESTIC_BIN" list snapshots > /dev/null 2>&1; then
    "$RESTIC_BIN" \
        init
  fi
}

backup_to_repo() {
  echo "Starting backup..."
  init_repo
  "$RESTIC_BIN" \
        backup \
        --host "$RESTIC_BACKUP_NAME" \
        --exclude-caches \
        --exclude-if-present ".NO_BACKUP" \
        /backup
}

backup_to_repo

As you can see we always call restic list snapshots before backup to check whether the repository is already initialized - if not we call restic init.

Do we maybe need a more robust way to check for an already initialized repository?

Bernhard

It’s probably better to either use restic list keys or restic cat config > /dev/null (the latter is suggested at Scripting — restic 0.16.4 documentation ). Without the output from stderr it’s unfortunately impossible to tell what went wrong exactly. However, the exact command used shouldn’t make any difference.

restic init by itself also verifies whether the repository already contains a config file. So apparently the backend reported that no config file exists in the repository. Any other error reported while checking the config file’s existence causes restic init to fail. In restic 0.17.0 (still takes a few weeks) there will be an additional check whether any key file exists: init: double check that no repository exists yet by MichaelEischer · Pull Request #4853 · restic/restic · GitHub . However, it’s very problematic if the storage backend sometimes silently (!) fails to list some files, especially if those files have existed for a long time.

In short, restic init doesn’t overwrite a repository if the repository already contains a config file. That is, unless the backend erroneously claims that the file does not exist.
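
A minimal sketch of a more cautious init_repo along these lines, assuming the same RESTIC_BIN variable as in the script above. The ALLOW_INIT switch is a hypothetical addition, not a restic option: it makes initialization an explicit decision instead of a fallback for any failed check. stderr is deliberately not discarded so that the actual error stays in the logs.

```shell
#!/bin/sh
# Sketch only: ALLOW_INIT and repo_is_initialized are illustrative names.
RESTIC_BIN="${RESTIC_BIN:-/usr/local/bin/restic}"

# Returns 0 if the repository config can be read with the current password.
# stdout is discarded, stderr is kept for debugging.
repo_is_initialized() {
  "$RESTIC_BIN" cat config > /dev/null
}

init_repo() {
  if repo_is_initialized; then
    return 0
  fi
  # The check failed. This may mean "no repository yet", but it may also be
  # a backend hiccup, so only initialize when explicitly requested.
  if [ "${ALLOW_INIT:-0}" = "1" ]; then
    "$RESTIC_BIN" init
  else
    echo "repository check failed; refusing to run 'restic init' automatically" >&2
    return 1
  fi
}
```

With this variant, ALLOW_INIT=1 would be set only for the first deployment of a new volume; a failed check on an existing repository then aborts the backup run instead of creating a new config and key.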

How many files does the keys folder contain? restic init never deletes keys (there’s simply no code for that).

That error is the exact same situation as before. The config and at least one key file have been replaced.

While not ideal, it shouldn’t make a difference as restic init also checks that the repository is not yet initialized.

I wonder whether this could be related to a maintenance at OVH: Public Cloud Status - [GLOBAL][Storage] - Object Storage maintenance notification and Public Cloud Status - [GLOBAL][Storage] - Object Storage maintenance notification


Hello Michael,

thank you again for your enlightening answers. It helped us to find out the real reason for at least two cases.

I found at least a reason for a lost config file in two of our environments - and it’s stupid. It seems our developers used the S3 buckets for two different backup types and one of them can delete files older than 7 days. Guess which file was also deleted by that other backup type? :laughing:

Thank you for your insights about the config file and the init process.

For two other occurrences I still can’t explain the real reasons. But I now know that I can trust restic as it can’t delete the config file at all. The OVH issues you mentioned could be relevant as both cases indeed happened on OVH S3 object storage. But I fear we won’t get a definitive answer on that.

In the meantime I also reprogrammed our init repo function to use restic cat config to match the documentation you mentioned. I am pretty sure this documentation was not there when we started our experiments with restic (somewhere around 0.13.x).

Maybe the documentation or the output of restic could be improved. Without your input I am sure I would still be searching for a reason. I imagine something like “config key does not match index/snapshot/packs key” or a function to check for that specific case.

Thank you again for your valuable input.

Bernhard Grün

The ciphertext verification failed errors have for several years been nearly universally caused by actual data corruption either on storage or while processing the data. From the perspective of restic a mismatching key just causes the exact same error as corrupted data, so there’s no easy way to distinguish both cases. I’ll have to think about how we can reliably detect it.

Github issue: Detect mismatch between masterkey and remaining repository data · Issue #4862 · restic/restic · GitHub


I’ve also adjusted repository: prevent initialization if a snapshot exists by MichaelEischer · Pull Request #4863 · restic/restic · GitHub such that it would have prevented running restic init for those repositories. That would keep the repositories in the state in which they have no config and key files, which is hopefully much easier to debug.
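
For a repository that is reachable as a local directory (for example a downloaded copy), a similar guard can be approximated outside of restic. This is a sketch under that assumption and not how the pull request implements the check; repo_dir_has_data is a hypothetical helper that only looks at the directory layout shown in the ls -R output earlier in this thread.

```shell
#!/bin/sh
# Hypothetical helper: returns 0 if a local repository directory already
# contains key, snapshot, index or pack files, i.e. 'restic init' would
# create a second, unrelated master key next to existing data.
repo_dir_has_data() {
  repo_dir="$1"
  for sub in keys snapshots index data; do
    if [ -d "$repo_dir/$sub" ] && \
       [ -n "$(find "$repo_dir/$sub" -type f -print | head -n 1)" ]; then
      return 0
    fi
  done
  return 1
}
```

A wrapper script could call repo_dir_has_data on the repository path and skip restic init whenever it returns 0, which leaves the repository without config and key files but otherwise untouched.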

Hey,

Wouldn’t something like a checksum / fingerprint of the key embedded into the data solve that problem? That could reduce the cryptographic complexity though. Maybe an embedded key id without any relation to the key itself would be a lot safer.

This actually sounds like a good enhancement and surely also helps debugging.

Thank you for your work!

There is no such information stored right now and adding it would require changing the format of every file type which is a massive undertaking (not worth the trouble just for this issue).
