Unable to rewrite snapshot ID xxxx: cannot encode "/" without loosing information

NobbZ · May 20, 2023, 11:36am

Hi, I recently found that my Downloads folder slipped into some of my backups, and I wanted to rewrite them to remove it, but restic rewrite fails with the message in the subject.

The ID printed is not reproducible; it changes between tries:

$ RESTIC_COMPRESSION=max restic rewrite --exclude='Downloads/*'
repository 46211214 opened (version 2, compression level max)

snapshot 0a877585 of [/home/nmelzer] at 2023-05-20 01:00:13.880221453 +0200 CEST)
could not load snapshots: context canceled
Fatal: unable to rewrite snapshot ID "0a877585": cannot encode tree at "/" without loosing information

kapitainsky · May 20, 2023, 12:18pm

--exclude='/path/to/Downloads'

NobbZ · May 20, 2023, 1:11pm

Tried that, and it seems as if that doesn’t work either:

$ RESTIC_COMPRESSION=max restic rewrite --exclude='/home/nmelzer/Downloads/*'
repository 46211214 opened (version 2, compression level max)

snapshot 08a5e216 of [/home/demo] at 2022-01-25 22:15:54.804224483 +0100 CET)
could not load snapshots: context canceled
Fatal: unable to rewrite snapshot ID "08a5e216": cannot encode tree at "/home/demo" without loosing information
$ RESTIC_COMPRESSION=max restic rewrite --exclude='/home/nmelzer/Downloads/*' --host mimas
repository 46211214 opened (version 2, compression level max)

snapshot 0a877585 of [/home/nmelzer] at 2023-05-20 01:00:13.880221453 +0200 CEST)
could not load snapshots: context canceled
Fatal: unable to rewrite snapshot ID "0a877585": cannot encode tree at "/" without loosing information
$ RESTIC_COMPRESSION=max restic rewrite --exclude='/home/nmelzer/Downloads' --host mimas
repository 46211214 opened (version 2, compression level max)

snapshot 0a877585 of [/home/nmelzer] at 2023-05-20 01:00:13.880221453 +0200 CEST)
could not load snapshots: context canceled
Fatal: unable to rewrite snapshot ID "0a877585": cannot encode tree at "/" without loosing information

And why isn’t the snapshot in the first run just skipped, as it clearly does not contain any paths to rewrite…

Also, isn’t “loosing information” kind of what we expect from a rewrite?

Or might this be related to the fact that the backups are actually created using rustic rather than restic, and that there is some metadata restic cannot properly carry over?

kapitainsky · May 20, 2023, 2:36pm

I did a quick test.

restic rewrite --exclude=/Users/kptsky/Downloads/bikuben/

repository 2c57440d opened (version 2, compression level auto)

snapshot c5043d2e of [/Users/kptsky/Downloads] at 2023-05-20 15:26:44.415364 +0100 BST)
excluding /Users/kptsky/Downloads/bikuben
saved new snapshot f35f14a9

snapshot 8d93fc9b of [/Users/kptsky/Downloads] at 2023-05-20 15:26:44.415364 +0100 BST)
excluding /Users/kptsky/Downloads/bikuben
saved new snapshot baf89db9

And the bikuben folder with all its content has been removed. I have confirmed it with restic ls.

I am not sure why it does not work for you.

I would check backup consistency.

kapitainsky · May 20, 2023, 2:45pm

Another explanation is that you use:

--exclude='/home/nmelzer/Downloads/*'

The * might be a problem if there are any links in your Downloads folder leading to content outside of it…

NobbZ · May 20, 2023, 2:48pm

I tried with and without the asterisk.

I am currently checking (found that another machine of mine held a stale lock for 3508h25m34.147066328s ).

This check will now take a while. After 5 minutes it is just below 3% of the snapshots.

NobbZ · May 21, 2023, 8:12am

The check reported the repo as being fine, apart from the unreferenced packs left by rustic’s lockless purges.

Rewrite is still not working.

kapitainsky · May 21, 2023, 9:19am

Add some test directory to your repo, then check if you can rewrite it. If not, then there is a repo problem; if yes, it would point to some “issue” in the Downloads folder.

kapitainsky · May 21, 2023, 9:21am

We might need some restic elders’ help here :)

NobbZ · May 21, 2023, 2:41pm

I tried random different paths that I would not mind losing.

I tried against latest and old snapshots, the error remains.

wnklmnn · May 23, 2023, 8:50pm

I’m running into the same issue.

The snapshots on which the rewrite fails for me seem to be snapshots that were copied from one repository to another.

snapshot a33a1f45 of [/home/user] at 2023-04-25 17:33:42.700107197 +0200 CEST)
could not load snapshots: context canceled
Fatal: unable to rewrite snapshot ID "a33a1f45": cannot encode tree at "/" without loosing information

restic -r sftp:someRemoteRepo snapshots a33a1f45 --json
enter password for repository:
[{"time":"2023-04-25T17:33:42.700107197+02:00","tree":"984af441a9e5de8de4eb27ab9b9a39c3fe38d57c597a9aad42585d4d3f304d18","paths":["/home/user"],"hostname":"<anotherHost>","original":"1c93a52c6a8359c6ccc3b58dd1ee11d60cd1e46f4b1b69ae90f5f9198954eafb","id":"a33a1f4591ea5864398b17fad185cf77d290eca18ef717a1f7ab47e80d8d9f81","short_id":"a33a1f45"}]

edit:

Even running the command restic -r sftp:someRemoteRepo rewrite --dry-run --exclude-file /tmp/exclude a33a1f45 with an empty exclude file fails.

MichaelEischer · May 31, 2023, 5:58pm

That is likely the cause. rewrite verifies that saving the unmodified tree metadata would exactly recreate the existing metadata. That ensures that there are no unexpected changes to a snapshot during a rewrite. In restic 0.16.0 (or using a beta build) restic repair snapshots will be able to fix the problem.
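
A minimal sketch of such a round-trip check, written in Python rather than restic's Go and not taken from restic's code. The field list `KNOWN_FIELDS`, the helper names, the `x_extra` field, and the sample trees are all hypothetical; the only point is that a tree is accepted when decoding and re-encoding it reproduces the stored bytes exactly.

```python
import json

# Hypothetical subset of node fields this "implementation" knows about;
# restic's real node type has many more fields.
KNOWN_FIELDS = ("name", "type", "mode", "size")


def reencode(raw: bytes) -> bytes:
    """Decode a tree blob and re-serialize only the known fields."""
    tree = json.loads(raw)
    nodes = [
        {key: node[key] for key in KNOWN_FIELDS if key in node}
        for node in tree["nodes"]
    ]
    # Assumption of this sketch: compact JSON plus a trailing newline
    # is the canonical form.
    return json.dumps({"nodes": nodes}, separators=(",", ":")).encode() + b"\n"


def roundtrips(raw: bytes) -> bool:
    """True if re-encoding reproduces the stored bytes exactly."""
    return reencode(raw) == raw


# Same logical content in three encodings.
canonical = b'{"nodes":[{"name":"a.txt","type":"file","mode":420,"size":3}]}\n'
reordered = b'{"nodes":[{"type":"file","name":"a.txt","mode":420,"size":3}]}\n'
extended = b'{"nodes":[{"name":"a.txt","type":"file","mode":420,"size":3,"x_extra":1}]}\n'

for label, blob in (("canonical", canonical), ("reordered", reordered), ("extended", extended)):
    print(label, roundtrips(blob))
```

In this sketch only the first blob passes: the second differs in key order, and the third carries a field that would be dropped silently on re-encoding.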

MichaelEischer · May 31, 2023, 6:01pm

@wnklmnn What is the output of restic cat blob 984af441a9e5de8de4eb27ab9b9a39c3fe38d57c597a9aad42585d4d3f304d18? The problems with rewriting the snapshot show up while processing that (tree) blob.

greenanna-diana · August 10, 2024, 12:33pm

I encountered the same problem; these snapshots were backed up by rustic.

After running the command “restic repair snapshots”, it was restored normally in restic, but rustic reported an error:
error: deserializing from bytes of JSON Text failed: Error("missing field `total_dirs_processed`", line: 1, column: 923)

This roughly 1 TB repository is now damaged and unusable with rustic.

kapitainsky · August 10, 2024, 3:08pm

This is the restic forum :)

Please post it in rustic issues:

alexweiss · August 12, 2024, 2:59am

Just came across this topic…

@MichaelEischer This also ensures that restic rewrite is basically never able to rewrite unmodified trees generated by a program other than restic itself. This is because:

The specification in restic/doc/design.rst at master · restic/restic · GitHub of how tree JSON is generated is not very precise. Some of the unclear points are: How exactly are non-Unicode filenames encoded? When should null be used, and when should non-existent fields be omitted? What are the exact rules for serializing things like dates? Moreover, there is no “full” specification of which fields are generally allowed and under which circumstances they are to be filled.
By comparing serialized trees, you also check that the order of elements in the generated JSON is exactly as restic produces it. This is again not specified, and it also goes against the principles of a named serialization format like JSON…

May I suggest that you either change the way how you do this check or add a note somewhere about this rewrite restriction?

MichaelEischer · August 13, 2024, 5:40pm

The deduplication relies on having a perfectly reproducible serialization of tree blobs. So some defined order is necessary, which for the restic repository format happens to be the order defined by restic.
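
As a small illustration of that point (a sketch, not restic code): blobs are addressed by the SHA-256 hash of their content, so two serializations of the same logical tree that differ only in key order get different IDs and would be stored twice. The sample tree and the helper name `blob_id` are made up for this example.

```python
import hashlib
import json

# Hypothetical, heavily simplified tree with a single file node.
tree = {"nodes": [{"name": "a.txt", "type": "file", "size": 3}]}


def blob_id(data: bytes) -> str:
    """Content address of a blob: the SHA-256 hash of its bytes."""
    return hashlib.sha256(data).hexdigest()


# Two valid JSON encodings of the same data: insertion order vs. sorted keys.
insertion_order = json.dumps(tree, separators=(",", ":")).encode()
sorted_keys = json.dumps(tree, separators=(",", ":"), sort_keys=True).encode()

print(json.loads(insertion_order) == json.loads(sorted_keys))  # same content
print(blob_id(insertion_order) == blob_id(sorted_keys))        # different IDs
```

Both encodings parse to the same data, yet their hashes differ, which is why every writer has to agree on one byte-exact encoding for deduplication to work.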

The check is there to ensure that, when rewrite is used with an older restic version, it does not silently drop attributes only known to a later restic version.

alexweiss · August 13, 2024, 7:52pm

Yes, you are right about the deduplication. Wouldn’t that make a more precise documentation of the format even more valuable/necessary?
Or did restic switch philosophy from “the repo format specification is leading and restic is just an implementation” to “the implementation defines the specification”?

Yes, I know the intention of that check (and note that it also covers extensions added by other tools that write restic repositories). I just wanted to suggest that the restriction resulting from the current implementation should be either reconsidered or at least documented.

By the way: The treatment of parent trees is actually quite similar. Does restic also perform this kind of check for the backup command (if all tree entries are unchanged, check whether the serialized tree matches the parent one and, if not, stop with some “you might be loosing information” error)?

MichaelEischer · August 14, 2024, 7:52pm

Feel free to help with documenting it. However, at some point the necessary level of detail in the documentation would be so high that it’s much more reasonable to just look at the code. That doesn’t make the existing documentation pointless. In particular, the high level perspective would be very hard to piece together from the code.

I don’t like the tone of that. And please don’t misquote fd0’s comment from Implement compression support by MichaelEischer · Pull Request #3666 · restic/restic · GitHub .

I don’t get why that should make sense. All that seems to achieve is causing failed backups.

It is possible to convert snapshots back into the canonical encoding using restic repair snapshots <IDs...>. I guess we can document that specific error somewhere.

alexweiss · August 16, 2024, 12:45pm

No particular tone was intended, and there was no intent to misquote anything. The only thing I had in mind was that the specification would be the leading document and everything else is just an implementation of it, and that the fact that it can be used or re-implemented is one of the strengths of the format.
Now I have to admit that I can no longer remember how I came to this assumption. And reading your answers I could be just simply wrong. So a “you are wrong here” would be perfectly fine for me, actually.

Yes. And what the current rewrite check seems to achieve is failed snapshot rewrites.
I’ll try to explain the analogy in more detail: The problem with rewriting snapshots is not that any information is lost when writing a new snapshot. The problem is that users typically want to remove old snapshots once they are rewritten. And yes, this could lead to a situation where information in the trees of the old snapshot is not reproduced during the rewrite. At least this is how I understood the intention of that check.
Now with backup using parents, it is exactly the same: backup could detect that all entries of a tree are unchanged w.r.t. the parent, so the tree (of the parent snapshot) should also be unchanged. But it could also detect that the tree it is going to generate is in fact not identical to the parent one. If the new snapshot is simply generated from the new tree, users are losing information once the parent snapshot is removed… Additionally, the deduplication argument also applies when running backup with a parent snapshot.

TL;DR: In this situation (rewrite and backup with parent), there are three options for handling new identical trees: 1) use the existing tree, 2) write the new tree, 3) quit if the existing and new trees are not identical. IMO it is inconsistent to handle rewrite and backup differently. Also, IMO the third option is the worst one.
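
The three options can be sketched as a small policy function. This is only an illustration of the choice being discussed; `Policy` and `resolve_tree` are hypothetical names and do not reflect how restic or rustic implement rewrite or backup.

```python
from enum import Enum


class Policy(Enum):
    USE_EXISTING = 1      # option 1: keep the stored tree as-is
    WRITE_NEW = 2         # option 2: store the re-encoded tree
    FAIL_ON_MISMATCH = 3  # option 3: abort if the encodings differ


def resolve_tree(existing: bytes, regenerated: bytes, policy: Policy) -> bytes:
    """Pick the tree bytes to reference for a logically unchanged tree."""
    if existing == regenerated:
        # Byte-identical: all three options agree.
        return existing
    if policy is Policy.USE_EXISTING:
        return existing
    if policy is Policy.WRITE_NEW:
        return regenerated
    raise ValueError("re-encoded tree differs from the stored tree")


stored = b'{"nodes":[{"type":"file","name":"a.txt"}]}'
reencoded = b'{"nodes":[{"name":"a.txt","type":"file"}]}'

print(resolve_tree(stored, reencoded, Policy.USE_EXISTING) == stored)
print(resolve_tree(stored, reencoded, Policy.WRITE_NEW) == reencoded)
```

With `Policy.FAIL_ON_MISMATCH` the same call raises, which corresponds to the behaviour of rewrite described in this thread.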