Understanding snapshot summaries and stats output

I am updating my statistics evaluation to the latest restic 0.17.1 and trying to make sense of the output of restic snapshots --json. I have the following questions:

  1. (snapshots) data_blobs is rather small, much smaller than the (stats raw) total_blob_count of the last snapshot. Is this the number of blobs added by this snapshot?
  2. Same for (snapshots) tree_blobs, is it tree blobs added?
  3. Should tree blobs be correlated with files / dirs new / changed? E.g. tree_blobs = files_new + files_changed + dirs_new + dirs_changed?
  4. (snapshots) total_bytes_processed is consistent with (stats restore) total_size.
  5. Why is the summary of a backup command different from that of a snapshots command? (It would be easier if the “Summary object” were a heading and could be used as a link target.)

No, in both cases it’s the total number of blobs in the snapshot. See Scripting — restic 0.17.1 documentation for slightly more documentation.

tree_blobs should be roughly equivalent to dirs_new + dirs_changed + dirs_unmodified.

Where is the question?

Essentially for historical reasons. Do you have specific fields in mind?

Thanks @MichaelEischer for the reply.

No, in both cases it’s the total number of blobs in the snapshot.

I’ve read the Scripting section of the documentation, but I can’t make sense of the numbers. Here is the “summary” part of a snapshot (restic snapshots --json abcd1234):

    "summary": {
      "backup_start": "2024-10-18T...",
      "backup_end": "2024-10-18T...",
      "files_new": 5,
      "files_changed": 43,
      "files_unmodified": 233347,
      "dirs_new": 1,
      "dirs_changed": 48,
      "dirs_unmodified": 24382,
      "data_blobs": 89,
      "tree_blobs": 48,
      "data_added": 100216326,
      "data_added_packed": 17808787,
      "total_files_processed": 233395,
      "total_bytes_processed": 54711800386
    },

So there are 89 data blobs.

With the command restic stats abcd1234 --json --mode raw-data I get this output:

    {
      "total_size": 37253339317,
      "total_uncompressed_size": 45095159709,
      "compression_ratio": 1.2104997977569625,
      "compression_progress": 100,
      "compression_space_saving": 17.389494665510508,
      "total_blob_count": 218008,
      "snapshots_count": 1
    }

So in this very same snapshot there are now 218008 blobs. Why are those numbers so different? How can they have the same meaning?
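To put the two figures side by side, here is a minimal Python sketch using only the values quoted above. The variable names are illustrative, and it assumes total_blob_count covers both data and tree blobs, which I have not verified:

```python
import json

# Values copied from the two outputs above; all other fields omitted.
summary = json.loads('{"data_blobs": 89, "tree_blobs": 48}')
stats = json.loads('{"total_blob_count": 218008}')

# Sum of the two blob counters reported in the snapshot summary.
summary_blobs = summary["data_blobs"] + summary["tree_blobs"]

# Fraction of the raw-data blob count that the summary counters account for
# (assumption: total_blob_count includes tree blobs as well as data blobs).
share = summary_blobs / stats["total_blob_count"]

print(f"blobs in summary:  {summary_blobs}")
print(f"blobs in raw-data: {stats['total_blob_count']}")
print(f"share:             {share:.4%}")
```

The summary counters account for well under 1% of the blobs that stats reports for the same snapshot.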

tree_blobs should be roughly equivalent to dirs_new + dirs_changed + dirs_unmodified.

But it is not: tree_blobs is 48 and dirs_new + dirs_changed + dirs_unmodified = 24431.

Essentially for historical reasons. Do you have specific fields in mind?

If the backup summary contained backup_start and backup_end instead of total_duration, the two summaries would be essentially the same (except for the snapshot ID).
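As a rough illustration of why the two are interchangeable, a duration can be derived from the start and end timestamps. This is only a sketch: the function name and the example timestamps are made up (my real values are elided above), and real restic timestamps may carry fractional seconds that need trimming before older Python versions can parse them.

```python
from datetime import datetime

def backup_duration_seconds(summary: dict) -> float:
    """Derive a duration in seconds from backup_start and backup_end."""
    start = datetime.fromisoformat(summary["backup_start"])
    end = datetime.fromisoformat(summary["backup_end"])
    return (end - start).total_seconds()

# Hypothetical timestamps for illustration only, not taken from a real snapshot.
example = {
    "backup_start": "2024-01-01T10:00:00+00:00",
    "backup_end": "2024-01-01T10:02:30+00:00",
}

duration = backup_duration_seconds(example)
print(duration)
```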

I’m looking forward to clarifying those numbers.

I just had another look at the code and those two fields currently only report the number of new blobs added to the repository. So your initial assumption was correct, sorry for the confusion.

So tree_blobs is related to dirs_changed + dirs_new, and data_blobs is related to files_new + files_changed. However, there is no 1:1 mapping between them. For example, adding multiple empty directories increases dirs_new or dirs_changed but likely won’t add a new tree blob for each of those directories (only for the parent directories).
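Checked against the summary posted earlier in the thread, the relationship looks like this. This is a minimal Python sketch; the variable names are illustrative, and the comparisons are sanity checks rather than guarantees made by restic:

```python
# Fields copied from the snapshot summary quoted earlier in the thread.
summary = {
    "files_new": 5,
    "files_changed": 43,
    "dirs_new": 1,
    "dirs_changed": 48,
    "dirs_unmodified": 24382,
    "data_blobs": 89,
    "tree_blobs": 48,
}

# Directories and files touched by this backup run.
touched_dirs = summary["dirs_new"] + summary["dirs_changed"]
touched_files = summary["files_new"] + summary["files_changed"]

# All directories in the snapshot, touched or not.
all_dirs = touched_dirs + summary["dirs_unmodified"]

print(f"tree_blobs={summary['tree_blobs']} vs touched dirs={touched_dirs} (all dirs={all_dirs})")
print(f"data_blobs={summary['data_blobs']} vs touched files={touched_files}")
```

Here tree_blobs (48) is close to the 49 touched directories and far from the 24431 directories in total, which fits the "newly added blobs" reading. data_blobs (89) exceeds the 48 touched files, which is plausible because a single file can be split into several blobs.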

Thanks, that fits now.

Would it make sense to rename data_blobs to data_blobs_added and tree_blobs to tree_blobs_added similar to data_added? Or is backwards compatibility important and this would potentially break scripts?

Either way, I would suggest changing the documentation. Should I open a PR?

I’d prefer not to break backwards compatibility here. Those names have been used for years by the backup command’s JSON output, so we should probably stick with them.

A PR to change the docs would be great.


Here is a small PR #5105.


I’ve opened PR #5119 in restic/restic on GitHub to add backup_start and backup_end to the JSON output of the backup command: “backup: include start and end time in json output”.
