Stats command vs actual space usage

francisco1844 · June 16, 2022, 5:11pm

I'm trying to understand what the stats command reports. Is it the total space before deduplication and compression?

For example, this output:

Stats for all snapshots in restore-size mode:
Total File Count: 3062554
Total Size: 310.345 GiB

versus the actual usage on the SFTP server:

du -hs
7.2G

akrabu · June 16, 2022, 7:37pm

Getting information about repository data

Use the stats command to count up stats about the data in the repository. There are different counting modes available using the --mode flag, depending on what you want to calculate. The default is the restore size, or the size required to restore the files:

restore-size (default) counts the size of the restored files.
files-by-contents counts the total size of unique files as given by their contents. This can be useful since a file is considered unique only if it has unique contents. Keep in mind that a small change to a large file (even when the file name/path hasn’t changed) will cause them to look like different files, thus essentially causing the whole size of the file to be counted twice.
raw-data counts the size of the blobs in the repository, regardless of how many files reference them. This tells you how much restic has reduced all your original data down to (either for a single snapshot or across all your backups), and compared to the size given by the restore-size mode, can tell you how much deduplication is helping you.
blobs-per-file is kind of a mix between files-by-contents and raw-data modes; it is useful for knowing how much value your backup is providing you in terms of unique data stored by file. Like files-by-contents, it is resilient to file renames/moves. Unlike files-by-contents, it does not balloon to high values when large files have small edits, as long as the file path stayed the same. Unlike raw-data, this mode DOES consider how many files point to each blob such that the more files a blob is referenced by, the more it counts toward the size.

https://restic.readthedocs.io/en/latest/manual_rest.html
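To make the difference between the modes concrete, here is a toy model in Python. It is only a sketch of the counting rules quoted above, not restic's actual implementation: the `File` type, the blob IDs, and the sizes are all made up for illustration, and it ignores compression, encryption overhead, and metadata. It covers restore-size, files-by-contents, and raw-data, and leaves blobs-per-file out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class File:
    """Hypothetical stand-in for a file in a snapshot."""
    path: str
    blobs: tuple  # tuple of (blob_id, size) pairs, in file order


def restore_size(snapshots):
    # Every file in every snapshot is counted in full, as if all
    # snapshots were restored side by side.
    return sum(size for snap in snapshots for f in snap for _, size in f.blobs)


def files_by_contents(snapshots):
    # A file counts once per distinct content, where "content" is
    # modelled here as the ordered sequence of blob IDs.
    seen = set()
    total = 0
    for snap in snapshots:
        for f in snap:
            key = tuple(blob_id for blob_id, _ in f.blobs)
            if key not in seen:
                seen.add(key)
                total += sum(size for _, size in f.blobs)
    return total


def raw_data(snapshots):
    # Each blob counts once, no matter how many files reference it.
    unique = {}
    for snap in snapshots:
        for f in snap:
            for blob_id, size in f.blobs:
                unique[blob_id] = size
    return sum(unique.values())


# Two snapshots of the same directory. Between them, the second half
# of big.bin changed (blob B became blob C); notes.txt is unchanged.
snap1 = [File("big.bin", (("A", 4), ("B", 4))), File("notes.txt", (("D", 1),))]
snap2 = [File("big.bin", (("A", 4), ("C", 4))), File("notes.txt", (("D", 1),))]
snapshots = [snap1, snap2]

print("restore-size:     ", restore_size(snapshots))       # 18
print("files-by-contents:", files_by_contents(snapshots))  # 17
print("raw-data:         ", raw_data(snapshots))           # 13
```

In this model the small edit to big.bin makes files-by-contents count the whole file twice, while raw-data only pays for the one new blob.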

akrabu · June 16, 2022, 7:38pm

So restore-size is the size of everything once it has been restored, with deduplication undone. If you run stats on the whole repo instead of on an individual snapshot, and you have, say, 10 snapshots that are mostly deduplicated against each other, the result will be roughly 10x as big as a single snapshot, because it's as if you're restoring ALL ten snapshots.
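As a rough sanity check with the numbers from the first post, the ratio between the two figures can be worked out like this. This is only back-of-the-envelope arithmetic: it assumes the `du -hs` figure of 7.2G is in the same binary units as restic's GiB, and the resulting factor mixes together the number of snapshots and whatever deduplication happens within and between them.

```python
# Figures copied from the first post in this thread.
restore_size_gib = 310.345  # restic stats, restore-size mode, all snapshots
on_disk_gib = 7.2           # du -hs on the SFTP server (assumed to be GiB)

# How many times larger the "restore everything" total is than the repository.
ratio = restore_size_gib / on_disk_gib
print(f"restore-size is about {ratio:.1f}x the on-disk size")
```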

It may be of more use to run restic stats in restore-size mode on a specific snapshot, or just on the latest snapshot of each host.
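For scripting this, something along the following lines might work. It is a sketch under a few assumptions: restic is on your PATH, the repository and password are supplied through the environment (for example `RESTIC_REPOSITORY` and `RESTIC_PASSWORD_FILE`), and the helper names `build_stats_command` and `run_stats` are made up here. The `--mode`, `--host`, and `--json` flags and the `latest` snapshot keyword are the ones restic itself provides.

```python
import json
import subprocess


def build_stats_command(mode="restore-size", snapshot=None, host=None):
    """Build the argument list for a `restic stats` call (hypothetical helper)."""
    cmd = ["restic", "stats", "--mode", mode, "--json"]
    if host is not None:
        cmd += ["--host", host]   # only consider snapshots from this host
    if snapshot is not None:
        cmd.append(snapshot)      # a snapshot ID or "latest"
    return cmd


def run_stats(mode="restore-size", snapshot=None, host=None):
    """Run restic and return its JSON output as a dict.

    Needs a reachable repository, so it is not called in this sketch.
    """
    result = subprocess.run(
        build_stats_command(mode, snapshot, host),
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)


# Show the commands that would be run, without touching a repository.
print(" ".join(build_stats_command("restore-size", snapshot="latest")))
print(" ".join(build_stats_command("raw-data")))
```

The first command reports what restoring only the latest snapshot would take; the second reports the deduplicated data across the whole repository, which is the figure closest to what `du` shows on the server.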

francisco1844 · June 16, 2022, 8:45pm

Thanks. I think a single snapshot and --mode raw-data are the two that would be most useful to me.