Seeking Advice on Determining Restoration Size

Hi everyone, I have a question. How can I determine the size required for restoration? It appears that restore-size or raw-data aren’t giving me what I expected, with neither of them coming close to the 624K - the actual size of the data after restoration.

> sudo restic -r /mnt/user-data/9524/u-9524 --json stats --mode restore-size latest | jq
{
  "total_size": 264247,
  "total_file_count": 106,
  "snapshots_count": 1
}

> sudo restic -r /mnt/user-data/9524/u-9524 --json stats --mode raw-data latest | jq
{
  "total_size": 31317,
  "total_uncompressed_size": 270825,
  "compression_ratio": 8.647858990324744,
  "compression_progress": 100,
  "compression_space_saving": 88.43644419828303,
  "total_blob_count": 77,
  "snapshots_count": 1
}

> sudo rm -rf restored // <--- `restored` folder does not exist before restore

> sudo restic -r /mnt/user-data/9524/u-9524 --json restore latest --target ./restored | jq
{
  "message_type": "summary",
  "total_files": 106,
  "files_restored": 106,
  "total_bytes": 264247,
  "bytes_restored": 264247
}

> sudo du -sh *
624K	restored
1 Like

Probably not you’re looking for, but I am recording the “total_bytes_processed” from json output of backup command itself.

1 Like

My second option would be to measure the size of the data before or right after the snapshot and then store it in the repository. I’m hoping to do this with tags or some other effective method. Do you think I’d be able to create my own files in a repository folder?

1 Like

If this helps, the snapshot command will show the snapshot sizes after 0.17.0, for new snapshots. See this post.

1 Like

What is 624K in units? bytes? I’m wondering whether you might see a discrepancy between actual file size (reported by stats) and disk size (rounded up to disk sector size).

1 Like

It’s 514160 bytes.

> sudo du -sb *
514160	restored
1 Like

It seems to be safe to place extra files in the repository. I recently added a folder to mine for different reason.

For keeping snapshot-bound data, tags would be best imo. But I am hopeful from the latest version. Could you try compiling the latest master and check if snapshots output shows the correct size? :eyes:

1 Like

Which restic version are you using?

I’ve just created a backup of the restic folder, ran stats and restored it (using the latest restic code):

❯ restic backup ../restic              
repository a14e5863 opened (version 2, compression level auto)
using parent snapshot 7c57e723
[0:00] 100.00%  1 / 1 index files loaded

Files:        1067 new,   580 changed,  2667 unmodified
Dirs:           35 new,   347 changed,   147 unmodified
Added to the repository: 33.117 MiB (12.829 MiB stored)

processed 4314 files, 141.167 MiB in 0:00
snapshot 75fe958f saved

❯ restic stats latest  --json    
{"total_size":148024249,"total_file_count":4843,"snapshots_count":1}

❯ restic restore -t restored latest             
repository a14e5863 opened (version 2, compression level auto)
[0:00] 100.00%  2 / 2 index files loaded
restoring snapshot 75fe958f [...]
Summary: Restored 4843 files/dirs (141.167 MiB) in 0:00

❯ du -sb restored      
148024249	restored

❯ python3
>>> 148024249/1024/1024
141.16692447662354

For me the size calculation of the stats command is a perfect match down to the last byte. Which restic version are you using? On which filesystem do you restore the data? Does the dataset contain hardlinks or other magic? Can you try a different set of files to see how those behave?

1 Like

I’m using the latest release version - restic 0.16.4 compiled with go1.21.6 on linux/amd64

I’ll try the current master bust in case.

On which filesystem do you restore the data?
Using ext4

hardlinks or other magic
No, nothing like that.

I’ve tried to backup/restore the different data:

> mkdir test-backup
> cd test-backup
> npm init -y
> npm i express jest electron

> du -sb test-backup
302779014 test-backup

> sudo restic -r /mnt/user-data/tmp-backup --json init

> sudo restic -r /mnt/user-data/tmp-backup --json backup .
{
  "message_type": "summary",
  "files_new": 5410,
  "files_changed": 0,
  "files_unmodified": 0,
  "dirs_new": 968,
  "dirs_changed": 0,
  "dirs_unmodified": 0,
  "data_blobs": 5000,
  "tree_blobs": 969,
  "data_added": 300214237,
  "total_files_processed": 5410,
  "total_bytes_processed": 298772625,
  "total_duration": 1.029439333,
  "snapshot_id": "a764eac13ba00f2a518436e9fb166c1b84a605630534c200911bbd1393d85f91"
}

> sudo restic -r /mnt/user-data/tmp-backup --json stats latest
{
  "total_size": 298772625,
  "total_file_count": 6399,
  "snapshots_count": 1
}

> sudo restic -r /mnt/user-data/tmp-backup --json restore latest --target ./tmp-backup-restore
{
  "message_type": "summary",
  "seconds_elapsed": 1,
  "total_files": 6399,
  "files_restored": 6399,
  "total_bytes": 298772625,
  "bytes_restored": 298772625
}

> du -sb tmp-backup-restore
302779014 tmp-backup-restore

It restored good, but size reported by stats is still irrelevant to the actual data size…

2 Likes

Can you restore just a single file and compare that? I’m wondering whether du somehow counts folder size, or are there maybe some xattrs or similar that are set automatically (maybe SELinux labels?)?

If you want to know the size of a certain directory of the repository you could try this:

$ ./restic_v0.16.5-811-g8e27a934d_linux_amd64 -r repo/ restore latest --target /dev/null --dry-run --json --quiet
{"message_type":"summary","seconds_elapsed":5,"total_files":89834,"files_restored":89834,"total_bytes":1421827974,"bytes_restored":1421827974}
$

Sure, with one single file the reported size if good.

root@worker-7:/home/ubuntu/data# ls -la
total 4882828
drwxr-xr-x  2 root   root         4096 Jul  9 14:07 .
drwxr-x--- 12 ubuntu ubuntu       4096 Jul  9 14:07 ..
-rw-r--r--  1 root   root   5000000000 May 13  2023 files-5GB

root@worker-7:/home/ubuntu/data# sudo restic -r /mnt/user-data/tmp-backup --json init
{"message_type":"initialized","id":"cfa7a07f4b49f9b58675d5fe43f4445c1cc9b106188d8fe9d7f353c8dc46aff3","repository":"/mnt/user-data/tmp-backup"}

root@worker-7:/home/ubuntu/data# sudo restic -r /mnt/user-data/tmp-backup --json backup --quiet .
{"message_type":"summary","files_new":1,"files_changed":0,"files_unmodified":0,"dirs_new":0,"dirs_changed":0,"dirs_unmodified":0,"data_blobs":3192,"tree_blobs":1,"data_added":5000214143,"total_files_processed":1,"total_bytes_processed":5000000000,"total_duration":19.720306707,"snapshot_id":"e5cc599fad97cefb4eb42b91da227ca2caa7cd1f75e5b8444eb8efc938089eeb"}

root@worker-7:/home/ubuntu/data# sudo restic -r /mnt/user-data/tmp-backup --json stats latest
{"total_size":5000000000,"total_file_count":1,"snapshots_count":1}

And if there is a single directory? My guess is that du somehow reports a large size if directories are involved. Which Linux distro and du --version do you use?

Single directory reports zero size of snapshot…

ls -la
total 12
drwxr-xr-x  3 root   root   4096 Jul  9 23:40 .
drwxr-x--- 13 ubuntu ubuntu 4096 Jul  9 23:39 ..
drwxr-xr-x  2 root   root   4096 Jul  9 23:40 my_dir

sudo restic -r /mnt/user-data/tmp-backup --json init | jq
{
  "message_type": "initialized",
  "id": "58404545d7e7d2db8faae0f74f49e9f6a0ccef66759c8a6fb26d4b64659b14f0",
  "repository": "/mnt/user-data/tmp-backup"
}

sudo restic -r /mnt/user-data/tmp-backup --json --quiet backup . | jq
{
  "message_type": "summary",
  "files_new": 0,
  "files_changed": 0,
  "files_unmodified": 0,
  "dirs_new": 1,
  "dirs_changed": 0,
  "dirs_unmodified": 0,
  "data_blobs": 0,
  "tree_blobs": 2,
  "data_added": 367,
  "total_files_processed": 0,
  "total_bytes_processed": 0,
  "total_duration": 0.221280204,
  "snapshot_id": "386ff7d66e5719d30b16db6ceba20e139d7a03854bb8efb86c5619f289a7ee05"
}

sudo restic -r /mnt/user-data/tmp-backup --json stats latest | jq
{
  "total_size": 0,
  "total_file_count": 1,
  "snapshots_count": 1
}

Versions:

uname -a
Linux worker-7 6.5.0-41-generic #41~22.04.2-Ubuntu SMP PREEMPT_DYNAMIC Mon Jun  3 11:32:55 UTC 2 x86_64 x86_64 x86_64 GNU/Linux

du --version
du (GNU coreutils) 8.32
Copyright (C) 2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <https://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Written by Torbjorn Granlund, David MacKenzie, Paul Eggert,
and Jim Meyering.

I think I’ve found the problem:

On Arch Linux (Kernel 6.8 and du version 9.5) I get the following output:

mkdir data
echo "abc" > data/test
du -sb data 
4	data

In an Ubuntu 22.04 container on the same system I get:

du -sb data
4100	data

In an Ubuntu 24.04 container I get the exact same results as on Arch.

→ The problem here is not with restic but rather a confusing size report by du in the version included in Ubuntu 22.04.

4 Likes

Thank you very much for your suggestion; it’s really great and definitely affects the amount of memory reported by the du command!

However, I encountered this issue not because I was just checking the size reported by du.

I was backing up data and then using the size reported by the stats command to create an LVM partition of the exact size needed for data restoration. I noticed that I often ran out of disk space for the restore, and it wasn’t just a few bytes but often 10% to 15%.

So, I started digging into the issue. Now, I will rely on the accuracy of the df -B1 command on my partition before backup and save the restore size in a tag for snapshots. This approach has been the most stable and is currently working for me.