Restic backup size vs original

Hi,

I’ve been using restic for a while now, but I can’t seem to figure out a few things.
First of all, how does a snapshot compare to the previous one? Say I have 2 files, a and b. restic makes a snapshot, then I alter file b, and restic makes a new snapshot. How do these two snapshots differ?

Secondly, my main backup folder is ± 300 GB, but the restic repository folder is only about 43 GB. How come? Are my snapshots incomplete?

The code that generates my snapshots:

export RESTIC_REPOSITORY="/backups_restic"
export RESTIC_PASSWORD="xxxxx"

FOLDER="/files"
RESTIC_REPO="/backups_restic"
DB_HOST="localhost"
DB_USER="xxxxx"
DB_PASS="xxxxx"
DB_DB1="1"
DB_DB2="2"
DB_DB3="3"

# abort entire script if any command fails
set -e

# clean up backup dir: drop snapshots older than 4 days
# (forget only removes the snapshots; unreferenced data stays in the
# repository until "restic prune" is run)
restic forget -r "$RESTIC_REPO" --keep-within=4d

# dump each database to a file, then back up the dumps
mysqldump --single-transaction -h "$DB_HOST" -u "$DB_USER" -p"$DB_PASS" "$DB_DB1" > db_mysql_1.sql
mysqldump --single-transaction -h "$DB_HOST" -u "$DB_USER" -p"$DB_PASS" "$DB_DB2" > db_mysql_2.sql
mysqldump --single-transaction -h "$DB_HOST" -u "$DB_USER" -p"$DB_PASS" "$DB_DB3" > db_mysql_3.sql
restic -r "$RESTIC_REPO" backup db_mysql_1.sql
restic -r "$RESTIC_REPO" backup db_mysql_2.sql
restic -r "$RESTIC_REPO" backup db_mysql_3.sql

# remove the local dumps once they are in the repository
rm db_mysql_1.sql
rm db_mysql_2.sql
rm db_mysql_3.sql

# backup the data dir
restic -r "$RESTIC_REPO" backup "$FOLDER"

# delete trap (reset EXIT handling to the default)
trap - EXIT

# clean up backup dir
#restic forget -r "$RESTIC_REPO" --keep-within=3d

As you can see, I use --keep-within=4d, so only snapshots from the last 4 days are kept. What does this mean for my snapshots if, for example, file a has not been altered for 5 days? Is it still backed up?

The first snapshot will contain all of the files’ data, and some metadata. The second snapshot will contain (simply described) only the parts that changed in file B, and some metadata. This is due to the fact that restic deduplicates what you ask it to back up, such that one piece of data is only stored once, but can be referenced by multiple snapshots. So only “unique” data is stored, but snapshots work such that they always reference all of what you asked restic to back up at the time.
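
To make the idea concrete, here is a toy sketch of deduplication in plain shell. It is not how restic works internally: restic uses content-defined chunks of variable size, while this sketch assumes fixed 4-byte chunks, and the file names and contents are made up. It also assumes GNU coreutils (`sha256sum`, `split`, `comm`).

```shell
# Toy model only: fixed 4-byte chunks stand in for restic's
# content-defined chunks, and a sorted hash list stands in for a snapshot.
work=$(mktemp -d)

printf 'AAAABBBBCCCC' > "$work/a"      # file a, never changes
printf 'DDDDEEEEFFFF' > "$work/b_v1"   # file b at snapshot 1
printf 'DDDDXXXXFFFF' > "$work/b_v2"   # file b at snapshot 2 (middle part edited)

# Print one SHA-256 hash per 4-byte chunk of the given file.
chunk_hashes() {
    dir=$(mktemp -d)
    split -b 4 "$1" "$dir/c."
    for c in "$dir"/c.*; do
        sha256sum < "$c" | cut -d ' ' -f 1
    done
    rm -rf "$dir"
}

# Each "snapshot" references every chunk of every file it covers.
{ chunk_hashes "$work/a"; chunk_hashes "$work/b_v1"; } | sort -u > "$work/snap1"
{ chunk_hashes "$work/a"; chunk_hashes "$work/b_v2"; } | sort -u > "$work/snap2"

# Chunks stored by the first backup, and chunks the second backup must add.
stored_chunks=$(( $(wc -l < "$work/snap1") ))
new_chunks=$(( $(comm -13 "$work/snap1" "$work/snap2" | wc -l) ))

echo "snapshot 1 stores $stored_chunks chunks"
echo "snapshot 2 references $(( $(wc -l < "$work/snap2") )) chunks, $new_chunks of them new"
rm -rf "$work"
```

Both snapshots reference six chunks, but the second one only adds a single new chunk to the store, which is the part of file b that changed. On a real repository, `restic diff` with two snapshot IDs shows what changed between them.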

Assuming you didn’t get any warnings about files being unreadable when you backed up, and you got a 0 exit code from restic when running the backup command, then restic should indeed have backed up all you asked it to and nothing should be incomplete.
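
As a rough sketch of how a script could check this, the helper below maps the exit status of `restic backup` to a message. The function name is made up for illustration, and the meanings of 0, 1 and 3 are the ones documented for recent restic versions; older versions may behave differently.

```shell
# Map the exit status of "restic backup" to a short message.
# describe_backup_status is a hypothetical helper, not part of restic.
describe_backup_status() {
    case "$1" in
        0) echo "complete: snapshot created, all files read" ;;
        1) echo "failed: fatal error, no snapshot created" ;;
        3) echo "incomplete: snapshot created, some files could not be read" ;;
        *) echo "unexpected exit status $1" ;;
    esac
}

# Usage sketch (needs a real repository, so it is commented out here).
# The "|| status=$?" form keeps "set -e" from aborting before the check:
#   status=0
#   restic -r "$RESTIC_REPO" backup "$FOLDER" || status=$?
#   describe_backup_status "$status"
```

Note that with `set -e` in the script, an incomplete backup (status 3) stops the script at that line, so the later steps never run.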

The fact that less space is used in the repository than in your original data set can have several causes: deduplication, as mentioned above, but also unreadable files in your folder that were skipped. You pretty much have to look at restic’s output when you run it to know if there’s anything out of the ordinary. But it’s certainly not uncommon for the repo to use less disk space than the source data set.

Yes, a file that hasn’t changed for 5 days is still backed up: as long as you haven’t deleted the last snapshot that references that file (i.e. where the file was included in the backup for that snapshot), it will be there. I encourage you to simply try it though, by restoring a file or two, so you can see it for yourself.
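
A test restore could look roughly like the sketch below. The repository path is taken from the script in the question; `/files/a` is a hypothetical file name and `/tmp/restore-test` an arbitrary scratch directory. By default the sketch only prints the commands; set `DRY_RUN=0` to actually run them (which also needs `RESTIC_PASSWORD` to be set).

```shell
# Assumptions: /files/a is a made-up file name and /tmp/restore-test is
# an arbitrary scratch directory; adjust both to your setup.
RESTIC_REPO="/backups_restic"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = run them

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Restore one file from the newest snapshot of /files into the scratch directory.
restore_one() {
    run restic -r "$RESTIC_REPO" restore latest --path /files \
        --include "$1" --target /tmp/restore-test
}

restore_one /files/a
# restic recreates the full path under the target, so the restored copy
# can be compared byte for byte with the original:
run cmp /files/a /tmp/restore-test/files/a
```

The `--path /files` filter is there because the script creates several snapshots per run (one per database dump plus one for the folder), and `latest` alone just means the most recent of all of them.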

I would just like to add that deduplication is done at the chunk level and already applies to the very first backup. So if you have a lot of identical or similar files, even the first snapshot will be smaller than the source.

Still, the difference in your case is quite big. I agree with @rawtaz: do a test restore or look around using “restic mount”.

Thanks to both of you! That helps a lot :)

Hello, I’d like to suggest something related to the difference between repository size and original data size.

In addition to the good advice already given on checking exit codes and warnings/errors during backup, or trying a full restore, you can have restic show information about your data with the stats command.

Here are some examples:

restic -r $RESTIC_REPO stats --mode restore-size latest
restic -r $RESTIC_REPO stats --mode raw-data

Check the documentation or help for details. If all of your data is in the backup repository, I would expect:

  • the “restore-size latest” to report ~300GB.
  • the “raw-data” to report ~43GB.
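
As a small worked example of how those two numbers relate, the sketch below computes the ratio between them. The helper name is made up, and the inputs are the approximate sizes from this thread rather than real `restic stats` output.

```shell
# Ratio of restorable data to data actually stored in the repository.
# dedup_ratio is a hypothetical helper; both arguments must use the same unit.
dedup_ratio() {
    awk -v restore="$1" -v raw="$2" 'BEGIN { printf "%.1f\n", restore / raw }'
}

dedup_ratio 300 43   # approximate sizes from this thread, in GB
```

With these sizes the ratio is about 7, meaning each stored gigabyte stands for roughly seven gigabytes of restorable data. That is high enough to be worth confirming with a test restore.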

Play around with it; I hope this helps to give additional insight.

Wow, thanks! This gives me a lot more insight indeed!