Why is the repo 300% larger than the source folder?

[root@store50a zpool0]# restic stats -r restic_repo --mode raw-data
repository cc56b432 opened successfully, password is correct
scanning…
Stats in raw-data mode:
Snapshots processed: 94
Total Blob Count: 7716191
Total Size: 3.737 TiB
[root@store50a zpool0]# restic -r restic_repo stats --mode restore-size
repository cc56b432 opened successfully, password is correct
scanning…
Stats in restore-size mode:
Snapshots processed: 94
Total File Count: 1149713
Total Size: 4.079 TiB
[root@store50a zpool0]# du -hs restic_repo
3.8T restic_repo

@forbin, can you verify if deduplication and/or compression is enabled on the zfs pool on store50b?

zpool list

zfs get all zpool0 | grep compress

Either of those zfs features would cause du to show you something other than the true source data size.
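If you want to check more than just compression, pulling the handful of properties that most often skew du is quick (the dataset name zpool0 matches your prompts):

zfs get compression,compressratio,dedup,recordsize,copies zpool0

compressratio and dedup in particular show how much the on-disk numbers are shrunk relative to the logical data.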

Compression is enabled on both the source server and the destination server.

Source:

[root@store50a ~]# zpool list
NAME    SIZE   ALLOC  FREE   CKPOINT  EXPANDSZ  FRAG  CAP  DEDUP  HEALTH  ALTROOT
zpool0  17.5T  5.50T  12.0T  -        -         18%   31%  1.00x  ONLINE  -
[root@store50a ~]# zfs get all|grep compress
zpool0 compressratio 1.42x -
zpool0 compression lz4 local
zpool0 refcompressratio 1.42x -

Destination:

[root@store50b ~]# zpool list
NAME    SIZE   ALLOC  FREE   CKPOINT  EXPANDSZ  FRAG  CAP  DEDUP  HEALTH  ALTROOT
zpool0  17.5T  12.8T  4.63T  -        -         14%   73%  1.00x  ONLINE  -
[root@store50b ~]# zfs get all zpool0 | grep compress
zpool0 compressratio 1.90x -
zpool0 compression lz4 local
zpool0 refcompressratio 1.90x -
[root@store50b ~]#

The compression ratio is a bit lower on the source than on the destination.

@saviodsouza , @GuitarBilly

Just making sure to tag you so you see the replies.

@rawtaz Tagging you as well.

I guess the main reason for the size difference is that restic doesn’t compress its repositories yet (see Implement compression support by MichaelEischer · Pull Request #3666 · restic/restic · GitHub). What size does du -hs --apparent-size print?

Sorry, got caught up yesterday and was away today. I really can’t add anything of value that others haven’t already said. I think that if restic saves X amount of data, it’s because it actually read that amount of data from the filesystem. If the filesystem reports that there’s less data than restic read, there has to be something going on with your ZFS, and compression sounds like a good thing to look at. Are you also deduplicating in your ZFS? Maybe you could post the complete log file for the backup run, just so people have 100% of that information available.
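If it helps, something along these lines should capture the whole run in one file (the source path is a placeholder; adjust the repo path to wherever it actually lives):

restic -r /zpool0/restic_repo backup /path/to/source --verbose 2>&1 | tee restic-backup.log

The summary at the end of the run shows how much data restic thinks it added to the repository.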

@MichaelEischer Here’s what it says…

[root@store50a zpool0]# du -hs --apparent-size restic_repo
3.9T restic_repo

I don’t see how the lack of repo compression could be a factor, though. Even without compression, the repo size should not exceed the size of the source data (maybe a little allowing for some metadata, but not by 300%).

“If the filesystem reports that there’s less data than restic read, there has to be something going on with your ZFS”

The problem is that restic seems to be writing three times more data than it is reading.

forbin,
I would say you have not yet proven that restic is writing 3x the amount of data that it reads.
Bear in mind that tools like du get easily fooled by advanced filesystems like zfs.
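One way to actually check that, independent of how zfs accounts for space, would be to sum the logical sizes of the repo’s pack files and compare them with the logical size of the source (the source path below is a placeholder):

find restic_repo/data -type f -printf '%s\n' | awk '{s+=$1} END {printf "%.3f TiB\n", s/2^40}'
du -sb /path/to/source

If those numbers are roughly in line with the restic stats output above, restic is not writing 3x what it reads and the discrepancy is in how the source size is measured.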

I am no expert in zfs, but with compression, for example, du on zfs shows the size after compression, not the original size… Also, setting a different blocksize for zfs could cause du to report unexpected sizes.
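A quick way to see that effect is a throwaway test file on the lz4-compressed dataset (the path is just an example):

dd if=/dev/zero of=/zpool0/compress_test.bin bs=1M count=1024
sync
du -h /zpool0/compress_test.bin
du -h --apparent-size /zpool0/compress_test.bin
rm /zpool0/compress_test.bin

The first du should show a much smaller on-disk size (zeros compress almost completely), the second the logical 1 GiB that was written.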

If restic reads and then encrypts the source data, I can imagine that on your destination server the zfs compression of the restic repo files is no longer effective.

Are you worried about the (reported) space of source vs destination, or about the integrity of your backup?

Not if the root cause is that the data really is the size restic stores, but ZFS, for various reasons, uses less disk space for it. Presumably ZFS reports the actually used disk space, so if you store a file that’s 3 GB and it only takes up 1 GB on disk, you get this type of situation.

The problem is that there is 1 TB of source data, and after a few consecutive days of backups (maybe 3 or 4 days) it completely fills the 17 TB of storage on the destination server.

Yeah, totally with you that it’s a problem 🙂 We should try to figure it out.

Can you please post the output of zfs list -tall | grep zpool0/db_rsyncs (or similar, so that we can see all the ZFS filesystems that are related to the db_rsyncs directory)?
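As a side note, zfs list -t all also includes any ZFS snapshots, which hold on to space of their own; to list just those, if any exist:

zfs list -t snapshot -o name,used,refer -r zpool0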

Restic will only grow with new or changed data so the amount that will be added depends on how static / dynamic the source data is.

So you are now anticipating ~100% new or changed data every day, which seems a bit wild?
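One way to check how much each backup actually added is to compare two consecutive snapshots (the snapshot IDs below are placeholders; pick two from the listing):

restic -r restic_repo snapshots
restic -r restic_repo diff <older-snapshot-ID> <newer-snapshot-ID>

The diff output ends with a summary of how much data was added and removed between the two snapshots.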

Restic has backed up 4.079 TiB of data and saved 3.737 TiB in the repository.

The problem is not that restic saves more in the repo than it backs up, but the mismatch between the 1.1 TiB you are reporting and the 4.079 TiB which restic obviously did read. This mismatch, however, does not depend on restic, but on zfs and how you use it to get this information.
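If you want the size before compression straight from zfs rather than from du, the logical* properties report exactly that:

zfs get used,logicalused,referenced,logicalreferenced,compressratio zpool0

logicalused should be close to what restic reads, while used is the post-compression figure that du and zfs list reflect.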

@alexweiss I understand your logic, but the problem is that the source zfs filesystem is only showing a 1.4x compression ratio, whereas the destination is showing a 1.9x compression ratio. Assuming the actual data is the same size on both, the space reported on the destination should be less than the space reported on the source.

@GuitarBilly Normally, the daily change rate of the data would be fractional, like maybe 1%. However, I have a theory about what may be happening that I will look into.

@rawtaz

db_rsyncs is not its own filesystem, just a sub-folder under /zpool0, so I don’t know if this will tell you much…

On the source server, STORE50B

[root@store50b zpool0]# zfs list -tall
NAME    USED   AVAIL  REFER  MOUNTPOINT
zpool0  10.9T  3.50T  10.9T  /zpool0

On the dest server, STORE50A

[root@store50a ~]# zfs list -tall
NAME    USED   AVAIL  REFER  MOUNTPOINT
zpool0  4.86T  9.50T  4.86T  /zpool0

Let’s stand down for a couple of days on this while I monitor the behavior some more and explore a possible theory I have about what’s happening.

I think you might find better help about this in a zfs forum or on some other zfs platform.

If you are uncertain about the real size of the files, please try

find . -type f -exec cat {} + | wc -c
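For comparison, running it over both the source directory and the repository would show whether the repo really contains ~3x the bytes (the paths below are placeholders):

find /path/to/source -type f -exec cat {} + | wc -c
find /path/to/restic_repo -type f -exec cat {} + | wc -c

Both counts are taken by actually reading every byte, so neither compression nor recordsize can skew them.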