Adventures in Compression

I’ve been playing with piping compressed data to Restic like so:

tar cf - /Applications | zstdmt -3q --rsyncable | restic backup --stdin --stdin-filename applications.tar.zst

This basically creates an “rsyncable” Zstandard tarball (to aid in deduplication) inside of a Restic snapshot. For comparison, I also backed up the same folder without compression. I then ran updates from the App Store to upgrade a few of my apps, to see how well deduplication would hold up.

Here are the stats!

Uncompressed:

Stats in raw-data mode:
Snapshots processed: 1
Total Blob Count: 613468
Total Size: 51.490 GiB

Stats in restore-size mode:
Snapshots processed: 1
Total File Count: 909848
Total Size: 55.700 GiB

Compressed:

Stats in raw-data mode:
Snapshots processed: 1
Total Blob Count: 21669
Total Size: 28.946 GiB

Stats in restore-size mode:
Snapshots processed: 1
Total File Count: 1
Total Size: 30.148 GiB

Then, after updating some apps, here are the results of the next snapshot…

Uncompressed:

Files: 269 new, 97186 changed, 618714 unmodified
Dirs: 6 new, 3671 changed, 180007 unmodified
Added to the repo: 775.429 MiB
processed 716169 files, 58.743 GiB in 6:31

Compressed:

Files: 1 new, 0 changed, 0 unmodified
Dirs: 0 new, 0 changed, 0 unmodified
Added to the repo: 745.600 MiB
processed 1 files, 30.156 GiB in 23:15

And then I updated a few more apps!

Uncompressed:

Files: 583381 new, 316 changed, 132472 unmodified
Dirs: 159033 new, 353 changed, 24298 unmodified
Added to the repo: 390.212 MiB

Compressed:

Files: 1 new, 0 changed, 0 unmodified
Dirs: 0 new, 0 changed, 0 unmodified
Added to the repo: 186.706 MiB

And finally, just running some apps and closing them, not updating anything:

Uncompressed:

Files: 0 new, 32 changed, 716428 unmodified
Dirs: 0 new, 94 changed, 183735 unmodified
Added to the repo: 187.982 KiB
processed 716460 files, 58.781 GiB in 3:37

Compressed:

Files: 0 new, 1 changed, 0 unmodified
Dirs: 0 new, 0 changed, 0 unmodified
Added to the repo: 35.654 MiB
processed 1 files, 30.320 GiB in 22:00

Final thoughts… on the storage end, it seems to be efficient enough. However, it drastically increased the backup time, which I was surprised by considering I used Zstandard.
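
To put a rough number on “drastically”, here is a quick sketch that turns the durations reported above into a slowdown factor. The helper names (to_seconds, slowdown) are made up for illustration, and it only compares wall-clock times, nothing more.

```shell
# Convert a restic "M:SS" duration to seconds.
# (to_seconds is a hypothetical helper name, not part of restic.)
to_seconds() {
    echo "$1" | awk -F: '{ print $1 * 60 + $2 }'
}

# slowdown SLOW FAST: how many times longer SLOW took than FAST.
slowdown() {
    awk -v slow="$(to_seconds "$1")" -v fast="$(to_seconds "$2")" \
        'BEGIN { printf "%.1f\n", slow / fast }'
}

slowdown 23:15 6:31   # after the first round of app updates
slowdown 22:00 3:37   # apps only launched, nothing updated
```

That works out to roughly 3.6x and 6.1x. Presumably part of the gap is that a plain restic run can skip files whose metadata has not changed, while a tar stream on stdin has to be read and compressed in full every time.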

One benefit of this experiment is that I figured out how to remotely back up my VPS that has too little RAM to run Restic locally:

ssh remote-vps 'tar -cPf - /home/user | zstd -3q --rsyncable' | restic backup -H my-vps --stdin --stdin-filename my-vps.tar.zst

I’m mostly just backing up a few config scripts and binaries, which adds up to about 12 MB of data. So this only takes about 12 seconds for me. Nifty!
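
If you want to reuse that one-liner for several machines, it could be wrapped in a small function. This is only a sketch: backup_remote is a made-up name, the arguments in the usage line are the ones from the command above, and it assumes the remote path contains no single quotes.

```shell
# Enable pipefail where the shell supports it, so the pipeline also reports
# a failure when the ssh side exits non-zero, not only when restic does.
(set -o pipefail) 2>/dev/null && set -o pipefail

# backup_remote SSH_HOST SNAPSHOT_HOST REMOTE_PATH
# (hypothetical helper; assumes REMOTE_PATH contains no single quotes)
backup_remote() {
    local ssh_host="$1"
    local snap_host="$2"
    local remote_path="$3"
    # tar and zstd run on the remote machine; restic runs locally and reads stdin.
    ssh "$ssh_host" "tar -cPf - '$remote_path' | zstd -3q --rsyncable" |
        restic backup -H "$snap_host" --stdin --stdin-filename "$snap_host.tar.zst"
}

# Usage, with the values from the post:
#   backup_remote remote-vps my-vps /home/user
```

Note that restic may still have saved a snapshot by the time the ssh side fails, so a non-zero exit here is best treated as a cue to go and check that snapshot.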

Notes:

Don’t change the compression level after making your first snapshot. Obviously that will change the vast majority of the archive and keep restic from deduplicating against the older snapshots. We want the archive to “remain the same” as much as possible, which --rsyncable helps with, but not if the compression level changes (dedupe will work again for later snapshots made at the new level, but yeah).

Thoughts:

I wonder if messing with --stream-size or --long (set to a smaller than usual value) would help Restic deduping? There’s probably some way to set the Zstandard chunk size to the Restic chunk size, but I’m not exactly sure how to do it - or if that would negate the benefits of compression.

Overall, I don’t think this is a viable method for large datasets. It’s too slow, and small changes aren’t stored as efficiently as with Restic alone. But it makes backing up my VPS a cinch! :man_shrugging:
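
As a rough check on the “small changes” point, this sketch compares the “Added to the repo” figures from the last run above, where nothing was updated. to_kib is a made-up helper that just normalises the units.

```shell
# to_kib VALUE UNIT: normalise restic's "Added to the repo" figures to KiB.
# (to_kib is a hypothetical helper name, not part of restic.)
to_kib() {
    awk -v v="$1" -v u="$2" 'BEGIN {
        m["KiB"] = 1; m["MiB"] = 1024; m["GiB"] = 1024 * 1024
        printf "%.3f\n", v * m[u]
    }'
}

# How many times more data the tarball run added than the plain restic run.
awk -v tarball="$(to_kib 35.654 MiB)" -v plain="$(to_kib 187.982 KiB)" \
    'BEGIN { printf "%.0f\n", tarball / plain }'
```

So for that run the tarball approach stored on the order of 190 times more new data, even though the two app-update runs came out equal or better for the tarball.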

Duude you did some digging there :slight_smile: I don’t fully understand everything you say but just wanted to note that I find the idea very interesting to pipe a tar “stream” to restic via ssh. I didn’t even think of that so far and it kind of solves a request that I’ve often come across here: running restic on a central backup server and not on the backupees.

And a question: why chunk up the tarball at all? restic dedupes even parts of files so wouldn’t it be best to let restic do the chunking?

And another question: what happens if you delete a large part of your /Applications? I’d imagine the tarball will look very different then.

I do that by mounting all my raspis with sshfs, like so:

sshfs smallPC:/ /smallPC

and then I let restic backup /smallPC
