Comparison with Borg / Shortcomings in Restic?

Hi manfredlotz,
Here is our situation: hundreds of S3 buckets where hundreds of restic clients save backup data. Some restic clients take one snapshot every 15 minutes, and our customers keep them for months.
These buckets hold up to 10 or 15 TB of data.
Several MinIO servers provide the S3 buckets, and their backends are always ZFS (via NFS) with compression enabled.
The average ZFS compression we see on those zpools is 2%.

From my point of view, restic doesn’t need compression.

Bye,
Michele

Ah, that’s not quite right: what you’re seeing is that data in the restic repository (stored on ZFS) does not compress very well. That’s not surprising: all data in the repo is encrypted and therefore has high entropy. Compression within restic would mean compressing files before encryption/deduplication. 🙂
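
To illustrate the point, here is a quick sketch (my own, not restic code) that gzips a block of repetitive text and a block of random bytes, the latter standing in for encrypted data: the high-entropy block barely shrinks at all.

// Sketch: high-entropy data (like encrypted blobs) barely compresses.
package main

import (
    "bytes"
    "compress/gzip"
    "crypto/rand"
    "fmt"
)

// gzipSize returns the gzip-compressed size of data in bytes.
func gzipSize(data []byte) int {
    var buf bytes.Buffer
    w := gzip.NewWriter(&buf)
    w.Write(data)
    w.Close()
    return buf.Len()
}

func main() {
    text := bytes.Repeat([]byte("backup me please "), 4096) // ~68 KiB of text
    random := make([]byte, len(text))                       // stand-in for encrypted data
    rand.Read(random)

    fmt.Printf("text:   %d -> %d bytes\n", len(text), gzipSize(text))
    fmt.Printf("random: %d -> %d bytes\n", len(random), gzipSize(random))
}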

3 Likes

Yes! You are right! My (big) fault, I forgot the encryption!
So… could you implement compression? 😄

Thanks,
Michele

2 Likes

Implementation is in the backlog (mentioned in the top-post: https://github.com/restic/restic/issues/21). There’s been quite a lot of discussion there.

Oh, and please please please don’t comment in that issue, it’s way too long already 😉

2 Likes

@mic_p

I backed up my laptop data using borg and got:

------------------------------------------------------------------------------
                       Original size      Compressed size    Deduplicated size
All archives:                4.37 TB              3.70 TB              1.06 TB

                       Unique chunks         Total chunks
Chunk index:                 4301812             20884547

So my interpretation was that compression saved me 670GB which is quite something.

Based on this output, I think compression saved you about a quarter of that; you need to consider the deduplicated size as well. Note that roughly 2.6 TB of the compressed data was removed by deduplication (3.70 TB down to 1.06 TB). The benefits of compression go down as the benefits of deduplication go up.

For example, let’s say you have 500 copies of the same 1GB file, and that file compresses to 250MB.
The original data set, when compressed, goes from 500GB to 125GB. Once it is deduplicated, it drops further to 250MB.

It’s tempting to say that compression saved you 375GB, but it did not – deduplication alone would have reduced the 500GB data set to 1GB.

Adding compression to deduplication in this scenario only saves 750MB, nowhere near 375GB.
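
The same arithmetic, written out as a small (purely illustrative) Go program so the numbers are easy to follow:

// Sketch of the savings arithmetic for 500 identical 1 GB files
// that each compress to 250 MB. All sizes are in MB.
package main

import "fmt"

func main() {
    const (
        copies       = 500
        originalMB   = 1000.0 // one 1 GB file, uncompressed
        compressedMB = 250.0  // the same file, compressed
    )

    rawTotal := copies * originalMB         // 500 GB: no dedup, no compression
    compressedOnly := copies * compressedMB // 125 GB: compression only
    dedupOnly := originalMB                 // 1 GB: dedup only
    both := compressedMB                    // 250 MB: dedup + compression

    fmt.Printf("raw:                  %.0f MB\n", rawTotal)
    fmt.Printf("compression only:     %.0f MB\n", compressedOnly)
    fmt.Printf("dedup only:           %.0f MB\n", dedupOnly)
    fmt.Printf("dedup + compression:  %.0f MB\n", both)
    fmt.Printf("extra saved by compression on top of dedup: %.0f MB\n", dedupOnly-both)
}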

1 Like

I recently moved from Borg to Restic. One major drawback with Borg is that even with a moderate-sized repository (3 GB of files in /var/www) it doesn’t do incremental backups well: it copies a LOT of data to a new file in the backup location. If you’re storing your backup locally that’s OK, but if you’re backing up to the cloud that’s going to use a lot of bandwidth. Restic only backs up what is required, so it is much more bandwidth-efficient.

I run Restic on an AWS t2.nano that right now has 40 MB of physical RAM available (but with a lot of RAM used as cache) and 400 MB of virtual memory free (the t2.nano has 512 MB of RAM). It backs up a few GB nightly, 83,000 files. CloudWatch says my RAM usage goes from 37% to 53% during backups, for about a minute. Resource usage for this amount of data seems really reasonable.

1 Like

Unfortunately those stats aren’t worth much without knowledge of the dataset, the number of snapshots, etc. Take a look at the stats of one of my repositories.

# restic stats --mode restore-size
repository 21006ba7 opened successfully, password is correct
scanning...
Stats for all snapshots in restore-size mode:
  Total File Count:   43591952
        Total Size:   4.492 TiB

# restic stats --mode raw-data
repository 21006ba7 opened successfully, password is correct
scanning...
Stats for all snapshots in raw-data mode:
  Total Blob Count:   692615
        Total Size:   76.261 GiB

Those stats are really impressive (about 4.5 TiB of restorable data stored in roughly 76 GiB, a ratio of around 60:1), but the repository contains lots of snapshots with mostly static data.

What would be really great is a comparison of restic and Borg with real-life data, to see whether compression is worth the effort. But as Borg is restricted to SSH backends, I haven’t bothered to use it in a while.

First of all, the data on my laptop is real-life data.

I admit that my interpretation of the savings may be wrong.

So I just started a backup of all my btrfs subvols on my laptop using restic. Then I will do the same using borg. Let’s see how big the savings are.

I will report.

My laptop has two 1 TB SSDs with 20 btrfs subvols and two other partitions to back up, namely /boot and /boot/efi, which are small.

The first SSD has 708GB in use, the second one has 720GB in use.

I will also measure how long it takes (I assume restic is far better than borg here). RAM is 16 GB.

Ok, I did a backup with both restic and borg.

Versions used

  • restic: 0.9.5 compiled with go1.12.2 on linux/amd64
  • borg: 1.1.9

Backup size

I forgot two subvols but backed up 20 btrfs subvols plus /boot and /boot/efi. The size of the data to back up was 1259 GiB.

I should add that both repositories are on an external USB HD.

Elapsed time

  • restic: 5h 21m 21s
  • borg: 7h 27m 28s

Size of repositories

  • restic: 1054GiB
  • borg: 950GiB

So we have a savings of approx. 104GiB.

I will do a second round of backup to see how the elapsed times will be.

Second run

  • restic: 10m:01.86s
  • borg: stopped after 1.5h; it seems there is a bug and it is re-adding all files.

2 Likes

That’s awesome; restic is indeed far faster. I noticed that not long ago because I use both programs, and Borg takes longer to save changes (note that I use restic with the sftp backend while Borg saves data directly to an external HDD, and restic is still much faster). That, plus the fact that you don’t need any server-side setup for restic, made me switch from Borg to restic completely.

The only downside of restic is the lack of compression, but I assume compression would slow restic down too, because it would have to open the compressed archives and then do the whole operation.

Actually, I’m currently evaluating which backup software might be best for me. I only looked at borg and restic in depth; others I didn’t investigate further. For example, Duplicati is a Mono-based application, which is a no-go for me.

  • No compression is a drawback for restic, but my test showed that without compression it is not as bad as I had previously assumed.
  • Being this fast comes from goroutines. This is really nice, and if restic is too aggressive with resources there are ways to rein it in (nice, ionice, …).
  • I don’t think compression would slow restic down that much, because there are lightweight compression methods that are pretty fast.

Although not ideal (but what is ideal in a world of duality?), for me restic is the winner of the game.

1 Like

Regarding compression: the software could decide not to compress certain file types, because they yield little or no savings (zip, gz, xz, etc.).
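
As a purely hypothetical sketch of that heuristic (this is not how restic works; the names are made up for illustration), a simple skip list keyed by file extension could look like this:

// Sketch: skip compression for extensions that are usually already compressed.
package main

import (
    "fmt"
    "path/filepath"
    "strings"
)

// alreadyCompressed lists extensions where compression rarely helps.
var alreadyCompressed = map[string]bool{
    ".zip": true, ".gz": true, ".xz": true,
    ".bz2": true, ".7z": true, ".jpg": true, ".mp4": true,
}

func shouldCompress(path string) bool {
    ext := strings.ToLower(filepath.Ext(path))
    return !alreadyCompressed[ext]
}

func main() {
    for _, p := range []string{"notes.txt", "photos.zip", "backup.tar.gz"} {
        fmt.Printf("%-15s compress? %v\n", p, shouldCompress(p))
    }
}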

2 Likes

I don’t think deciding by file type would be the right way. Restic saves data in chunks, and a distinct chunk can belong to different files, even to files of different types.

Compression should happen at the chunk level, and (without knowing the code; my knowledge of Go is very limited) it should be easy to implement: when writing a chunk, restic tries to compress it; if it doesn’t shrink, the chunk is written uncompressed. A single flag for the compression type (none, lz4, …) in each chunk header would be sufficient. Hash values would always be computed over the uncompressed chunk, so nothing else in the code path has to change; only reading and writing chunks are affected.
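
A rough sketch of that idea (not restic’s actual code; compress/flate and all names here are stand-ins chosen for illustration): compress the chunk, keep whichever representation is smaller, prefix it with a one-byte flag, and always hash the uncompressed data so deduplication is unaffected.

// Sketch: per-chunk "compress if it shrinks" with a type flag.
package main

import (
    "bytes"
    "compress/flate"
    "crypto/sha256"
    "fmt"
)

const (
    compressionNone  byte = 0
    compressionFlate byte = 1
)

// encodeChunk returns the chunk ID (hash of the *uncompressed* data)
// and the bytes to store (flag byte + payload).
func encodeChunk(data []byte) (id [32]byte, stored []byte, err error) {
    id = sha256.Sum256(data) // dedup key: plaintext hash

    var buf bytes.Buffer
    w, err := flate.NewWriter(&buf, flate.BestSpeed)
    if err != nil {
        return id, nil, err
    }
    if _, err = w.Write(data); err != nil {
        return id, nil, err
    }
    if err = w.Close(); err != nil {
        return id, nil, err
    }

    if buf.Len() < len(data) {
        // Compression helped: store flag + compressed payload.
        return id, append([]byte{compressionFlate}, buf.Bytes()...), nil
    }
    // Compression didn't shrink the chunk: store it as-is.
    return id, append([]byte{compressionNone}, data...), nil
}

func main() {
    data := bytes.Repeat([]byte("hello restic "), 1000) // compresses well
    id, stored, err := encodeChunk(data)
    if err != nil {
        panic(err)
    }
    fmt.Printf("chunk %x: %d raw bytes -> %d stored bytes (flag %d)\n",
        id[:4], len(data), len(stored), stored[0])
}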

Perhaps you are right.

On the other hand, I am sure that a chunk (compressed or not) has a checksum (i.e. the checksum of the raw data). When restic encounters a zip file and decides not to compress it, it could check whether the checksums of the zip file’s chunks already exist. If a chunk already exists, it doesn’t matter whether the existing chunk was compressed or not. The checksum is the criterion.
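
A toy sketch of that point (made-up names, not restic’s implementation): if the index is keyed on the hash of the uncompressed data, deduplication works the same whether the stored chunk was compressed or not.

// Sketch: dedup lookup keyed on the plaintext checksum only.
package main

import (
    "crypto/sha256"
    "fmt"
)

type chunkIndex struct {
    known map[[32]byte]bool // chunk ID -> already stored?
}

func newChunkIndex() *chunkIndex {
    return &chunkIndex{known: make(map[[32]byte]bool)}
}

// addIfNew reports whether the chunk still needs to be written; the
// stored encoding (compressed or not) never enters the decision.
func (ix *chunkIndex) addIfNew(data []byte) bool {
    id := sha256.Sum256(data)
    if ix.known[id] {
        return false // duplicate: skip, regardless of stored encoding
    }
    ix.known[id] = true
    return true
}

func main() {
    ix := newChunkIndex()
    chunk := []byte("some chunk of a zip file")
    fmt.Println(ix.addIfNew(chunk)) // true: first time seen, gets stored
    fmt.Println(ix.addIfNew(chunk)) // false: deduplicated by checksum
}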

To update on these points: how much has changed?

I don’t know, but we should also check how fast restore is, and how fast prune and check are, because they are part of every backup routine. And restic is very slow there, at least on my machines…

For the record, most of the shortcomings mentioned here have been addressed: the code for the archiver and restic prune was rewritten from scratch and we’ve just merged compression support to master, see

6 Likes