Comparison with Borg / Shortcomings in Restic?

dhilgarth · May 6, 2019, 10:59am

In this comparison from 1.5 years ago, several shortcomings of Restic are listed:

No compression functionality
Memory usage
Duration of incremental backups (apparently ten times that of Borg)
Duration and memory consumption of backup pruning (Hours and a lot of memory as opposed to “moments”)

My question is:
Were the observations in this comparison correct at the time of writing? And if so, have they been addressed by the restic team in the meantime?
Compression apparently not, but what about the other three points?

Thanks,
Daniel

fd0 · May 6, 2019, 11:29am

Unfortunately I don’t have any idea which version of restic they used for the article. Based on the time (2017-12-12) I suspect it was the latest release 0.8.0 at the time (released on 2017-11-26).

Memory usage: We’ve improved restic a bit, but it still uses way too much memory, scaling up with the number of files in the repo (many small files -> much memory usage). I’m not sure what borg does, but restic keeps an index (which data is stored where in the repo) in memory. We have plans to address that with an on-disk data structure, but I’ve only just started working on this.
Duration of incremental backups: Unfortunately they do not describe their methodology. I suspect that something went wrong with the incremental backups. In my experience, it’s unlikely that restic takes 10 times longer than borg for incremental backups. restic 0.9.0 (released on 2018-05-21) contains completely reworked archiver code, so it could be that the old code (which was highly concurrent) overwhelmed the storage. But I’m only guessing here. I think restic would do much better today, whatever went wrong at the time of writing.
prune: The observations in the article are realistic. We haven’t improved prune a lot, but there’s an open PR about it and a contributor who is making significant improvements. The challenge now is integrating these changes while making sure restic’s prune function stays correct (so it does not delete the wrong data).

I hope this helps

dhilgarth · May 6, 2019, 12:10pm

Great, thanks a lot. So, to summarize:

Memory consumption will go down as it is actively being worked on
Incremental backups should work just fine as they are today
Prune will be improved as it is also actively being worked on

Sounds great!

manfredlotz · May 7, 2019, 6:28pm

Compression would be important

mic_p · May 9, 2019, 2:14pm

Hi manfredlotz,
Here is our situation: hundreds of S3 buckets to which hundreds of restic clients save backup data. Some restic clients take one snapshot every 15 minutes and our customers keep them for months.
These buckets hold up to 10 or 15 TB of data.
A few MinIO servers provide the S3 buckets, and their backends are always ZFS (via NFS) with compression enabled.
The average ZFS compression we see on the zpools is 2%.

From my point of view, restic doesn’t need compression

Bye,
Michele

fd0 · May 9, 2019, 5:12pm

Ah, that’s not quite right: what you’re seeing is that data in the restic repository (stored on zfs) does not compress very well. That’s not surprising: all data in the repo is encrypted and therefore has high entropy. Compression within restic would mean compressing files before encryption/deduplication.
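
A small sketch of why the filesystem sees almost no savings. It assumes that encrypted repository data looks like random bytes to a compressor, so os.urandom stands in for real ciphertext and zlib stands in for the ZFS compressor; the sample plaintext is made up.

```python
import os
import zlib

# Stand-in for file contents before encryption: repetitive, low-entropy text.
plaintext = b"restic backup snapshot data\n" * 4096

# Stand-in for encrypted repository data. Good ciphertext is statistically
# indistinguishable from random bytes, so no real cipher is needed here.
ciphertext_like = os.urandom(len(plaintext))

def ratio(data: bytes) -> float:
    """Return compressed size divided by original size (lower is better)."""
    return len(zlib.compress(data, 6)) / len(data)

plain_ratio = ratio(plaintext)
random_ratio = ratio(ciphertext_like)

print(f"before encryption: {plain_ratio:.3f}")
print(f"after encryption:  {random_ratio:.3f}")
```

The first ratio is close to zero and the second is about 1, which is why compression has to happen before encryption to be useful.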

mic_p · May 9, 2019, 7:45pm

Yes! You are right! My (big) mistake. I forgot about the encryption!
So… could you implement compression?

Thanks,
Michele

hickinbottoms · May 10, 2019, 7:04am

Implementation is in the backlog (mentioned in the top-post: https://github.com/restic/restic/issues/21). There’s been quite a lot of discussion there.

fd0 · May 10, 2019, 11:17am

Oh, and please please please don’t comment in that issue, it’s way too long already

manfredlotz · May 10, 2019, 6:48pm

@mic_p

I had backed up my laptop data using borg and got

------------------------------------------------------------------------------
                       Original size      Compressed size    Deduplicated size
All archives:                4.37 TB              3.70 TB              1.06 TB

                       Unique chunks         Total chunks
Chunk index:                 4301812             20884547

So my interpretation was that compression saved me 670GB which is quite something.

cdhowie · May 10, 2019, 11:07pm

Based on this output, I think that it saved you about a quarter of that; you need to consider the deduplicated size as well. Note that 2.7TB of compressed data was deduplicated. The benefits of compression go down as the benefits of deduplication go up.

For example, let’s say you have 500 copies of the same 1GB file, and that file compresses to 250MB.
The original data set, when compressed, goes from 500GB to 125GB. Once it is deduplicated, it drops further to 250MB.

It’s tempting to say that compression saved you 375GB, but it did not – deduplication alone would have reduced the 500GB data set to 1GB.

Adding compression to deduplication in this scenario only saves 750MB, nowhere near 375GB.
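
The arithmetic above, written out as a short sketch. The second half applies the same reasoning to the borg output quoted earlier and rests on two assumptions that are not confirmed in the thread: that borg’s "Deduplicated size" is measured after compression, and that all chunks compress at the same average ratio.

```python
# All sizes in MB, using decimal units (1 GB = 1000 MB) as in the post.
MB_PER_GB = 1000

copies = 500
file_mb = 1 * MB_PER_GB       # one 1 GB file
compressed_file_mb = 250      # the same file after compression

original_mb = copies * file_mb                    # 500 GB
compressed_only_mb = copies * compressed_file_mb  # 125 GB
dedup_only_mb = file_mb                           # 1 GB
dedup_and_compressed_mb = compressed_file_mb      # 250 MB

# What compression appears to save if deduplication is ignored.
naive_saving_mb = original_mb - compressed_only_mb
# What compression really adds on top of deduplication.
actual_saving_mb = dedup_only_mb - dedup_and_compressed_mb

print(f"naive saving:  {naive_saving_mb / MB_PER_GB:.0f} GB")
print(f"actual saving: {actual_saving_mb} MB")

# Same reasoning for the borg output (sizes in TB).
# Assumption: uniform compression ratio across all chunks.
original_tb, compressed_tb, dedup_tb = 4.37, 3.70, 1.06
compression_ratio = compressed_tb / original_tb
est_dedup_uncompressed_tb = dedup_tb / compression_ratio
est_saving_gb = (est_dedup_uncompressed_tb - dedup_tb) * 1000

print(f"estimated real saving for the borg repo: {est_saving_gb:.0f} GB")
```

Under those assumptions the estimate lands a little under 200 GB, in line with "about a quarter" of 670GB.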

tomwaldnz · May 11, 2019, 4:52am

I recently moved from Borg to Restic. One major drawback with Borg is that, even with a moderately sized repository (3GB of files in /var/www), it doesn’t do incremental backups well: it copies a LOT of data to a new file in the backup location. If you’re storing your backup locally that’s ok, but if you’re backing up to the cloud that’s going to use a lot of bandwidth. Restic only backs up what is required, so it is much more bandwidth efficient.

I run Restic on an AWS t2.nano that right now has 40MB physical RAM available (but with a lot of RAM used as cache) and 400MB virtual memory free (t2.nano has 512MB RAM). It backs up a few GB nightly, 83000 files. CloudWatch says my RAM goes from 37% to 53% during backups, for about 1 minute. Resource usage for this amount of data seems really reasonable.

764287 · May 11, 2019, 5:44pm

manfredlotz:

I had backed up my laptop data using borg and got

------------------------------------------------------------------------------
                       Original size      Compressed size    Deduplicated size
All archives:                4.37 TB              3.70 TB              1.06 TB

                       Unique chunks         Total chunks
Chunk index:                 4301812             20884547

So my interpretation was that compression saved me 670GB which is quite something.

Unfortunately those stats aren’t worth anything without knowledge about the dataset, number of snapshots etc. Take a look at the stats of one of my repositories.

# restic stats --mode restore-size
repository 21006ba7 opened successfully, password is correct
scanning...
Stats for all snapshots in restore-size mode:
  Total File Count:   43591952
        Total Size:   4.492 TiB

# restic stats --mode raw-data
repository 21006ba7 opened successfully, password is correct
scanning...
Stats for all snapshots in raw-data mode:
  Total Blob Count:   692615
        Total Size:   76.261 GiB

Those stats are really impressive but the repository contains lots of snapshots with mostly static data.
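
To put a number on "impressive", here is the ratio between the two totals reported above. It only restates the figures from the restic stats output; the variable names are made up for illustration.

```python
# Figures copied from the two `restic stats` runs above.
restore_size_gib = 4.492 * 1024  # 4.492 TiB (--mode restore-size), in GiB
raw_data_gib = 76.261            # 76.261 GiB (--mode raw-data)

# How many times larger the restorable data is than what the repo stores.
dedup_ratio = restore_size_gib / raw_data_gib

print(f"restore size is about {dedup_ratio:.0f}x the raw data size")
```

A ratio of roughly 60 comes almost entirely from deduplication across many similar snapshots, so it says little about how a single backup of fresh data would fare.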

What would be really great is a comparison of restic and Borg with real life data to see if compression is worth the effort. But as Borg is restricted to SSH backends I haven’t bothered to use it in a while.

manfredlotz · May 11, 2019, 6:43pm

First of all, the data on my laptop is real life data.

I admit that my interpretation of the savings may be wrong.

So, I just started a backup of all my btrfs subvols on my laptop using restic. Then I will do the same using borg. Let’s see how much the savings are.

I will report.

manfredlotz · May 11, 2019, 7:10pm

My laptop has two 1 TB SSDs with 20 btrfs subvols and two other partitions to back up, namely /boot and /boot/efi, which are small.

The first SSD has 708GB in use, the second one has 720GB in use.

I will also measure how long it takes (I assume that restic is far better than borg here). RAM is 16 GB.

manfredlotz · May 12, 2019, 11:10am

Ok, I did a backup with both restic and borg.

Versions used

restic: 0.9.5 compiled with go1.12.2 on linux/amd64
borg: 1.1.9

Backup size

I forgot two subvols but backed up 20 btrfs subvols plus /boot and /boot/efi. The data to back up totaled 1259 GiB.

I should add that both repositories are on an external USB HD.

Elapsed time

restic: 5h 21m 21s
borg: 7h 27m 28s

Size of repositories

restic: 1054GiB
borg: 950GiB

So with borg we have a saving of approx. 104 GiB.

I will do a second round of backup to see how the elapsed times will be.

Second run

restic: 10m:01.86s
borg: stopped after 1.5h. There seems to be a bug that makes it re-add all files.

Dj0k3 · May 13, 2019, 3:43pm

That’s awesome, restic is indeed faster by far. I noticed that not long ago because I use both programs and Borg takes longer to save changes (note that I use restic with the sftp backend while borg saves data directly to an external HDD, and restic is still much faster). That, plus the fact that restic doesn’t need a server-side setup, made me switch from Borg to restic completely.

The only downside of restic is the lack of compression, but I assume that compression would slow restic down too, because it would have to open compressed archives before doing the whole operation.

manfredlotz · May 13, 2019, 4:39pm

Actually, I’m currently evaluating which backup software might be best for me to use. I only looked at borg and restic and didn’t investigate others further. For example, duplicati is a Mono-based application, which is a no-go for me.

No compression is a drawback for restic. But my test showed that without compression it is not as bad as I had previously assumed.
Being this fast comes from goroutines. This is really nice, and if restic is too greedy with resources there are ways to rein it in (nice, ionice,…).
I don’t think that compression would slow restic down much, because there are lightweight compression methods which are pretty fast.

Although not ideal (but what is ideal in a world of duality?) for me restic is the winner of the game.

manfredlotz · May 13, 2019, 4:41pm

Regarding compression: the software could decide not to compress certain file types because doing so gives little or no savings (zip, gz, xz etc).

sniner · May 14, 2019, 5:02pm

I don’t think deciding by file type would be the right way. Restic saves data in chunks, and a given chunk can belong to several files, even files of different types.

Compression should be at chunk level and (without knowing the code, my knowledge of Go is very limited) it should be easy to implement: when writing a chunk, Restic tries to compress it, and if the chunk doesn’t shrink, Restic writes it uncompressed. A single flag for the compression type (none, lz4, …) in each chunk header would be sufficient. Hash values would always be computed over the uncompressed chunk, so nothing else in the code path has to change; only reading and writing chunks are affected.
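
A rough sketch of that idea, not restic’s actual code or on-disk format. The one-byte header, the flag values and the function names are all hypothetical, zlib stands in for lz4 (which is not in the Python standard library), and encryption is left out entirely.

```python
import hashlib
import os
import zlib
from typing import Tuple

# Hypothetical one-byte header flags; not restic's real format.
FLAG_NONE = 0
FLAG_ZLIB = 1  # zlib is a stand-in for lz4 here

def encode_chunk(chunk: bytes) -> Tuple[str, bytes]:
    """Return (chunk ID, stored bytes); the ID hashes the uncompressed chunk."""
    chunk_id = hashlib.sha256(chunk).hexdigest()
    compressed = zlib.compress(chunk)
    if len(compressed) < len(chunk):
        return chunk_id, bytes([FLAG_ZLIB]) + compressed
    # Compression did not help, so store the chunk as it is.
    return chunk_id, bytes([FLAG_NONE]) + chunk

def decode_chunk(stored: bytes) -> bytes:
    """Read the header flag and return the original uncompressed chunk."""
    flag, payload = stored[0], stored[1:]
    if flag == FLAG_ZLIB:
        return zlib.decompress(payload)
    if flag == FLAG_NONE:
        return payload
    raise ValueError(f"unknown compression flag: {flag}")

# One compressible chunk and one incompressible (random) chunk.
text_chunk = b"hello restic " * 1000
random_chunk = os.urandom(4096)

text_id, text_stored = encode_chunk(text_chunk)
random_id, random_stored = encode_chunk(random_chunk)

print("text chunk flag:  ", text_stored[0], "stored bytes:", len(text_stored))
print("random chunk flag:", random_stored[0], "stored bytes:", len(random_stored))
```

Because the ID is taken over the uncompressed bytes, deduplication lookups stay the same whether or not a chunk ends up compressed; only the read and write paths need to know about the flag.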