Backup system that handles deltas for large image files?

Hi,

I’m managing a dozen servers for web and mail and a handful of web applications running a mix of CentOS 7, Oracle Linux 7 and Rocky Linux 8. Currently these installations are bare-metal (no virtualization) and follow the KISS principle. For backups, I’m running two separate servers with Rsnapshot.

I’m currently thinking about replacing these installations with a series of virtual machines. Now here’s the problem I’m facing. Let’s say one of these servers hosts a LAMP installation with OwnCloud and something like 500 GB of data. This would result in one big VM image file under /var/lib/libvirt/images.

Now how would I handle incremental backups with this? Usually I do backups every night with Rsnapshot, and since Rsnapshot uses Rsync under the hood, only the delta across all these individual files is transferred. But when I use virtualization, my whole VM is in one big file, and as far as I understand, I would have to retransfer the whole 500 GB or so every time there is a change to the VM.

Now I wonder: is Restic able to handle binary deltas, a bit like what the .drpm format is to .rpm? So even if I have one single 500 GB file, only the “altered bits” (if I may say so) get transferred in my daily backup?

Any suggestions?

Cheers from the sunny South of France,

Niki

Yes, that’s the case.
In general, no matter whether it’s a VM disk or something else: if you have a 500 GB file backed up and one byte in the middle of the file gets changed, restic will transfer only the chunk containing that change on the next run.

Of course you have to make sure that your VM disk is in a consistent state, so maybe think about snapshots etc. But that’s not really a restic topic.
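
To illustrate why a one-byte change only produces a small upload, here is a toy sketch of content-defined chunking in Python. It is not restic's actual chunker (which works differently and uses far larger chunks); the gear table, the 10-bit mask and the function names are all made up for this example.

```python
import hashlib
import random

MASK = (1 << 10) - 1  # cut when the low 10 hash bits are zero (~1 KiB average chunks)

# Hypothetical per-byte random table for a toy "gear" rolling hash.
_rng = random.Random(42)
GEAR = [_rng.getrandbits(32) for _ in range(256)]


def chunk_ids(data: bytes) -> list:
    """Split data at content-defined cut points; return one SHA-256 ID per chunk."""
    ids = []
    start = 0
    h = 0
    for i, byte in enumerate(data):
        # The low 10 bits of h depend only on the last 10 bytes seen,
        # so a cut point is decided by local content, not by file offset.
        h = ((h << 1) + GEAR[byte]) & 0xFFFFFFFF
        if h & MASK == 0:
            ids.append(hashlib.sha256(data[start:i + 1]).hexdigest())
            start = i + 1
    if start < len(data):
        ids.append(hashlib.sha256(data[start:]).hexdigest())
    return ids


def new_chunks_after_edit(size: int = 256 * 1024) -> tuple:
    """Return (total chunks, chunks not seen before) after a one-byte change."""
    rng = random.Random(1)
    original = bytes(rng.getrandbits(8) for _ in range(size))  # stand-in for a VM image
    before = set(chunk_ids(original))
    modified = bytearray(original)
    modified[size // 2] ^= 0xFF  # flip one byte in the middle of the "image"
    after = chunk_ids(bytes(modified))
    return len(after), sum(1 for c in after if c not in before)


if __name__ == "__main__":
    total, new = new_chunks_after_edit()
    print(f"{new} of {total} chunks would need to be uploaded")
```

Because the cut points depend only on nearby bytes, everything outside the immediate neighbourhood of the changed byte is split exactly as before and deduplicates against the previous backup.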

But keep in mind that in your case, even if there are only small changes in the image file, restic always needs to read, chunk and hash the whole file.

This effectively disables the parent snapshot optimization, which speeds up standard file system backups a lot by skipping files whose metadata has not changed. So be prepared for a somewhat longer backup time (depending on the read speed of your image storage and on the CPU), even if the amount of data added to the repo is still small.
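
As a rough back-of-the-envelope sketch of what that means for a 500 GB image: the throughput figures below are purely hypothetical, and the function name is made up for illustration. The estimate simply assumes the slower of disk read and hashing speed limits the whole pass.

```python
def full_pass_minutes(image_gb: float, read_mb_s: float, hash_mb_s: float) -> float:
    """Estimate the minutes needed to read, chunk and hash the whole image once."""
    # The slower stage (disk read or CPU hashing) limits the pipeline.
    throughput_mb_s = min(read_mb_s, hash_mb_s)
    return image_gb * 1000 / throughput_mb_s / 60


if __name__ == "__main__":
    # Hypothetical numbers: 500 GB image, 200 MB/s disk read, 400 MB/s hashing.
    print(f"about {full_pass_minutes(500, 200, 400):.0f} minutes per backup run")
```

With these assumed numbers the nightly run takes roughly 42 minutes of reading, no matter how little of the image actually changed.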

I have multiple VMs and let each of them back itself up, so restic runs inside the VM.
I would really consider doing this, because restoring individual files would be a pain if you first had to recover the complete image file. You could also mount the image (read-only) and run restic on that.
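
A minimal sketch of the mount-and-backup variant, assuming libguestfs-tools (`guestmount` / `guestunmount`) is installed on the host and the VM is shut down or the image is a snapshot copy. The image name, mount point and repository path are hypothetical, and the script only builds and prints the commands rather than running them.

```python
import shlex


def image_backup_commands(image: str, mountpoint: str, repo: str) -> list:
    """Build (but do not run) the commands for a read-only image backup."""
    return [
        # Inspect the image and mount its filesystems read-only.
        ["guestmount", "-a", image, "-i", "--ro", mountpoint],
        # Back up the mounted tree, so restic sees individual files.
        ["restic", "-r", repo, "backup", mountpoint],
        # Unmount again once the backup has finished.
        ["guestunmount", mountpoint],
    ]


if __name__ == "__main__":
    for cmd in image_backup_commands(
        "/var/lib/libvirt/images/owncloud.qcow2",  # hypothetical image name
        "/mnt/owncloud-image",                     # hypothetical mount point
        "/srv/restic-repo",                        # hypothetical repository
    ):
        print(shlex.join(cmd))
```

Since restic then backs up ordinary files instead of one opaque image, single files can be restored directly from a snapshot.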

Agreed. I once wrote a post about my test recoveries to a VM; everything worked as expected.