We stumbled across restic some days ago and it has been a positive adventure since. We have successfully deployed restic for some of our services and are now looking into backing up our virtualization cluster with it.
We run Proxmox to manage our cluster, and Proxmox ships with an awesome utility called vzdump. This utility is able to create a backup (a vma file, see the spec) and output it to stdout. This would allow us to do something like:
vzdump [ID] --mode snapshot --stdout --compress 0 | restic backup --stdin --stdin-filename vm_[ID].vma --tag vm_[ID]
This would let us back up VMs without doing any writes on the hypervisors themselves, which is awesome.
Unfortunately, now comes the downside. While doing some research and testing on our hypervisors, I came across the fact that the VMA format saves VM disks out of order (see here for why). This means that the byte stream differs from backup to backup even when the disk contents barely change, which works poorly with restic's dedup feature. Proxmox also mentions that vma files aren't rdiff friendly and would generate an insanely large diff. In our tests, a 50 GB VM with nearly no writes/changes still adds around 35-40 GB to the restic repository with every backup, even after dedup. I'd say that this is quite a waste of backup space (and traffic?).
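To make the effect concrete, here is a toy Python model of it. This is only a rough sketch under loud assumptions: the block and chunk sizes are made up, the chunks are fixed-size (restic actually uses content-defined chunking), and the block order is fully shuffled, which overstates what vzdump really does. All function names are invented for this illustration.

```python
import hashlib
import random

BLOCK = 64          # assumed stand-in for one VMA cluster (real ones are much larger)
CHUNK = 8 * BLOCK   # assumed stand-in for one dedup chunk spanning several clusters


def make_disk(n_blocks, seed):
    """Build a fake disk as a list of random, effectively unique blocks."""
    rng = random.Random(seed)
    return [bytes(rng.getrandbits(8) for _ in range(BLOCK)) for _ in range(n_blocks)]


def stream(disk, order):
    """Serialize the disk blocks in the given order, like a backup stream."""
    return b"".join(disk[i] for i in order)


def chunk_hashes(data):
    """Split the stream into fixed-size chunks and hash each one."""
    return {
        hashlib.sha256(data[i:i + CHUNK]).hexdigest()
        for i in range(0, len(data), CHUNK)
    }


def shared_fraction(first, second):
    """Fraction of the second backup's chunks already stored by the first."""
    known, new = chunk_hashes(first), chunk_hashes(second)
    return len(known & new) / len(new)


disk = make_disk(1024, seed=1)          # the disk itself never changes
in_order = list(range(len(disk)))

order_a = in_order[:]
random.Random(2).shuffle(order_a)       # block order of backup run 1
order_b = in_order[:]
random.Random(3).shuffle(order_b)       # block order of backup run 2

same_order = shared_fraction(stream(disk, in_order), stream(disk, in_order))
diff_order = shared_fraction(stream(disk, order_a), stream(disk, order_b))

print(f"in-order backups share {same_order:.0%} of their chunks")
print(f"out-of-order backups share {diff_order:.0%} of their chunks")
```

The disk content is identical in both runs, yet once the block order differs, chunks that span several blocks no longer hash to the same value, so almost nothing can be reused. Real vma streams are only partly reordered, which would fit with us still seeing some dedup instead of none.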
There is a tool available (vma) that is able to extract a vma file, which would give us an in-order disk file. From this extract, we should be able to create a new vma file that is in order, which we should then be able to snapshot into restic. But this process requires some temp files and wastes CPU cycles (and our NASes aren't that powerful).
Now, I am curious about your opinion on this. How would you guys tackle this problem? Would it, for example, be possible for restic to recognize a vma file and write the backup in order (so we don't need to do the vma extract and vma create), or are there other solutions to this problem that I may not be seeing?