Backup VMA (proxmox) with restic

Wouter0100 · June 24, 2019, 9:24am

We stumbled across restic a few days ago and it has been a positive adventure since. We have successfully deployed restic for some services and are now looking into backing up our virtualization cluster with it.

We run Proxmox to manage our cluster, and Proxmox ships with an awesome utility called vzdump. This utility can create a backup (a VMA file, a format with a published spec) and write it to stdout. This would allow us to do something like:

vzdump [ID] --mode snapshot --stdout --compress 0 | restic backup --stdin --stdin-filename vm_[ID].vma --tag vm_[ID]

This would let us back up VMs without doing any writes on the hypervisors themselves, which is awesome.

Unfortunately, now comes the downside. While doing some research and testing on our hypervisors, I found that the VMA format saves VM disk data out of order. This means the data stream differs from backup to backup, which works poorly with restic's deduplication. Proxmox also mentions that VMA files aren't rdiff friendly and would generate an insanely large diff. A 50 GB VM with nearly no writes or changes deduplicates to only around 35-40 GB (so every backup adds roughly another 40 GB to the restic repository). I'd say that is quite a waste of backup space (and traffic?).

There is a tool available (vma) that can extract a VMA, which creates in-order disk files. From this extract we should be able to create a new, in-order VMA file, which we could then snapshot into restic. But this process requires temporary files and wastes CPU cycles (and our NASes aren't that powerful).

Now, I am curious about your opinion on this. How would you tackle this problem? Would it, for example, be possible for restic to recognize a VMA file and write the backup in order (so we don't need to do the vma extract and vma create)? Or are there other solutions to this problem that I may not be seeing?
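
A rough sketch of the extract step, not a tested recipe. It assumes that `vma extract` reads the archive from stdin when given `-` and that it refuses to write into a directory that already exists. It also takes a shortcut: instead of re-creating a VMA, it lets restic back up the extracted per-disk images directly. The function name `backup_vm_in_order` and the scratch location are placeholders.

```shell
#!/bin/sh
# Sketch only: extract a vzdump stream into per-disk raw images, then let
# restic back up those images so unchanged disk regions keep stable offsets.
# Assumptions (not verified here): `vma extract` accepts `-` for stdin and
# creates the target directory itself.
backup_vm_in_order() {
    vmid=$1
    scratch=$(mktemp -d) || return 1
    # Extract into a fresh subdirectory of the scratch area.
    if vzdump "$vmid" --mode snapshot --stdout --compress 0 \
        | vma extract - "$scratch/vm_$vmid"; then
        restic backup --tag "vm_$vmid" "$scratch/vm_$vmid"
        status=$?
    else
        status=1
    fi
    # The scratch copy is as large as the VM's disks; always clean it up.
    rm -rf "$scratch"
    return "$status"
}

# Example call (hypothetical VM ID):
# backup_vm_in_order 100
```

The scratch directory needs room for the full, uncompressed disk images, which is exactly the temporary-file cost described above.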

Thanks!

cdhowie · June 24, 2019, 3:34pm

When writing the disk file, does the process seek in the output at all? (You could use strace to check.) If it does not seek in the output, you have a few options.

You may be able to get it to write to stdout even if it doesn’t understand the special - filename by using /dev/stdout.

#!/bin/sh
vma create /dev/stdout ... | restic backup --stdin ...

You could also use bash process substitution:

#!/bin/bash
vma create >(restic backup --stdin ...) ...

If the tool does seek around the output file then I’m afraid there’s not much you can do but live with the temporary file.

Wouter0100 · October 16, 2019, 9:56am

Thanks for your response, @cdhowie!

It does not seek in the output, and that is not really the issue. We have already built a script to back up VMA files with restic, and restoring works just as well, which is awesome.

The only issue is that it is not really incremental because, as I currently understand it, the VMA files contain out-of-order disk data (so the backup process does not need to seek in the output when creating the backup). As a result restic treats each backup as different data, when it is actually the same data in a different order. Would it be possible, for example, for restic to write it to the backup server in the correct order?

If so, I may look into that myself or hire someone who could take a closer look at this idea, as it would be a large space saver.

cdhowie · October 17, 2019, 3:15pm

I don’t believe that there is any way to get restic to chunk files differently.

As long as restic is able to chunk files on the correct boundaries, deduplication is possible even if the contents of a file changed in the middle… however, it depends on whether restic does this optimally or not for the given file.
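
To illustrate why boundary alignment matters, here is a toy model using fixed-size chunks. restic itself uses content-defined chunking, so this is only an analogy, not how restic works internally. It builds a file from random 64 KiB blocks, builds a second file with the same blocks in reverse order, and counts how many chunks the two share at a given chunk size. The 64 KiB figure assumes that is the granularity at which VMA reorders data; the function names are made up, and `sha256sum` assumes GNU coreutils.

```shell
#!/bin/sh
# Toy model with FIXED-size chunks; restic's real chunker is content-defined.

# Print the sorted, unique SHA-256 hashes of a file's fixed-size chunks.
chunk_hashes() {
    file=$1 size=$2
    dir=$(mktemp -d) || return 1
    split -b "$size" "$file" "$dir/chunk_"
    sha256sum "$dir"/chunk_* | cut -d' ' -f1 | sort -u
    rm -rf "$dir"
}

# Count the chunk hashes two files have in common at a given chunk size.
shared_chunks() {
    a=$(mktemp) && b=$(mktemp) || return 1
    chunk_hashes "$1" "$3" > "$a"
    chunk_hashes "$2" "$3" > "$b"
    comm -12 "$a" "$b" | wc -l
    rm -f "$a" "$b"
}

# Write 32 random 64 KiB blocks to in_order.img and, reversed, to reordered.img.
make_demo_files() {
    work=$1 i=0
    : > "$work/in_order.img"
    : > "$work/reordered.img"
    while [ "$i" -lt 32 ]; do
        head -c 65536 /dev/urandom > "$work/block"
        cat "$work/block" >> "$work/in_order.img"
        # Prepend to the second file so its blocks end up in reverse order.
        cat "$work/block" "$work/reordered.img" > "$work/tmp"
        mv "$work/tmp" "$work/reordered.img"
        i=$((i + 1))
    done
    rm -f "$work/block"
}

demo=$(mktemp -d)
make_demo_files "$demo"
# Chunks the same size as the reordered blocks: every chunk is shared.
shared_chunks "$demo/in_order.img" "$demo/reordered.img" 65536
# Chunks of 1 MiB span 16 blocks each: no chunk is shared.
shared_chunks "$demo/in_order.img" "$demo/reordered.img" 1048576
rm -rf "$demo"
```

In this model the same data deduplicates fully when chunks line up with the reordered blocks and not at all when each chunk spans several of them. That is the rough intuition for the poor deduplication reported earlier in the thread, given that restic's chunks are typically much larger than 64 KiB.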