Very slow when migrating btrfs snapshots to restic

#1

Hello

For the last few years I used a simple but quite efficient backup method: I ran rsync --inplace on my NAS data, and after rsync had finished I took a btrfs snapshot of the current state. That way I got deduplication (as long as file paths did not change) and even compression (which restic doesn’t have yet). But this method has one flaw: when I move files around on my NAS, they end up duplicated in my backup, and that kept me from cleaning up my NAS for a long time :slight_smile:

So, I found restic and it looks like the perfect solution for me. First I made an initial backup with restic as a test, and then a second one to see how long it takes. rsync took about 30 minutes, restic more than one hour. I didn’t investigate that, but since it is at least bearable, I decided to go with restic.

Therefore I want to migrate my existing btrfs snapshots to restic. This is what I do (via a script): I bind-mount each snapshot to the original mountpoint of my NAS and let restic do its backup magic.

But on every run, restic will re-read all files and that takes ages:

snapshot 624f84bb saved
open repository
repository 82b9ca20 opened successfully, password is correct
 
Files:       135140 new, 145345 changed,     0 unmodified
Dirs:            0 new,     1 changed,     0 unmodified
Added to the repo: 5.589 MiB

processed 280485 files, 1.377 TiB in 5:57:25

I checked the metadata, such as timestamps and inodes, but can’t see any difference between the files in different btrfs snapshots. To shorten the time I already removed a number of old snapshots and kept only one per month, but it will still take weeks to migrate the old backups to restic.

What can I do to prevent restic from re-reading unmodified files?

TIA

#2

restic seems to be unable to find the parent snapshot, hence it is rereading all the data just to make sure it doesn’t miss anything. In this case, using --parent can speed up the whole process by a lot.
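As a rough illustration of that suggestion, here is a minimal sketch that only assembles the command line for one backup run with an explicit parent. The repository path and mountpoint are hypothetical placeholders, the snapshot ID is simply the one from the log above, and the helper name is made up for this example.

```python
def restic_backup_cmd(repo, path, parent=None):
    """Build the argument list for one `restic backup` run.

    repo and path are placeholders; parent is the ID of the restic
    snapshot that should serve as the comparison base.
    """
    cmd = ["restic", "-r", repo, "backup"]
    if parent:
        # Tell restic explicitly which snapshot to compare metadata against.
        cmd += ["--parent", parent]
    cmd.append(path)
    return cmd


if __name__ == "__main__":
    # Hypothetical repository and mountpoint, for illustration only.
    print(" ".join(restic_backup_cmd("/srv/restic-repo", "/mnt/nas", parent="624f84bb")))
```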

#3

Well, I think I found my mistake, or rather my misconception of how restic works. I thought restic stores blobs in a content-addressed way (like git) and that’s all there is to it. But I forgot that there is a parent-child relationship between snapshots (well, also like git :blush:). With restic cat snapshot I found that the snapshots I made from bind-mounted btrfs snapshots have the latest snapshot (latest by timestamp) as their parent. And that was the backup I took of the current state of my NAS. So all the older (in time) snapshots became children of the latest (in time) snapshot in restic, rather than children of the snapshot taken just before them.

After I removed the two most recent snapshots from restic, migrating an older snapshot took only a few minutes.
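An alternative to deleting the newer snapshots might be to chain the migration explicitly: back up the btrfs snapshots oldest first and pass the ID of the restic snapshot just created as --parent for the next run. The sketch below is untested against a real repository. It assumes restic prints a "snapshot <id> saved" line as in the log above, that the password comes from RESTIC_PASSWORD or RESTIC_PASSWORD_FILE, and that the script runs with enough privileges to bind-mount. All paths and function names are hypothetical.

```python
import re
import subprocess

# Matches the line restic prints after a successful backup, e.g.
# "snapshot 624f84bb saved" (format taken from the log in this thread).
SAVED_RE = re.compile(r"^snapshot ([0-9a-f]+) saved$", re.MULTILINE)


def saved_snapshot_id(output):
    """Return the snapshot ID from restic's output, or None if absent."""
    match = SAVED_RE.search(output)
    return match.group(1) if match else None


def run_command(cmd):
    """Run a command and return its stdout; raises if the command fails."""
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout


def migrate(repo, mountpoint, btrfs_snapshots, run=run_command):
    """Back up each btrfs snapshot in turn, chaining the parents.

    btrfs_snapshots must be ordered oldest first. Returns the list of
    restic snapshot IDs that were parsed from the output.
    """
    parent = None
    ids = []
    for snap in btrfs_snapshots:
        run(["mount", "--bind", snap, mountpoint])
        try:
            cmd = ["restic", "-r", repo, "backup"]
            if parent:
                cmd += ["--parent", parent]
            cmd.append(mountpoint)
            out = run(cmd)
        finally:
            # Always release the bind mount, even if the backup failed.
            run(["umount", mountpoint])
        new_id = saved_snapshot_id(out)
        if new_id:
            parent = new_id
            ids.append(new_id)
    return ids
```

The `run` parameter exists only so the loop can be tried with a fake runner before pointing it at real data.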

Problem solved.

#4

Yeah, you are absolutely right. Thanks!

#5

restic does store blobs in a content-addressed way. The place where you’re losing time is the missing fast path: if there is a parent snapshot (which you’ve now found), restic compares each file’s metadata, including mtime and size, against it. If the metadata hasn’t changed, restic assumes that the file has not changed and doesn’t re-read it at all.

If it doesn’t have a set of metadata to use, restic has to read each file, chunk it, and hash each chunk. At the end of this process, it does use the content-address mechanism to determine that the chunk is already in the repository, and so it doesn’t store it again – but it still has to do all of the chunk+hash work in order to make that determination. This still deduplicates, but it’s the slow path.
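To make the two paths concrete, here is a toy model in Python. It is not restic's code: it uses fixed-size chunks where restic uses content-defined chunking, it compares only mtime and size, and the data structures (a dict for the parent snapshot, a set for the repository) are invented for the illustration.

```python
import hashlib
import os


def metadata(path):
    """The metadata the toy fast path compares: (mtime, size)."""
    st = os.stat(path)
    return (st.st_mtime_ns, st.st_size)


def hash_chunks(path, chunk_size=1 << 20):
    """Slow path: read the whole file and hash every chunk.

    Fixed-size chunks are a simplification for this sketch.
    """
    ids = []
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            ids.append(hashlib.sha256(chunk).hexdigest())
    return ids


def backup_file(path, parent_index, repo_chunks):
    """Return (path_taken, chunk_ids, number_of_new_chunks).

    parent_index maps path -> (metadata, chunk_ids) for the parent
    snapshot, or is None when no parent was found.
    repo_chunks is the set of chunk IDs already stored.
    """
    meta = metadata(path)
    prev = parent_index.get(path) if parent_index else None
    if prev is not None and prev[0] == meta:
        # Fast path: metadata unchanged, reuse the old chunk list unread.
        return "fast", prev[1], 0
    # Slow path: read and hash everything, then deduplicate by content.
    ids = hash_chunks(path)
    new = set(ids) - repo_chunks
    repo_chunks.update(new)
    return "slow", ids, len(new)
```

In this model, a file backed up a second time without a parent still adds nothing to the repository, but only after it has been read and hashed in full, which matches the "0 unmodified, 5.589 MiB added" run in the first post.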
