So I want to back up 21 TB of data (for now, more in the future) with restic to B2. Besides not being able to achieve the full upload speed from Europe (100 Mbit/s in my case, I only get about 48), I am thinking about the downsides of restic in terms of indexing duration. As far as I understand, everything has to be read by restic in order to determine what has changed, and that would take roughly 24 h for the given 21 TB in my case. Since there are filesystems that support snapshots and can return only the differences between them, I would like to ask if anyone has an idea how to take advantage of this.
Unfortunately this question relates not only to restic but also to “snapshot filesystems” in general: btrfs in my case, but to some extent also ZFS, Ceph, you name it.
I have several approaches in mind that I would like to discuss:
- getting a “diff list” between snapshots A and B
- delete list of all deleted files
- modify list of all modified files
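To make the two lists concrete, here is a minimal sketch in Python. It assumes both snapshots are mounted as plain directories and compares only size and modification time, so it is a stand-in for a filesystem-native diff, not a replacement for one. The names `scan` and `diff_snapshots` are made up for illustration.

```python
import os

def scan(root):
    """Map each relative file path under root to (size, mtime_ns)."""
    meta = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(dirpath, name)
            st = os.lstat(full)  # metadata only, file contents are never read
            meta[os.path.relpath(full, root)] = (st.st_size, st.st_mtime_ns)
    return meta

def diff_snapshots(snap_a, snap_b):
    """Return (delete_list, modify_list) going from snapshot A to snapshot B."""
    a, b = scan(snap_a), scan(snap_b)
    # Present in A but missing in B: deleted.
    delete_list = sorted(p for p in a if p not in b)
    # New files count as modified here: they have to be read either way.
    modify_list = sorted(p for p in b if a.get(p) != b[p])
    return delete_list, modify_list
```

This still walks both trees, but it only touches metadata. A real implementation would ideally ask the filesystem for the differences instead.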
With this data it should be possible to create a new remote snapshot in restic with
- “refresh” all files from the last restic index (minus the “delete list” and “modify list” files) for the new snapshot
- add the files from the “modify list” (reindexed) to the new snapshot
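The two steps boil down to simple set arithmetic, sketched below in Python. This only plans which paths could be carried over and which would need to be read again. `plan_snapshot` and its inputs are hypothetical, and how to actually reuse the carried-over entries inside restic is exactly the open question.

```python
def plan_snapshot(previous_files, delete_list, modify_list):
    """Split the next snapshot's file set into reused and re-read paths.

    previous_files: paths recorded in the last snapshot (hypothetical input).
    """
    deleted, modified = set(delete_list), set(modify_list)
    # "Refresh": carry over everything that was neither deleted nor modified.
    reuse = set(previous_files) - deleted - modified
    # "Reindex": only these paths have to be read from disk again.
    reread = modified
    return sorted(reuse), sorted(reread)

reuse, reread = plan_snapshot(["a", "b", "c"], ["b"], ["c", "d"])
# reuse == ["a"], reread == ["c", "d"]
```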
This would be the most basic operation, and it would still be very inefficient for large files (e.g. qcow2 on btrfs), as the entire qcow2 image would have to be reindexed. For my purposes it would probably be just fine.
One could expand the concept of the “delete list / modify list” to identify the blocks that were changed and feed those to restic to avoid indexing, but I guess that would be an even bigger task.
Another possibility would be to tell restic to skip all files (or assume they are unchanged) that have the same modification timestamp and file size as in the last run. I suppose not checksumming the files every time would speed up the indexing significantly. While I understand that this could miss files that were modified while keeping the same byte size and then had their modification date set back manually, I guess this is a risk I could live with in my environment. I would also consider doing weekly “full index runs”, as long as I get my hourly snapshots.
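As a sketch of that check in Python, assuming the previous run left behind a simple mapping of path to size and modification time. This format is made up for illustration and is not restic's own metadata. The `full_run` switch stands in for the weekly “full index run”.

```python
import os

def needs_reread(path, previous, full_run=False):
    """Return True if path must be read again, judged by size and mtime only.

    previous: dict mapping path -> (size, mtime_ns) from the last run
    (hypothetical format).
    """
    if full_run:
        return True  # weekly full run: do not trust the metadata at all
    st = os.stat(path)
    # Unknown paths and any size or mtime mismatch trigger a re-read.
    return previous.get(path) != (st.st_size, st.st_mtime_ns)
```

A file rewritten with the same size and a restored modification time slips through this check, which is the risk mentioned above.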
Has anybody already done some research in this direction? It can’t be just me, right?