I have two questions regarding differing paths, but first let me explain the context.
I have a data directory, let’s call it /path/to/data. Within it I have ZFS datasets at /path/to/data/dataset1, /path/to/data/dataset2, and so on. For those not familiar with ZFS, a dataset is basically a directory and a snapshot is a filesystem snapshot.
I would like to back up their contents to an S3 bucket from their ZFS snapshots, because the actual datasets are used by live services which I don’t want to stop. The ZFS snapshots, on the other hand, are mounted read-only in some other directory. However, that path is different from the path of the actual dataset, and because I include the timestamp in the snapshot name, the path will be different for every single backup.
Question 1: If I back up different snapshots that originate from the same dataset, would that be as if I were backing up the original directory multiple times, so that deduplication handles it smoothly? That is, I won’t end up with a separate complete backup for each snapshot path? Are there any other problems that may arise from using different paths for what is logically the same directory?
Question 2: What will happen if I try to back up snapshots from other datasets? Now not only will the path differ, but the contents will also be very different. Will I be able to restore a backup of each dataset independently of the others? Or should I just use different buckets?
Snapshots are independent of each other. If the datasets contain largely different data, then it’s probably best to just use separate repositories, as you won’t benefit from deduplication anyway. Each separate repository has a smaller index, which reduces the memory requirements of restic. Separate repositories are also isolated from one another: if one repository is damaged, the others are not affected.
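To make the separate-repositories suggestion concrete, here is a minimal sketch of how a backup script might pick one repository per dataset. It only builds the restic command line; the bucket URL, the one-prefix-per-dataset layout, the use of the dataset name as a tag, and the helper name are all assumptions for illustration, not something restic requires. It also assumes each repository has already been initialized and that credentials are supplied through the environment.

```python
from pathlib import PurePosixPath

# Hypothetical bucket; replace with your own. Each dataset gets its own
# prefix, so each prefix is an independent restic repository.
BUCKET_URL = "s3:s3.amazonaws.com/example-backups"


def restic_backup_command(dataset: str, snapshot_dir: str) -> list[str]:
    """Build a restic command that backs up one mounted ZFS snapshot.

    dataset      -- path of the live dataset, e.g. /path/to/data/dataset1
    snapshot_dir -- read-only mount point of the snapshot to back up
    """
    name = PurePosixPath(dataset).name   # "dataset1"
    repo = f"{BUCKET_URL}/{name}"        # one repository per dataset
    # The tag is an optional convention so backups of the same dataset are
    # easy to list even though the snapshot path changes every time.
    return ["restic", "-r", repo, "backup", "--tag", name, snapshot_dir]


if __name__ == "__main__":
    # Example mount point; the real path depends on how snapshots are mounted.
    cmd = restic_backup_command(
        "/path/to/data/dataset1", "/mnt/snapshots/dataset1-2024-01-01"
    )
    print(" ".join(cmd))
```

The returned list can be handed to `subprocess.run(cmd, check=True)` once the repository and credentials are in place. Using a single bucket with per-dataset prefixes is just one option; separate buckets would work the same way from restic's point of view.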