B2 - multiple hosts to single repository bucket?


#1

Hi! New to restic and using it with the B2 backend. I’m interested in the data deduplication, as I’ll be backing up client computers that sync data amongst themselves (not a traditional server–client setup.)

For example: backing up “google drive” accounts where there are a large number of files in common via shared folders, but not all files are in common (users still have some folders just to themselves.) And, in this example, using google drive to store PDF and Open Office files (not just the links to ‘google calc’ instances.)

B2 costs money per GB, so keeping data smaller if possible is advantageous.

I can’t tell from my research whether it is OK and/or advantageous, from a “data storage” perspective, to have multiple clients backing up to the same B2 bucket repository. Will “test.pdf”, which exists on multiple computers, be stored once because someone else already committed it to the backup? Or will this get too messy amongst clients, not work, and be better avoided?

Last point in my circumstance - I’m rolling out to 5 or 6 users - so not a major enterprise. We do have growing data storage needs, but still are very “small-office.”

We are mixed OS, but with macOS the most prevalent.

Many thanks for any pointers, advice, or further reading.

My best.


#2

Hi, and a very warm welcome to the restic community!

Your suspicion is correct - restic will store the contents of that identical file only once, even if you back it up from multiple clients/sources. Even if only parts of a file are identical, those parts are stored just once, since restic inspects chunks of files when deduplicating.

It’s not a problem to use the same repository from multiple machines. Each snapshot will have that client’s files/folders listed, so it won’t be messy either, e.g. when you want to restore.

Restic stores the hostname of the client with each backup/snapshot (and you can override this with --hostname), and you can also tag the snapshots with your own tags if needed.
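A minimal sketch of what that might look like (the bucket name, repo path, host name, and tags here are just placeholders to adapt):

```shell
# Hypothetical example: back up a folder to a shared B2 repository,
# overriding the recorded host name and adding tags.
export RESTIC_PASSWORD='repo-password'
restic -r b2:my-bucket:shared-repo backup ~/GoogleDrive \
    --hostname alice-macbook \
    --tag shared-docs --tag weekly
# Note: newer restic releases renamed --hostname to --host.
```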

You can also create a separate password for each client (see the key command), so that you can easily revoke access to the repo for just that client, should it be decommissioned or stolen.
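Roughly, that key management could look like this (the repo URL is a placeholder; the key ID comes from the list output):

```shell
# Per-client key management in a shared repository.
restic -r b2:my-bucket:shared-repo key list              # show existing keys
restic -r b2:my-bucket:shared-repo key add               # prompts for a new password
restic -r b2:my-bucket:shared-repo key remove <key-ID>   # revoke one client's key
```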

Have a go with it: fire up a test repo somewhere and try backing up a couple of clients. For a quick test, you can back up just part of their filesystems, and you can use a simple sftp repo as well!
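A quick throwaway test over sftp might look something like this (host, path, and folder are placeholders):

```shell
# Create a test sftp repo, back up a small folder, and list snapshots.
restic -r sftp:user@backuphost:/srv/restic-test init
restic -r sftp:user@backuphost:/srv/restic-test backup ~/Documents/some-folder
restic -r sftp:user@backuphost:/srv/restic-test snapshots
```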

As you might know, you can mount the repo via FUSE, so you can simply browse the snapshots when restoring, if you want.


#3

Can they back up concurrently too? (I know a lock is created but not sure how exclusive it is.)


#4

Yes, all backup operations can run in parallel. However, some repository maintenance operations (prune, check) require exclusive access, and all backup operations will fail while the repository is locked exclusively.


#5

Backups can run concurrently, but that may lead to duplicate data being written by different hosts. It’ll be cleaned up by the next run of restic prune though, so it’s not an issue.

That’ll work just fine, the contents of the file will be stored just once. I’ve written a lot of background information in the restic blog here: https://restic.github.io/blog/2015-09-12/restic-foundation1-cdc


#6

Thanks for the great feedback everyone.

I did a backup of a shared folder that was 6.87GB (most shared, some unique to that computer.)

I followed with another computer, same shared folder, and a total size of 7.48GB.

My repository size is 7.55GB after backing both computers up to the same repository. Very cool.

I was also able to mount the repository with FUSE and browse the backed-up files. I guess if I were looking for an older version of a file, I would first list the snapshots, then mount the repository and browse to the snapshot I think has the version I need (assuming I hadn’t “forgotten” and “pruned” it away already.) I will read and understand more about file versioning so I can keep some backup “sets” as needed.
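For anyone following along, that workflow sketched as commands (the repo URL and the retention numbers are just placeholders I’d adapt):

```shell
# List snapshots, browse them read-only via FUSE, then apply a
# retention policy to keep some backup "sets".
restic -r b2:my-bucket:shared-repo snapshots
mkdir -p ~/restic-mount
restic -r b2:my-bucket:shared-repo mount ~/restic-mount   # Ctrl-C to unmount
restic -r b2:my-bucket:shared-repo forget \
    --keep-daily 7 --keep-weekly 4 --keep-monthly 6 --prune
```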

I have not tried concurrent backups with multiple computers writing to the same repository - but the time will come!

I enjoyed the article on CDC - a lot of work is going on behind those little scrolling numbers on my command line. Thank you for figuring it all out, and making it available to us.