Split backup between multiple drives

konrad · January 1, 2024, 1:54pm

Hey,
I am running a Synology NAS with three 8TB drives and have an older NAS with a similar setup at my parents’ place, which I use as the backup target for Hyper Backup.

As I would like to have a separate backup of my files with a second, independent tool that can also be scripted (e.g. to start a backup before running a software upgrade in my homelab), I started using restic some years ago and am quite happy with it. Thanks for your effort in maintaining the tool and ensuring its stability.

I use restic to store files on a local drive that I periodically connect via USB, and also to upload files to OneDrive, as Office 365 offers 1TB per family member as part of the subscription I already pay for.
That worked fine for a while, but I am starting to outgrow the space (currently I store around 10TB on my NAS). For OneDrive I could use multiple accounts to store more data (not enough to store everything, but at least the most important data). I also own multiple HDDs which I could connect to my NAS concurrently or one after another.
I was wondering if someone has come up with a solution to partition data automatically?
E.g. a single backup command which creates a single index and copies the files to different targets based on certain criteria like available space.

Right now I could manually select folders and back those up in chunks to a destination until it reaches its limit, then switch to a new target. But if old folders keep growing, that might require a second split. It also complicates the restore, as I need to find the right repository manually, and it adds the risk of forgetting to include some data. I also need to leave unused space in order to run future backups of slightly changed data. Overall the mental overhead is not great and it is also not that easy to explain to others.
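The bookkeeping for such a manual split could be handed to a small script. The following is only a rough sketch, not a restic feature: it assigns folders to targets with a first-fit-decreasing heuristic and keeps some headroom free for future growth. The folder names, sizes (in GB), target names and the 20% headroom are made-up values for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Target:
    name: str
    capacity: int                      # usable size of the drive/account in GB
    folders: list = field(default_factory=list)
    used: int = 0


def plan(folder_sizes, targets, headroom=0.2):
    """Assign each folder to the first target that still has room.

    Folders are processed from largest to smallest (first-fit decreasing).
    `headroom` is the fraction of every target that is kept free.
    Returns the folders that did not fit anywhere.
    """
    unplaced = []
    by_size = sorted(folder_sizes.items(), key=lambda kv: kv[1], reverse=True)
    for folder, size in by_size:
        for target in targets:
            if target.used + size <= target.capacity * (1 - headroom):
                target.folders.append(folder)
                target.used += size
                break
        else:
            # No target had enough space left for this folder.
            unplaced.append(folder)
    return unplaced


# Hypothetical example data, sizes in GB.
folder_sizes = {"photos-raw": 5000, "videos": 3000, "vm-images": 1500, "documents": 500}
targets = [Target("usb-8tb", 8000), Target("usb-4tb", 4000)]

unplaced = plan(folder_sizes, targets)
for target in targets:
    print(target.name, target.folders, f"{target.used} GB used")
print("does not fit anywhere:", unplaced)
```

Each target would then get its own repository, and the printed folder list is what gets passed to the backup command for that repository. Anything reported as not fitting is an explicit signal that more space is needed, which at least removes the risk of silently forgetting data.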

Last time my single 8TB USB-attached disk ran out of space, I revived my first NAS and inserted the 8TB disk and another old 4TB disk as a JBOD pool. This worked for a while, but I can’t add more disks now and my storage grows constantly due to RAW photos and videos (my wife is an enthusiastic hobby photographer).

Now I am thinking about buying a new disk with at least 16TB, but the problem will hit me sooner or later again.

Also, I try to reuse my old gear whenever possible, as I don’t want to waste too much money or resources for my secondary backup.

I guess the recommended approach would be to use scalable object storage, but that tends to get expensive for 10+ TB as well… not to forget potential egress costs…

Just wanted to ask if anybody has a good solution in mind, automatic splitting/resharding of repositories to multiple destinations would be quite a cool feature, but probably also hard to implement without breaking compatibility…

I am wondering if it would be possible to implement this as a new type of backend or maybe as part of the restic REST server itself.
I guess splitting encrypted data would be ok; if I lose one location, I would expect to lose the entire repo.

Thanks for any ideas

alexweiss · January 1, 2024, 7:41pm

Why not use a simple RAID or LVM or whatever is available to map multiple physical devices to a big logical one?

konrad · January 1, 2024, 9:59pm

Thanks, yes, I basically did that with my old NAS, but transfers over a 1Gbit network were way slower compared to direct USB connections and I could only fit in 2 drives. Creating RAID/LVM on attached USB drives is not supported on Synology devices AFAIK, but yeah, that’s not really a restic problem. Maybe a DAS enclosure could be a workaround for that problem.

When it comes to cloud storage, I am not aware of an option to group multiple accounts into a single mount point.
Maybe mounting them locally and applying mergerfs or something similar could work, but that sounds rather flaky as I don’t know how mergerfs and restic would handle such mounts… probably I should just do rough partitioning like “photos until 2020” map to drive 1 and eventually buy a big enough drive.

shd2h · January 2, 2024, 11:46am

probably I should just do rough partitioning like “photos until 2020” map to drive 1 and eventually buy a big enough drive.

I think this is probably the simplest solution that meets your criteria of splitting the data, and it works for the cloud portion of your storage too. You could create a repository on each drive (or cloud), and then store specific snapshot(s) at each repository. If each snapshot only contains a subset of your data, then that should limit the size of the snapshots. If one repository gets too full, restic copy can be used to migrate the data.
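As a rough illustration of the migration step, here is a small sketch that builds the restic copy command line for moving a snapshot from a full repository to another one. The repository paths and the snapshot ID are hypothetical, the `--from-repo` flag assumes restic 0.14 or newer, and the destination repository is assumed to be initialized already. Both repository passwords still have to be supplied in the usual way, which is left out here.

```python
import shlex


def restic_copy_command(source_repo, destination_repo, snapshot_ids=()):
    """Build the argv for copying snapshots between two restic repositories.

    Without snapshot IDs, restic copy considers all snapshots of the source.
    Assumes restic >= 0.14, where the source is given via --from-repo.
    """
    return [
        "restic",
        "-r", destination_repo,       # repository that receives the snapshots
        "copy",
        "--from-repo", source_repo,   # repository that is running full
        *snapshot_ids,
    ]


# Hypothetical paths and snapshot ID, for illustration only.
cmd = restic_copy_command("/mnt/usb1/restic", "/mnt/usb2/restic", ["a1b2c3d4"])
print(shlex.join(cmd))
# To actually run it: subprocess.run(cmd, check=True)
```

The copied snapshot can be removed from the source repository afterwards (followed by a prune) to free up space there.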

Obviously you lose the benefits of deduplication across all your data by doing this (how well do large RAW photos deduplicate anyway though?). However you do keep some robustness, such that if an individual drive (or cloud provider) has an issue, only the snapshots on that particular storage would be impacted.

Obviously this does involve some admin overhead, but much less than manually copying bits of a restic repository around would, and it is more easily scalable, up to a point anyway.

Longer term, given your local storage needs seem to keep expanding, I’d probably recommend migrating to a solution that gives you the flexibility to do that. A dedicated (cheap?) PC used as a NAS, perhaps running openmediavault or TrueNAS, will give you far more flexibility than an off-the-shelf NAS appliance does.

konrad · January 2, 2024, 1:45pm

Thanks, I totally overlooked the option to copy snapshots to different repositories.
Up until now I usually just used a disk until it was full and then either bought a new drive or repurposed the old one in a JBOD, which effectively means losing the repository history, so not a great solution either.

I used to have an old PC as a backup server running TrueNAS and it worked okay for a while, but suddenly some component died, the BIOS just started beeping and it was unable to boot any more… probably the graphics unit died, but due to the hardware age and power consumption, I just dropped it.
Maybe I should buy a newer used model. Right now I run a Thinkpad T460 as a Proxmox host; it is not a beast resource-wise, but it usually provides enough horsepower for my homelab projects while being power efficient. The downside is the lack of SATA connectors to add disks…

Thanks for all the hints

konrad · January 10, 2024, 8:40pm

I just stumbled over rclone’s union backend. That backend seems to implement exactly what I was looking for, and it should already work with restic.

Basically, I can create individual remotes for each destination, e.g. multiple remotes for each OneDrive account and create a union remote of all other remotes.

If combined with the local filesystem backend, it should be possible to combine multiple physical hard disks into one backup target.

RYTD29 · February 1, 2024, 9:51pm

@konrad : did you manage to get restic with rclone union to work? I would like to run it on two local HDDs, but I just get an error when trying to run ‘init’.

konrad · February 2, 2024, 4:09pm

Yes, it worked by mounting both USB disks and configuring a “local” remote (Local Filesystem) in rclone.
Afterwards, I could configure a “union” remote (Union) which mentioned the “local” remote twice with different paths pointing to both mount points of the USB disks.
When configuring the “union” remote, I chose non-default placement strategies for “create” and “action”, as I wanted to spread the load evenly and path preservation does not work that well if you operate on restic repositories instead of a plain filesystem.
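For anyone who wants to reproduce this, the sketch below generates what such an rclone config might look like. It is not a copy of the config used here: the remote names and mount points are hypothetical, and `mfs` (most free space) for “create” and `all` for “action” are just assumed examples of policies without path preservation (the rclone defaults are `epmfs` and `epall`).

```python
import configparser
import io


def union_config(mount_points, local_name="local", union_name="backup-union"):
    """Render an rclone config with one local remote and a union over it.

    Upstreams are separated by spaces, so the mount points must not
    contain spaces in this simple sketch.
    """
    cfg = configparser.ConfigParser(interpolation=None)
    cfg[local_name] = {"type": "local"}
    cfg[union_name] = {
        "type": "union",
        # The same "local" remote is referenced once per disk.
        "upstreams": " ".join(f"{local_name}:{path}" for path in mount_points),
        # Assumed non-default policies without path preservation.
        "create_policy": "mfs",
        "action_policy": "all",
    }
    buffer = io.StringIO()
    cfg.write(buffer)
    return buffer.getvalue()


# Hypothetical mount points of the two USB disks.
text = union_config(["/mnt/usb1/restic", "/mnt/usb2/restic"])
print(text)
```

The printed sections would go into the rclone config file (`rclone config file` shows where it lives). With the names used above, the repository would then be addressed as `rclone:backup-union:` in restic, e.g. `restic -r rclone:backup-union: init`.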

Anyway, once I figured everything out it was working, but I still decided to just split my backups manually. In case of a drive failure, I only lose parts of my backups, and the speed is also better that way. Partly due to less overhead from rclone, but in my case also because I had a Seagate Barracuda (SMR drive and therefore not great on massive writes) combined with a Seagate IronWolf Pro (CMR drive which is way faster on the initial backups and provides consistent speed in comparison).

Additionally, partitioning my data manually is conceptually a simpler architecture, and in case of a disaster it would be way easier to restore, as I don’t need to remember/explain any rclone config.

Finally, as I had two 8TB drives, having to split the datasets was okay. I might have decided differently if I had needed more complex logic to split my data manually.

RYTD29 · February 2, 2024, 9:40pm

@konrad : Thanks a lot. Your hint with non-default placement strategies for “create” and “action” did the trick. Now (without path preservation) everything works as expected.