Best practices for backing up two folders to the cloud

Hi! First of all, I’ve been an IT manager for 20 years, and I’m pretty amazed by restic. Congratulations, you have an amazing tool and community here!

I’m learning tons through experimentation.

I’m curious to know the best practice for backing up two very distinct folders to Backblaze, both on the same computer:

  1. One is 400 GB and changes about 1 GB per day
  2. The other is 120 MB, and all of it changes every day

How should I proceed? Two buckets, one for each folder? The same bucket with different tags? Or something else?

Thank you all!

From a restic point of view, I don’t see that it makes any difference.

I would differentiate based on:

  • how many snapshots per day do you need?

In my case one of the repos (the smaller one) is far more active, so I take a snapshot every 20 minutes; the other one is about 10× the size and gets backed up once a day. Note that the prune parameters are also different for these two (see the sketch after this list).

  • where do the backups go?

Again, in my case the smaller (more important) one goes to multiple external destinations, though it is not all automated; some of it is manual.
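To make that concrete, here is a minimal sketch of what “different prune parameters” could look like; the repository paths and the --keep values are made up purely for illustration:

# small, very active repo: snapshots every 20 minutes, fine-grained retention
restic -r /backups/small-repo forget --keep-hourly 48 --keep-daily 14 --keep-monthly 6 --prune

# larger repo: one snapshot per day, coarser retention
restic -r /backups/big-repo forget --keep-daily 30 --keep-monthly 12 --prune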

Hi sc2maha!

  • I do both of them once a day, and the prune settings can be the same.

  • I’m sending them to multiple locations, but the big one is already all set up. I’m adding the small one now, and I will send it first to a Backblaze bucket.

It’s all automatic: a script stops the software, then restic on the same computer sends the data to Backblaze and to a local backup server, while Cobian (an old habit, while I’m getting the hang of restic) sends it to a disaster-proof NAS. Another script starts the software again after all the backups finish.

This is the restic script that Task Scheduler runs once a day:

restic -r s3:s3.us-west-002.backblazeb2.com/esoft backup C:\Esoft --tag esoft --read-concurrency 10 --option s3.connections=1 --compression max --verbose
restic -r s3:s3.us-west-002.backblazeb2.com/esoft forget --tag esoft --keep-daily 16 --keep-monthly 3 --keep-yearly 3 --prune --verbose
restic -r s3:s3.us-west-002.backblazeb2.com/esoft check
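As a rough illustration of the stop/backup/restart sequence, here is a simplified sketch; the service name and the script path are placeholders, and the real setup uses separate stop and start scripts:

# PowerShell sketch; "EsoftService" and the script path are placeholders
Stop-Service -Name "EsoftService"
& "C:\Scripts\restic-esoft.ps1"   # runs the three restic commands shown above
Start-Service -Name "EsoftService"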

When using multiple connections I was facing disconnections from the router after more or less 3 hours; the first backup took over 27 hours, with many interventions and an index repair.

A check with --read-data was too slow and expensive to run daily, so I will do it once a month.
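If even a monthly full read turns out to be too heavy, a possible middle ground (just a sketch, using restic’s standard --read-data-subset option) is to verify a different slice of the pack files on each run, so the whole repository still gets read over time:

# verify 1/12 of the pack data this month; bump the numerator (2/12, 3/12, ...) on later runs
restic -r s3:s3.us-west-002.backblazeb2.com/esoft check --read-data-subset=1/12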

The backup setup has many moving scripts, all synchronized, but I think these are the core ones relevant to this discussion.

Any suggestions or tips are more than welcome.

Thanks again!

For anyone following this thread: I settled on two different buckets, one for each dataset. It seems to me that this gives me more freedom and better organization down the road.

It’s a very subjective decision.


Pruning monthly is sufficient. I use separate buckets too.

Thanks @tomwaldnz for getting back.

Why do you only do it once a month?

I’ve experienced slow backups to Backblaze.

I initially tried switching from the S3 backend to rclone (which improved reliability), but settled on a two-step approach with restic copy (sketched after the list below):

  1. In a local restic repository, create an hourly snapshot
  2. Copy the last 12 hours of snapshots to a restic repository on Backblaze
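Roughly, the two steps look like this. The repository locations are placeholders, password handling for both repositories is omitted, and --from-repo needs restic 0.14 or newer (older releases used --repo2). Since restic copy skips snapshots that already exist in the destination, re-running it only transfers the new ones:

# step 1 (hourly): snapshot into a fast local repository
restic -r /srv/restic-local backup /data --tag hourly

# step 2: copy the hourly snapshots that are not yet in the Backblaze repository
restic -r s3:s3.us-west-002.backblazeb2.com/my-bucket copy --from-repo /srv/restic-local --tag hourly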

Great to hear!
I do find Backblaze to be slow, but the client is located in a rural area with a mediocre internet connection. The router disconnects the computer from the network if multiple connections are used.

Fortunately, this is not a problem for me as I only need to do backups after office hours. To ensure a valid backup, I have to stop the server database first, perform the backup, and then restart it.

If something goes wrong, we could potentially lose a day of data—not ideal, but much better than before.

In any case, @gvangool, your post will be very helpful if I decide to take more snapshots during the day. Thank you!

I only prune once a month for most of my backups.

Prune’s only job is to recover disk space, and it uses resources such as CPU and bandwidth. It can increase cloud costs by adding requests, downloading packs, and uploading new repacked packs. If you have sufficient disk space it’s more efficient to repack less often, and in the cloud you’re not going to run out of space, so there’s no need to prune after each backup.

Pruning after every backup is fine; it’s just going to cost you more and have little benefit.
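As a sketch of that split (reusing the repository from earlier in the thread as an example, with placeholder retention values), the daily job can run forget without --prune, and a separate monthly job can do the actual repacking; prune’s --max-unused option lets it leave some unused data behind instead of repacking aggressively:

# daily: remove old snapshots only; the unreferenced data stays until prune runs
restic -r s3:s3.us-west-002.backblazeb2.com/esoft forget --tag esoft --keep-daily 16 --keep-monthly 3 --keep-yearly 3

# monthly: repack, tolerating up to 10% unused data to cut down on traffic
restic -r s3:s3.us-west-002.backblazeb2.com/esoft prune --max-unused 10%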

Another optimisation is to stop your virus scanner from scanning the restic executable, the executable you use to upload data (I use the AWS CLI), and the local repo if you have one. Some of my backups go to a local spinning disk, and the virus scanner was going crazy reading every file in the backup; once I excluded it, the backup was much faster.
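For example, if the scanner happens to be Windows Defender (an assumption; other products have their own exclusion settings), the exclusions can be added from an elevated PowerShell prompt; the repository path below is a placeholder:

# exclude the restic process and a hypothetical local repository path from Defender scanning
Add-MpPreference -ExclusionProcess "restic.exe"
Add-MpPreference -ExclusionPath "D:\restic-repo"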

Thanks @tomwaldnz! It makes sense, I’ll make both of these changes.