Linux Backup Script for Folder and DB

AurelianRQ · February 12, 2020, 6:36pm

Greetings,

I have the following setup and would like to know what the best option would be. If possible, some sample scripts would be great.

Backup

1. /Folder
2. MySQL Database

Backup frequency - hourly

Backup retention - 1 year

I have around 5 identical Debian 9 machines that need this backup. It should be automated and should eventually send an email every time a backup completes or an error occurs during the process.

The initial backup of the folder varies between 3 GB and 140 GB. Can I set the script to check whether a backup is still in progress and skip the run until the initial backup is done, then carry on with the hourly backups? A test on a machine with 130 GB gave an ETA of around 22 hours, so if the script were started 22 more times in the meantime it would be a mess.

One more thing: I read on the forum about possible problems with the local .cache folder. Any idea how big it could grow for a 140 GB backup? If it grows too big I could put it on the backup drive as well, but I’m worried that it will take up all the space.

And one last thing: can I back up the DB and the folder to the same repository, or will I need to create separate repositories for the DB and the folder?

Thanks in advance.

gurkan · February 13, 2020, 7:45pm

Hi

You’ll need to write a wrapper and run it hourly via cron. You can check whether the repository has any locks with something like restic --no-lock list locks, but if you use the same repository for all nodes, the most reliable solution would be checking the ps output to see whether a restic binary is running.
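As a rough sketch of such a wrapper: the version below does not parse ps output but swaps in a per-machine lock file held with flock, which has the same effect of skipping a run while an earlier one is still going. The lock path, the function names and the /Folder argument are illustrative assumptions, and the repository and password are assumed to come from the RESTIC_REPOSITORY and RESTIC_PASSWORD_FILE environment variables.

```python
import fcntl
import subprocess

# Hypothetical lock file location; any path writable by the cron user works.
LOCK_PATH = "/tmp/restic-backup.lock"


def try_lock(path):
    """Return an open file holding an exclusive lock, or None if it is taken."""
    handle = open(path, "w")
    try:
        # LOCK_NB makes the call fail immediately instead of waiting.
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        handle.close()
        return None
    return handle


def main():
    lock = try_lock(LOCK_PATH)
    if lock is None:
        # An earlier run (for example the long initial backup) is still busy.
        print("previous backup still running, skipping this run")
        return 0
    try:
        # Assumes RESTIC_REPOSITORY and RESTIC_PASSWORD_FILE are set.
        return subprocess.call(["restic", "backup", "/Folder"])
    finally:
        lock.close()  # closing the file releases the lock

# Entry point for the hourly cron job: raise SystemExit(main())
```

The kernel drops the lock when the process exits, even after a crash, so a stale lock file on disk does not block later runs.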

Your first backup will take some time, but subsequent ones will be faster thanks to deduplication.

If I remember correctly, the cache folder can take up to 10% of the repository size (or I might be dreaming, please someone correct me). So as long as you always use the same cache directory and add --cleanup-cache to remove unused cache, things should not go crazy.
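A minimal sketch of how a wrapper could pin the cache location and keep an eye on its size. It uses restic's global --cache-dir and --cleanup-cache options; the helper names and the example paths are assumptions made for illustration.

```python
import os


def cached_backup_command(paths, cache_dir):
    """Build a restic backup command that always uses the same cache directory."""
    return ["restic", "--cache-dir", cache_dir, "--cleanup-cache", "backup", *paths]


def dir_size_bytes(root):
    """Sum the sizes of all regular files below root (symlinks are skipped)."""
    total = 0
    for current, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(current, name)
            if os.path.isfile(path) and not os.path.islink(path):
                total += os.path.getsize(path)
    return total
```

For example, cached_backup_command(["/Folder"], "/mnt/backup/restic-cache") would place the cache on a hypothetical backup drive, and dir_size_bytes on the same path could be logged after each run to see how the cache actually grows.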

You can back up multiple folders into the same repository. If you append both folder names at the end of the backup command, you can even put them into the same snapshot.
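One way this could look for the folder plus the database, as a sketch: dump MySQL to a file first, then pass the folder and the dump file to a single restic backup call so both end up in one snapshot. The dump path and function names are hypothetical, MySQL credentials are assumed to come from an option file such as ~/.my.cnf, and the repository settings from the environment.

```python
import subprocess

# Hypothetical location for the SQL dump; the directory must already exist.
DUMP_FILE = "/var/backups/mysql/all-databases.sql"


def dump_command(dump_file):
    """mysqldump invocation writing all databases to one file."""
    return [
        "mysqldump",
        "--all-databases",
        "--single-transaction",  # consistent dump without locking, InnoDB only
        f"--result-file={dump_file}",
    ]


def snapshot_command(paths):
    """A single restic backup call; all paths land in the same snapshot."""
    return ["restic", "backup", *paths]


def run_backup(folder, dump_file=DUMP_FILE):
    """Dump the database, then back up folder and dump together."""
    subprocess.run(dump_command(dump_file), check=True)
    subprocess.run(snapshot_command([folder, dump_file]), check=True)
```

With check=True a failing dump raises an exception before restic runs, so an old or partial dump is not silently backed up as if it were current.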

cdhowie · February 13, 2020, 8:12pm

The cache stores indexes and tree objects, I think. Index size scales linearly with the number of objects; generally, a large number of small files will require more memory than a small number of large files (for a certain definition of “large” and “small”).

So it’s not as simple as 10% of the raw size of the repository, though that might be a decent approximation in most cases.

MichaelEischer · February 14, 2020, 4:29pm

I have a repository with a cache of 7 GB of index data and roughly 65 GB of trees. The repository contains nearly 40 million files (a lot of small ones) and a total of 8 TB of data. So for this repository the cache (index plus trees) is just about 1% of the raw repository size. Therefore 10% of the raw repository size should be a very safe guess.
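The arithmetic behind those figures, as a small sketch. It assumes decimal units (1 TB = 1000 GB) and, for the 140 GB case from the original question, that the repository ends up roughly as large as the source data; the function names are made up for illustration.

```python
GB_PER_TB = 1000  # decimal units, an assumption


def cache_fraction(index_gb, trees_gb, repo_tb):
    """Share of the raw repository size taken by index plus tree cache."""
    return (index_gb + trees_gb) / (repo_tb * GB_PER_TB)


def cache_upper_bound_gb(repo_gb, fraction=0.10):
    """Conservative cache estimate using the 10% rule of thumb."""
    return repo_gb * fraction


# Figures quoted above: 7 GB index, 65 GB trees, 8 TB repository.
observed = cache_fraction(7, 65, 8)
print(f"observed cache share: {observed:.2%}")
print(f"10% bound for a 140 GB repository: {cache_upper_bound_gb(140):.1f} GB")
```

So even the pessimistic 10% estimate puts the cache for the largest machine in the low tens of gigabytes, while the observed share in this repository is well below that.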