Recommendations regarding a medium size infrastructure

jaepetto · April 24, 2020, 2:45pm

Hi,

I’m currently running restic for a small infrastructure (5 servers).

Current approach

I am using 1 S3 bucket (common to all servers)

each server has its own cron job that:

backs up its own file system (every 2 hours)

restic backup /home -v --exclude /dev --exclude /media --exclude /mnt --exclude /proc --exclude /run --exclude /sys --exclude /tmp --exclude /var/tmp --tag <hostname>

forgets the old backups (once a day)

restic forget --prune --keep-last 2 --keep-hourly 4 --keep-daily 10 --keep-weekly 9 --keep-monthly 3 --keep-yearly 10 --keep-tag special

Unfortunately, this strategy does not scale well (it is quite inconvenient to manage cron jobs across servers and even more inconvenient to check the logs).

Furthermore, I have a very strong feeling that the forget --prune actions on multiple servers actually collide.

Finally (side note), I just noticed that the restic cache filled up the disk on one of the nodes…

Considered approach

To manage that in an easier and more transparent fashion, I am starting to use Jenkins to coordinate the actions.

My current approach is to have one Jenkins job per server (to perform the backup) + 1 job in charge of the pruning.

Has anyone experienced something like this? Are there any pitfalls in that approach?

I was also wondering about the S3 buckets. Is it better to have 1 bucket per server/app, or 1 common bucket for everything?

Thanks in advance for sharing your experience,

Emmanuel

MichaelEischer · April 24, 2020, 10:35pm

Hi, whether to use one bucket per server or a common bucket depends a bit on the data contained in the backup. If the servers share large amounts of data, it could be worthwhile to use a single repository to benefit from deduplication, but in most other cases you probably want one repository per server. This applies especially once the backup reaches the terabyte range: restic is currently not that fast for multi-TB repositories, so splitting helps a lot. Separate repositories also have the benefit that different hosts can’t read each other’s data and that each host has its own cache. Instead of using multiple buckets, you could also add a prefix to the repository path so that the repository URL looks like s3:s3.amazonaws.com/bucket_name/server_prefix.
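The prefix layout can be sketched in shell as follows. This is only an illustration: `bucket_name` is the placeholder from the example URL, while the helper `repo_url_for_host` and the choice of the node name as prefix are assumptions, not something restic requires.

```shell
#!/bin/sh
# Build a per-server repository URL inside one shared bucket.
# BUCKET and the helper name are illustrative assumptions.
BUCKET="bucket_name"

repo_url_for_host() {
    # $1: server prefix, e.g. the machine's node name
    printf 's3:s3.amazonaws.com/%s/%s\n' "$BUCKET" "$1"
}

# restic reads the repository location from RESTIC_REPOSITORY,
# so each server only needs to export its own URL before running restic.
RESTIC_REPOSITORY="$(repo_url_for_host "$(uname -n)")"
export RESTIC_REPOSITORY
```

Each prefix is a separate repository, so it has to be initialised once on its own before the first backup.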

forget --prune requires an exclusive lock on the repository, which means that no other prune or backup run can happen in parallel. By default, forget cleans up snapshots for all hosts, so the single prune job should be fine.
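A single central prune job could look roughly like the sketch below. It assumes the repository and credentials are already provided through the environment; `RESTIC_BIN` and `prune_all_hosts` are hypothetical names, and the retention flags are simply copied from the first post.

```shell
#!/bin/sh
# Hypothetical override so the function can be tried out with RESTIC_BIN=echo.
RESTIC_BIN="${RESTIC_BIN:-restic}"

prune_all_hosts() {
    # No --host filter: forget considers the snapshots of every host,
    # so only this one job ever asks for the exclusive lock.
    "$RESTIC_BIN" forget --prune \
        --keep-last 2 --keep-hourly 4 --keep-daily 10 \
        --keep-weekly 9 --keep-monthly 3 --keep-yearly 10 \
        --keep-tag special
}
```

The one scheduled prune job would call `prune_all_hosts`, while the per-server jobs would run only `restic backup`. The prune job should still be scheduled outside the backup windows, since it cannot run in parallel with backups.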

Personally, I’d see Jenkins rather as a tool to run continuous integration tasks for some software and not as a tool to manage low-level system tasks like backups. For that I’d rather recommend cron jobs or similar which are distributed by some configuration management tool. However, I cannot say much about whether you would run into reliability or other problems when using Jenkins.

jaepetto · April 27, 2020, 7:08am

Hi @MichaelEischer,

thanks for your answer.

I’ll probably go for 1 bucket per application. IMHO, it has the following advantages:

Cleaner, in the sense that data from different services is not mixed up and there’s a very low risk of restoring the wrong data
Probably easier to maintain the prune & forget jobs
As you said, the forget jobs run for all hosts. Therefore, if you say you want to keep the last 4 snapshots and you have 4 hosts, you actually keep only 1 snapshot per host.

Regarding the use of Jenkins vs local cron jobs, this is an entirely different world. Even if cron jobs are easier to put in place, they are a nightmare to manage once you have more than a couple of servers. There’s no easy way to get an overview of all the jobs and their respective status. That said, I am not claiming that Jenkins is particularly well suited for this. There are probably other tools that are better suited (e.g. Rundeck). This choice was just a matter of personal convenience (since it is also used for other types of tasks).

764287 · April 27, 2020, 8:51am

restic uses --group-by host,paths by default, so it would keep 4 snapshots per host and set of backup paths unless told otherwise.
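To make the grouping explicit and preview what a policy would remove, something like the following can be used. It is a sketch: `RESTIC_BIN` and `preview_forget` are hypothetical names, while `--group-by` and `--dry-run` are existing `restic forget` options.

```shell
#!/bin/sh
# Hypothetical override so the function can be tried out with RESTIC_BIN=echo.
RESTIC_BIN="${RESTIC_BIN:-restic}"

preview_forget() {
    # $1: number of most recent snapshots to keep in each group.
    # --group-by host,paths spells out the default grouping;
    # --dry-run only reports what would be removed, nothing is deleted.
    "$RESTIC_BIN" forget --keep-last "$1" --group-by host,paths --dry-run
}
```

With 4 hosts that each back up one set of paths, `preview_forget 4` would retain up to 4 snapshots per host, so up to 16 in total. Only when grouping is disabled (per the restic documentation, by passing an empty value as in `--group-by ''`) is the policy applied across all hosts at once.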

jaepetto · April 27, 2020, 10:30am

That’s really good to know. Thanks.
