Active-active rest-server with shared repo

Hi!

I’m going to use restic with rest-server in a very large environment with hundreds of parallel backup jobs.

I’m going to use a single DNS record like ‘backup-host’ that resolves to two or more rest-servers in a round-robin manner.
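
For illustration, round-robin here just means multiple A records on the same name; a hypothetical zone-file snippet (the IPs are made up):

backup-host  IN  A  10.0.0.11
backup-host  IN  A  10.0.0.12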

All of the rest-servers will use a shared directory (on a distributed filesystem like CephFS) that contains the different repos, e.g.:

/shared-dir/repo1
/shared-dir/repo2

As a result, several backup jobs will write to the same repo at the same time through different rest-servers.

I have checked it briefly and it works fine.

Moreover, it works great when one of the rest-servers is unavailable (down for maintenance, for example): restic seamlessly picks a server that is currently alive.

What do you think about this approach? Could it cause any problems?

Hmm, the most interesting situation is probably locking the repository: is a file that was written on server 1 also visible on server 2 immediately afterwards, and vice versa? If yes, then locking and all other operations should just work as expected.

Files are written atomically by recent rest-server versions (and different restic clients won’t write the same files at the same time anyway). And the directory listing is read directly from the filesystem.

If files become visible at the other server only with some delay, then operations should in general still work, maybe with some retries. But I wouldn’t rely on restic’s locking mechanism to prevent concurrent tasks such as backup and prune.
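
One quick way to check the visibility assumption, as a sketch (server names and password file taken from this thread): while a backup runs via one server, list the repo’s lock files via both servers and compare the lock IDs.

restic -r rest:http://rest-server1:8000/repo1/ --password-file ./pass-file list locks
restic -r rest:http://rest-server2:8000/repo1/ --password-file ./pass-file list locks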

@MichaelEischer thank you for the reply!

I’m going to use rest-server only for backup tasks (append-only mode) and for restore.
For other tasks like listing snapshots, snapshot removal, or prune, I will use the restic client directly (--repo /shared-dir/repo1), as sketched below.
Should I run these (snapshot removal or prune) at a time when no backup or restore jobs are going through rest-server to that exact repo?
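
For example (a sketch; the retention policy is made up, paths and password file as above):

restic --repo /shared-dir/repo1 --password-file ./pass-file snapshots
restic --repo /shared-dir/repo1 --password-file ./pass-file forget --keep-last 30 --prune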

is a file that was written on server 1 also visible on server 2 immediately afterwards, and vice versa?

As far as I can see, yes: the snapshot and its blobs are visible from the second server immediately after the backup of this file finishes.

But there is one detail: I back up a block device like this:

cat /dev/sda | restic -r rest:http://rest-server1:8000/repo1/ backup --stdin --stdin-filename sda_1 --password-file ./pass-file

There is only one “file” in my backup job, and I can see the blobs of this snapshot only after the backup job ends.
If two backup jobs of similar block devices are started one shortly after another (a few minutes apart, for example), the second job can’t reuse the existing blobs of the first one, because they aren’t visible until the first job finishes.
Am I right? In that case we could end up with many duplicated blobs in one repo?
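
If so, I could serialize the similar jobs so that later ones deduplicate against earlier ones; a sketch (assuming both jobs run from the same host; the second device, sdb, is hypothetical):

cat /dev/sda | restic -r rest:http://backup-host:8000/repo1/ backup --stdin --stdin-filename sda_1 --password-file ./pass-file
cat /dev/sdb | restic -r rest:http://backup-host:8000/repo1/ backup --stdin --stdin-filename sdb_1 --password-file ./pass-file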

@FATruden: Is there a reason for your rather complex setup using multiple rest-server instances? Having these redundant REST servers essentially gives you a self-made, REST-accessible, highly available distributed storage.
But there are already dozens of other solutions that solve exactly the same problem, like MinIO.
And as I recall, Ceph also exposes an S3-compatible interface. So why not simply use that instead of adding a layer of complexity?

@alexweiss you are right about redundancy.

But I don’t know how to protect backups from being removed.
With rest-servers in append-only mode I can use many endpoints for backups, and have only one server with direct access to the repos for removals. I can enforce that at the infrastructure/physical level (firewall, SSH access, etc.).
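
For example, the public-facing instances could run like this (a sketch; --append-only, --path, and --listen are real rest-server flags, path from above):

rest-server --path /shared-dir --listen :8000 --append-only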

With S3-compatible storage I can have special keys (access + secret keys in the S3 protocol) without delete capabilities for backups, which is fine.
But I need another set of keys with full capabilities for removals. And this additional key pair is a very big problem for me, because anyone who has these keys can easily remove all backups through any S3 endpoint.

Hm… maybe I can apply some S3 policy to restrict the admin keys so they only work from one specific computer…
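
Something like this, perhaps (a sketch; the bucket name, user ARN, and IP are made up; MinIO and Ceph RGW both support bucket policies, though condition-key support may vary):

aws s3api put-bucket-policy --bucket backup-bucket --policy '{
  "Version": "2012-10-17",
  "Statement": [{
    "Sid": "DenyDeleteExceptFromAdminHost",
    "Effect": "Deny",
    "Principal": {"AWS": "arn:aws:iam::123456789012:user/backup-admin"},
    "Action": "s3:DeleteObject",
    "Resource": "arn:aws:s3:::backup-bucket/*",
    "Condition": {"NotIpAddress": {"aws:SourceIp": "203.0.113.10/32"}}
  }]
}'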