CERN is testing restic for their backups

#1

Maybe another success story: https://cds.cern.ch/record/2659420

(If this is not the right place for this, please delete this post.)

6 Likes

#2

Awesome, thanks for the hint!

1 Like

#3

@fd0 maybe you can contact them to see what happens at LAAAARGE scale :wink:
However, this is still a WIP, but promising.

1 Like

#4

Heh, they have 16k users with (combined) 3PB of data, but they use one repository (in one S3 bucket) per user, so memory usage will not be such a huge issue :slight_smile: A good trade-off, IMHO.
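For anyone curious what a one-repository-per-user layout looks like in practice, here is a minimal sketch. The bucket names, credentials, and paths are hypothetical, not CERN's actual setup:

```shell
# One restic repository per user, each in its own S3 bucket.
# All names below are illustrative only.
export AWS_ACCESS_KEY_ID=backup-service
export AWS_SECRET_ACCESS_KEY=...

USER=alice
export RESTIC_REPOSITORY="s3:https://s3.example.org/restic-${USER}"
export RESTIC_PASSWORD_FILE="/etc/restic/keys/${USER}"

restic init                      # once, when the user's bucket is created
restic backup "/home/${USER}"    # the recurring per-user backup job
```

Since each repository holds only one user's data, restic's in-memory index stays proportional to that user's file count rather than to the whole 3PB.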

And it’s just at the evaluation stage for now. I’m curious about the results of their evaluation…

1 Like

#5

I bet you are. :slight_smile: I am too, and I hope they publish their results or recommendations.

0 Likes

#6

Keep calm, that is their goal :slight_smile: For now they just have 200 users with a total of 5M files.
That is already more than one would have for a personal backup :wink:

1 Like

#7

I would say that surviving a WIP at an organisation like CERN, with data at this scale, is already a reason to open a glass (or two) of your favourite drink.

4 Likes

#8

Hi, I am the person running this project; I’ve been around for a while, bothering you on the forum/GitHub :slight_smile:

As a quick update: this project is progressing fast, and I’m very confident it will go into production at some point. Currently we are backing up 370 accounts daily, and we plan to increase that to 1k shortly.

Also, if we make a mess in one repository, only one user would be affected :slight_smile: Another reason for this is that we get more flexibility with bucket placement policies, like moving important users to critical areas, adding extra S3-side replication for certain users, etc. The main drawback is that we don’t get the full power of deduplication, but as you said, it’s a fair trade-off.

Yes, for sure! Right now the orchestration tools are tightly coupled to our environment, but if this goes into production my idea is to make them more generic and share them.

I will keep you updated on any news regarding this project, and feel free to contact me if you have any questions :slight_smile:

3 Likes

#9

@robvalca when you say S3 I assume Ceph, right?

1 Like

#10

@fbarbeira Yes, we are using ceph+radosgw.
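For context, restic’s `s3:` backend accepts any S3-compatible endpoint, so it can point straight at a RadosGW gateway. A hedged sketch, with made-up endpoint, bucket, and credential names:

```shell
# Pointing restic at an S3-compatible endpoint (e.g. Ceph RadosGW).
# Endpoint, bucket, and credentials below are illustrative only.
export AWS_ACCESS_KEY_ID=rgw-user
export AWS_SECRET_ACCESS_KEY=rgw-secret

restic -r s3:https://radosgw.example.org/restic-demo init
restic -r s3:https://radosgw.example.org/restic-demo backup /data
```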

1 Like

#11

I’m looking forward to that! :slight_smile:

1 Like

#12

I’m very glad to hear that! It’s the same approach we are implementing in our infrastructure. Not as ambitious as yours, but still huge (4k users and 1PB of data).

I’ll stay tuned for your progress! :smiley:

0 Likes