What is your largest dataset?

I run CourtListener.com, where we have millions of PDFs amounting to several TBs of data. I’ve found a couple of things in Restic 0.8.3. (0.9.0 is supposed to be better, but I haven’t tested it yet.)

  • By far the slowest thing is iterating over the files. It was taking around a day just to do this. I think this is better in 0.9.0.

  • Memory usage was a major problem. If I recall correctly, I caught Restic using dozens of GB of memory. That’s not normal for most backup software I’ve used.

  • Haven’t tried a restore or prune yet.

  • The fuse mount was unusably slow. Too bad — what a cool feature!
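
If you want to check whether plain file iteration is the bottleneck on your own data, here is a rough sketch that times a directory walk. It is plain Python and has nothing to do with how Restic scans files internally; the function name time_tree_walk is made up for this example.

```python
import os
import time


def time_tree_walk(root):
    """Walk `root` and return (file_count, total_bytes, elapsed_seconds).

    Hypothetical helper for illustration only; it does not reproduce
    what any backup tool does when it scans a tree.
    """
    start = time.perf_counter()
    file_count = 0
    total_bytes = 0
    stack = [root]  # directories still to visit
    while stack:
        current = stack.pop()
        try:
            with os.scandir(current) as entries:
                for entry in entries:
                    if entry.is_dir(follow_symlinks=False):
                        stack.append(entry.path)
                    elif entry.is_file(follow_symlinks=False):
                        file_count += 1
                        total_bytes += entry.stat(follow_symlinks=False).st_size
        except OSError:
            # Skip directories or files we cannot read.
            continue
    return file_count, total_bytes, time.perf_counter() - start
```

Calling time_tree_walk on the directory you back up gives a baseline for the disk itself. If that alone takes hours, the storage is likely part of the problem and not only the backup tool.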

The backup endpoint I’m using is Backblaze (because it’s cheap).

I need to do more research, and I’m kind of on the fence until I try 0.9.0. The memory usage and speed were real problems, but perhaps they are better in the latest version. We’re also upgrading our RAID array, which should help.

Not sure how much this helps, but big picture, I think Restic is a more complex system than something like rsnapshot. With that complexity come features (yay!) and CPU/memory consumption (not yay!).