Forget Prune on B2 caps download

targodan · February 18, 2019, 8:22am

Hey there, restic seems pretty awesome but I ran into a problem today.

I use B2 in conjunction with restic to back up some stuff from my server (~5GB).
I want to keep only the last two snapshots, so I run “restic forget --prune --keep-last=2”. This, however, used up my entire 1GB download cap during pruning. How much download traffic should I expect from a prune, and how might I be able to reduce it?

Is there a way I can find out if it uses local caches at all? It should be able to utilize the cache 100% as I only ever use this repository from the same environment.

I use restic version 0.9.4

cdhowie · February 18, 2019, 4:36pm

This was likely consumed when rebuilding packs to remove deleted objects. When an object is deleted, the pack it is contained in must be rebuilt. Basically, the other (non-deleted) objects in this pack (and possibly other packs) are downloaded and rebuilt into a new pack.

The amount of download traffic this will consume depends very heavily on how the objects are arranged into packs, which can change a bit as time goes on.
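
To illustrate why the layout matters, here is a toy model of the explanation above. It is not restic's actual prune logic; the pack layout, the blob names, and the `estimate_prune_download` helper are all made up for illustration, and it only counts the surviving objects that would have to be fetched from packs containing a deleted object.

```python
# Toy model only: NOT restic's real prune algorithm.
def estimate_prune_download(packs, deleted):
    """Bytes to fetch if every pack holding a deleted blob is rewritten.

    packs:   {pack_id: {blob_id: size_in_bytes}}  (hypothetical layout)
    deleted: set of blob ids no longer referenced by any snapshot
    """
    total = 0
    for blobs in packs.values():
        if not any(blob_id in deleted for blob_id in blobs):
            continue  # untouched pack: nothing to download
        # Surviving blobs must be fetched so they can go into a new pack.
        total += sum(size for blob_id, size in blobs.items()
                     if blob_id not in deleted)
    return total


# Same four blobs and the same deletions, arranged into packs two ways.
packs_clustered = {"p1": {"a": 4, "b": 4}, "p2": {"c": 4, "d": 4}}
packs_scattered = {"p1": {"a": 4, "c": 4}, "p2": {"b": 4, "d": 4}}
deleted = {"a", "b"}

clustered = estimate_prune_download(packs_clustered, deleted)
scattered = estimate_prune_download(packs_scattered, deleted)
print(clustered, scattered)
```

In this model, deleted objects that share a pack cost nothing to clean up, while the same deletions spread across packs force every surviving neighbour to be downloaded.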

The simplest answer is to prune less frequently.

I believe indexes are cached, but that doesn’t help you much here.

Consider instead backing up to a local repository and using rclone sync to update the repository on B2. rclone will use very little traffic to determine which files need to be synced.

targodan · February 18, 2019, 4:56pm

Thank you for your fast, in-depth reply.

These workarounds sound a bit annoying to be honest. However I could live with only keeping the newest version at any time.

Is there an easy way to maybe tell restic to not create a new snapshot upon backup, but instead to just overwrite everything?

cdhowie · February 18, 2019, 5:01pm

That’s not how restic was designed to operate. If you want to keep just one backup, restic is overkill.

It might be better just to stream a tar archive to B2 and then delete the old tar.

targodan · February 18, 2019, 5:19pm

Well, I can see the argument. But on services where you pay more for download than for storage (like b2) the choice between accumulating more and more data over time (because you don’t prune) or regularly downloading a whole bunch leaves a bit of a sour taste.

Then again, so far I went the lazy way: I already had a backup script that creates local gzip archives, and I fed those to restic. That probably ruins restic’s deduplication. So maybe if I get rid of that and back up the files directly with restic, the accumulation will slow down.

You know what. I think that’s the cleanest, if not best solution. Make a local restic backup without gzipping first, then sync via rclone. Guess I should put in that much work.

Thank you again!

cdhowie · February 18, 2019, 6:00pm

It’s what I do! This way, if I need to restore some data because of human error (“oops, didn’t mean to delete that”) I don’t have to go straight to B2 and pay their egress fees. B2 is exclusively for disaster recovery (house fire, tornado, server motherboard/PSU goes haywire and fries all of the disks).

Dj0k3 · February 18, 2019, 8:03pm

You might want to check another service like rsync.net. There is an offer with them for restic, no caps. I pay around $10 annually for 40GiB and it works really great. Check this thread for info.

cdhowie · February 18, 2019, 8:05pm

40GiB on B2 is $2.40 per year.

Dj0k3 · February 18, 2019, 8:10pm

But would you actually pay just $2.40, or do you need to buy more space to get that price for those 40 GiB? And what about egress? Don’t you end up paying more anyway?

cdhowie · February 18, 2019, 8:32pm

The price is $0.005 per GB-month with no minimums or commitments. For 40GB (average) over a year, this works out to $2.40 for that year ($0.20/mo). If you only use 30GB (average) over the year you would pay $1.80 for the year, etc.

The egress fees are $0.01 per GB. However, since I perform my backups to a local disk and then mirror that to B2, I only have to pay for egress in the event that I lose both my local system and the backups.

And even still, if you download the entire 40GB repository from B2 that’s only $0.40 in egress fees.

rsync.net really only makes sense if you are planning to restore entire repositories approximately 20 or more times per year.
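
The arithmetic behind these figures can be checked with a short sketch. The B2 prices are the ones quoted above; the flat yearly price is the "around $10" mentioned earlier in the thread and is only approximate. The function names are made up for illustration, and GB and GiB are treated as the same unit for simplicity.

```python
from decimal import Decimal

STORAGE_PER_GB_MONTH = Decimal("0.005")  # B2 storage price quoted above
EGRESS_PER_GB = Decimal("0.01")          # B2 download price quoted above
FLAT_PLAN_PER_YEAR = Decimal("10")       # "around $10 annually", approximate


def b2_yearly_cost(avg_gb, full_restores=0):
    """Yearly B2 cost for an average repository size and N full downloads."""
    storage = STORAGE_PER_GB_MONTH * avg_gb * 12
    egress = EGRESS_PER_GB * avg_gb * full_restores
    return storage + egress


def break_even_restores(avg_gb):
    """Full restores per year at which B2 costs as much as the flat plan."""
    per_restore = EGRESS_PER_GB * avg_gb
    return (FLAT_PLAN_PER_YEAR - b2_yearly_cost(avg_gb)) / per_restore


print(b2_yearly_cost(40))       # storage only
print(b2_yearly_cost(40, 1))    # storage plus one full restore
print(break_even_restores(40))  # restores per year to match the flat plan
```

Under these assumptions the two options cost the same at 19 full restores per year, which matches the "approximately 20" estimate.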

Dj0k3 · February 18, 2019, 8:49pm

That’s nice to know. Thanks!

targodan · February 18, 2019, 8:51pm

I can only confirm what cdhowie said. By now I am set up in the way he described earlier.

I back up to a local restic repository (which by the way is quite a bit faster than tar-gzipping stuff :D), prune locally if I want to, and then use rclone to upload that to b2. rclone does not download any data, so you don’t run into the problem I had earlier. And in my use case I will only download from b2 if the data center hosting my VPS burns down (fingers crossed that never happens). So in my particular case (and in cdhowie’s, I guess) b2 is cheaper. In fact, it is the cheapest provider I was able to find so far.
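
For anyone who wants to script this, here is a minimal sketch of the three steps (back up locally, forget and prune locally, mirror to B2). The paths, the bucket name, and the rclone remote called `b2` are placeholders, and the helper names are made up. It assumes the repository password is supplied through restic's usual environment variables, and by default it only prints the commands instead of running them.

```python
import subprocess


def backup_and_mirror_commands(source_dir, local_repo, remote, keep_last=2):
    """Build the command lines: local backup, local prune, then mirror."""
    return [
        ["restic", "-r", local_repo, "backup", source_dir],
        ["restic", "-r", local_repo, "forget", "--prune",
         f"--keep-last={keep_last}"],
        # sync makes the remote identical to the local repository,
        # including removing packs that the local prune deleted.
        ["rclone", "sync", local_repo, remote],
    ]


def run_all(commands, dry_run=True):
    """Print the commands, or execute them in order and stop on failure."""
    for argv in commands:
        if dry_run:
            print(" ".join(argv))
        else:
            subprocess.run(argv, check=True)


# Placeholder paths and remote name; adjust to your own setup.
commands = backup_and_mirror_commands(
    "/srv/data", "/backup/restic-repo", "b2:my-bucket/restic-repo")
run_all(commands)  # dry run: prints only
```

Calling `run_all(commands, dry_run=False)` would actually execute the steps, so check the printed commands first.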

cdhowie · February 18, 2019, 9:14pm

@targodan Be careful though, there’s one disaster you may not be protected from: malicious compromise of your server. If an attacker is able to get into your server, he can probably also destroy your backups on B2.

targodan · February 18, 2019, 9:34pm

Yeah, that’s a threat model I haven’t tackled yet. I guess the only way to avoid that is either manual backups or scp-ing from another server. But as it’s just a private server for fun it’s not super critical.

cdhowie · February 18, 2019, 9:57pm

We deal with this by having an offsite server that does not accept incoming connections at all. It uses rclone copy --immutable to pull new backups. copy (as opposed to sync) does not delete files in the destination, and --immutable refuses to modify files that already exist there. This effectively prevents an attacker from sabotaging the offsite copy by corrupting or deleting backups on the server; rclone will reject such changes.

We exclude locks and index and perform a restic prune after syncing to discard any incomplete packs so they can be recopied later (otherwise --immutable will reject them). Prune will also recreate the indexes.
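
A rough sketch of that pull job, run on the offsite machine, might look like the following. The remote name and paths are placeholders, the helper name is made up, and the exclude patterns assume the standard restic repository layout with `locks` and `index` directories at the top level. It only builds the command lines; it is not our exact setup.

```python
def offsite_pull_commands(source_remote, offsite_repo):
    """Build the append-only pull followed by a prune on the offsite copy."""
    return [
        # copy never deletes at the destination; --immutable refuses to
        # modify files that already exist there.
        ["rclone", "copy", "--immutable",
         "--exclude", "/locks/**",
         "--exclude", "/index/**",
         source_remote, offsite_repo],
        # prune discards incomplete packs and recreates the indexes,
        # as described above.
        ["restic", "-r", offsite_repo, "prune"],
    ]


# Placeholder remote and path; adjust to your own setup.
pull = offsite_pull_commands("backupserver:/backup/restic-repo",
                             "/offsite/restic-repo")
for argv in pull:
    print(" ".join(argv))
```

The commands could then be passed one by one to `subprocess.run(argv, check=True)` from a scheduled job on the offsite server.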

Dj0k3 · February 18, 2019, 10:38pm

You can also sync your repository to an external HDD. I have two repositories: local and offsite. The offsite one holds very little data (just what is necessary to start over again) and the local one holds everything (about 160GiB / 2TiB in “restore-size”). So, now and then I sync my local backup to an external HDD that I connect only for the sync. That way, if everything else fails, I would at least have a copy of my local backup handy.

odin · February 18, 2019, 10:53pm

If your threat model involves a competent, active attacker directly targeting you, you’ve really gotten beyond any kind of simple model.

cdhowie · February 18, 2019, 11:35pm

Indeed, and defending against this in the backup system is not terribly difficult to do, though it can be a bit tedious to keep up. Performing append-only copies of the online backups (what rclone copy --immutable does) to a server that does not allow remote access (services disabled, firewalled on-system and off-system by an intermediate router), particularly if those backups are never deleted, and additionally copying the off-site backup to offline storage (a set of HDDs that are brought to the off-site server, synced, and kept physically offsite in rotation), is generally sufficient to prevent malicious destruction of backups.

Adding in write-once storage media is an additional level of protection, if the data warrants the expense.

And, of course, regular manual testing of backups is a must.

odin · February 18, 2019, 11:57pm

My point, really, is that protecting against it is usually overkill, especially since confidentiality is often by far the greater risk. If you can spare the resources, though, go for it.

rsync · July 3, 2019, 8:18pm

@cdhowie, @odin

Given the discussion of rsync.net, above, I hope that it is useful to note that the ZFS snapshots of an rsync.net account are immutable from the perspective of an outside user (or attacker).

So, even without any interesting append-only scheme(s) or --immutable flags, simply configuring a schedule of server-side ZFS snapshots will create point in time backups that are immune to a malicious actor who compromises both your source server(s) and your cloud storage account.

You do, of course, have to configure those snapshots and you will, of course, have to notice the destruction before the snapshots are rotated away, but I think those are small hurdles to cross…