Restic client deletes data packs during backup command

I have multiple clients backing up to a restic rest-server running in append-only mode. On several clients running different operating systems, I have noticed that during a backup the client sometimes tries to delete a data blob. The append-only server returns a 403, and the client eventually completes the backup, but the repeated delete attempts sometimes extend the backup time (as if restic keeps retrying the delete over and over).

My question is: I understand that locks need to be deleted (and lock deletions are explicitly allowed when running in append-only mode), but why does the restic client need to delete data packs during a backup? Shouldn’t that be reserved for the prune command?

Looking at my logs again, this problem has only happened twice (on two separate clients). The first client randomly tried to delete a blob during backup, got a 403 (because append-only), and eventually tried posting the blob again and succeeded. Snapshot complete, no infinite loop.

Further investigation on my second client shows something else. It tried posting two data blobs. The first one, let’s call it x, succeeded with a 200. The second one, y, failed with a 502 (possibly rest-server crashed, so the proxy returned a 502?).

Following the 502, the client entered an infinite loop and eventually crashed: (1) Attempt to delete blob x. (2) Attempt to post blob x again.

Perhaps the restic client, after receiving the 502 for blob y, assumed that blob x hadn’t been posted either and tried to post it again (a safety feature, maybe? restic failed on blob y so it wants to make sure x is there? which is odd, because the client should already know x is there, having received a 200 for it).

Comments on (2): Looking at the rest-server code, the client may have run into this: https://github.com/restic/rest-server/blob/a87d968870b4dec64abfe1270e538fdc01524a81/handlers.go#L519-L521. The code there returns a 403 if the blob/data already exists on the server.

Anyway, I’m not sure whether this is a problem with the server or the client. One fix I think should go in is changing the server so that it returns a 200 instead of a 403 when the data blob already exists. Then, if the client ever runs into this again, it won’t enter this endless delete/post loop, because it will see that the data was posted successfully after all.
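A minimal sketch of what that server-side change could look like (this is not the actual rest-server handler; `savePack` and the in-memory `existing` map are assumptions for illustration): when a pack with the same name already exists, compare the bytes and answer 200 only if the content is identical, so an idempotent re-upload succeeds while a genuine conflict is still rejected.

```go
package main

import (
	"bytes"
	"fmt"
	"net/http"
)

// existing simulates the packs already stored on disk (assumption for
// illustration; rest-server stores packs as files, not in memory).
var existing = map[string][]byte{}

// savePack returns the HTTP status the server would answer with.
// A re-upload of identical content gets 200 instead of 403, so a
// retrying client sees success rather than a hard rejection.
func savePack(name string, body []byte) int {
	if old, ok := existing[name]; ok {
		if bytes.Equal(old, body) {
			return http.StatusOK // idempotent re-upload of identical data
		}
		return http.StatusForbidden // same name, different content: reject
	}
	existing[name] = body
	return http.StatusCreated
}

func main() {
	body := []byte("pack-data")
	fmt.Println(savePack("x", body))            // first upload: 201
	fmt.Println(savePack("x", body))            // identical re-upload: 200
	fmt.Println(savePack("x", []byte("other"))) // conflicting content: 403
}
```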

Yes, 502 indicates that the HTTP request was received, but that an attempt to forward the request to another service failed.

This is definitely strange. It’s especially odd to me that the client tries to delete blob x before re-uploading it. I wouldn’t expect this step to be necessary as basically all storage backends will allow you to upload over an existing pack (unless running against an append-only server, in which case the delete will also fail anyway).
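As a sketch of what a safer client-side retry could look like (hypothetical names; this is not restic’s actual retry code), the client could simply re-issue the upload without deleting first, and treat an “already exists” rejection from an append-only server as success:

```go
package main

import "fmt"

// uploadFn models one attempt to store a pack, returning the HTTP
// status the server answered with (an assumption for illustration).
type uploadFn func() int

// retryUpload re-issues the upload directly instead of deleting first.
// Treating 403 as success assumes it means "append-only server already
// holds this pack"; a real client would have to distinguish that from
// an authorization failure.
func retryUpload(try uploadFn, maxAttempts int) bool {
	for i := 0; i < maxAttempts; i++ {
		switch try() {
		case 200, 201, 403:
			return true // stored, or already present on the server
		}
		// transient failure (e.g. 502): retry without deleting anything
	}
	return false
}

func main() {
	responses := []int{502, 200} // first attempt hits the proxy error
	i := 0
	ok := retryUpload(func() int { r := responses[i]; i++; return r }, 3)
	fmt.Println(ok) // second attempt succeeds, no delete ever issued
}
```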

This makes sense to me as long as the server verifies that the content matches. (It should match, since packs are content-addressable; nevertheless it would be a good safety feature.)

Personally, I would like to see a bit more smarts in the rest server, such as validating that the checksum in the pack name matches the content that was sent. This would add an extra layer of integrity checking.