Connection errors during backup with restic + rclone

So I had freshly done a prune and a check --read-data with no errors. Decided to take a fresh backup last night. During the backup, a few little rclone connection errors would pop up. Maybe once an hour or so. Unfortunately I didn’t get to screenshot the connection errors, but after every “retry” rclone issued while restic was backing up, there was no ultimate “failure”. It just said “retry after ~500ms” and then it would continue. It would retry just once, and seemingly work. After some time, it might hiccup again, with a quick retry, then keep going like it was okay.

HOWEVER, it was indeed messing things up. I was curious and decided to do a quick check, just to be safe. Here’s what I got, almost immediately:

It seems as though every time restic+rclone threw a “retry” it was saving an incomplete pack. The backup ran late into the night, and unfortunately I forgot it runs from a script that exits when finished (clearing the log), so I didn’t get a screenshot. But I meticulously checked what was still on the screen: it was a different file each time, each error only printed a single message with a retry of less than 1s, and then it would immediately pick up and keep going.

So to try to fix the problem, I did a rebuild-index - and let me just say WOW, that is SO much faster than it used to be, I love you guys haha. I am now backing up again, to fill in any missing blobs, will do a check --read-data, and then find anything missing and forget it, with one final check --read-data just to be safe.
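
In case it helps anyone following along, the sequence I’m running looks roughly like this (a sketch only; the repository URL and the IDs are placeholders for my actual setup):

export RESTIC_REPOSITORY=rclone:pcloud:restic-repo    # placeholder remote
restic rebuild-index
restic backup ~                        # re-uploads any blobs missing from the index
restic check --read-data
# if check still reports missing blobs, locate the affected snapshots and drop them
restic find --blob <missing-blob-id>
restic forget --prune <affected-snapshot-id>
restic check --read-data               # final sanity check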

That said, I’m really concerned about the fragility of the backup process if a few connection retries end up leaving incomplete packs stored. I can’t guarantee it, but I feel like this is a restic+rclone thing, not restic by itself.

Here’s my backup so far, filling in the missing information. I will say the messages it prints are a little… scrambled. Thought I’d point that out too. For instance “storing the file againcations/Install macOS Big Sur.app/Contents/SharedSupport/SharedSupport.dmg”

These are files very early on in the backup process, which I remember seeing a quick “retry” for while I had eyes on the backup. Just a simple one-time retry. But apparently it was storing partial packs.

Anyway, I hope this makes sense to someone and can help find a potential bug! I’m wondering if this happens just with rclone as a backend, or if pCloud doesn’t “resume” a pack properly. That said, I’ve never had these issues syncing up to 800GB to pCloud with “rclone sync” (not using Restic). I might get a hiccup or two, but if I test the file integrity later, everything is safe (checksums). This includes syncing a large restic repository: I’ve done a restic check --read-data on it and it checks out. It seems like it’s the specific combo of restic using rclone as a backend, from what I can tell.
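
One way to do that integrity test after a plain rclone sync is rclone’s built-in check, which compares sizes and, where the remote supports them, checksums (sketch; the remote and path names are placeholders):

rclone check /path/to/local pcloud:backup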

I should mention one modification I’ve made: I use transfers=2 and checkers=2 for rclone in my scripts. That seems to cut down on the number of these errors, but they still happen after about an hour or so. Only one at a time when it happens, though. Not a whole screenful. Just a “hiccup”.
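
For anyone wanting to reproduce that, one way to pass those flags through to the rclone instance that restic spawns is restic’s rclone.args option, roughly like this (a sketch; when overriding rclone.args you have to keep the “serve restic --stdio” part):

restic -o rclone.args="serve restic --stdio --transfers 2 --checkers 2" \
    -r rclone:pcloud:restic-repo backup ~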

Okay. Backup just finished. One thing slightly odd I noticed was this:

[33:37] 284112 files 87.211 GiB, total 284238 files 87.209 GiB, 229 errors

I’m assuming this is just because a file ended up being bigger than what restic originally calculated, but I thought I’d point it out just in case.

Also I received ONE error during the backup:

GET request (wrote 277936/4247041 bytes): read tcp 10.228.18.206:59753->74.120.8.225:443: read: operation timed out
Load(<data/f2b41905a5>, 0, 0) returned error, retrying after 720.254544ms: Copy: unexpected EOF

Here is the output:

Last login: Fri Feb 26 08:49:20 on ttys001
/Users/akrabu/Scripts/akrabu-backup.command ; exit;                             
akrabu@akrabu-macbook-air ~ % /Users/akrabu/Scripts/akrabu-backup.command ; exit;

Starting in 30 seconds, or press [Enter] key to start now...

Checking latest version, please wait...

Updating Homebrew...
==> Auto-updated Homebrew!
Updated 2 taps (homebrew/cask-versions and homebrew/core).
==> Updated Formulae
Updated 7 formulae.
==> Updated Casks
Updated 2 casks.

Warning: restic 0.12.0 already installed
Warning: rclone 1.54.0 already installed

Exporting lists, please wait...

Backing up, please wait...

repository 24344e18 opened successfully, password is correct
using parent snapshot 81edc328
error: parts of /Applications/Dropbox.app/Contents/Frameworks/DropboxCore.framework/Versions/A/DropboxCore not found in the repository index; storing the file again Client.app/Contents/lib/rt.jar
error: parts of /Applications/Dropbox.app/Contents/Frameworks/Tungsten.framework/Versions/A/Frameworks/Chromium Embedded Framework.framework/Versions/A/Chromium Embedded Framework not found in the repository index; storing the file again
error: parts of /Applications/Dropbox.app/Contents/Frameworks/Tungsten.framework/Versions/A/Frameworks/Dropbox Web Helper.app/Contents/MacOS/Dropbox Web Helper not found in the repository index; storing the file again/Chromium Embedded Framework.framework/Vers
error: parts of /Applications/Dropbox.app/Contents/Frameworks/Tungsten.framework/Versions/A/Tungsten not found in the repository index; storing the file againontents/Frameworks/Tungsten.framework/Versions/A/Frameworks/Chromium Embedded Framework.framework/Vers
error: parts of /Applications/Dropbox.app/Contents/Frameworks/libdropbox_sqlite_ext.dylib not found in the repository index; storing the file againopbox.app/Contents/Frameworks/Tungsten.framework/Versions/A/Frameworks/Chromium Embedded Framework.framework/Vers
error: parts of /Applications/Dropbox.app/Contents/Frameworks/libdropbox_watchdog.dylib not found in the repository index; storing the file againDropbox.app/Contents/Frameworks/Tungsten.framework/Versions/A/Frameworks/Chromium Embedded Framework.framework/Vers

#truncated

error: parts of /Users/akrabu/Library/Application Support/MobileSync/Backup/5bb9f02642a790431abee2379dbb9d2bc8df2254/fd/fda88483bef0720a4ef89dfec37d4746cadb6650 not found in the repository index; storing the file againt/MobileSync/Backup/5bb9f02642a790431abee2379dbb9d2bc8df2254/fd/fd9b435a36e7c61ef6f1cba264679fba6aa1eeff
rclone: 2021/02/26 10:37:48 ERROR : data/f2/f2b41905a500001f17fb5c2ee7afafe3fecc7bdf50fe30ef7bd721c3436f1c19: Didn't finish writing GET request (wrote 277936/4247041 bytes): read tcp 10.228.18.206:59753->74.120.8.225:443: read: operation timed out
Load(<data/f2b41905a5>, 0, 0) returned error, retrying after 720.254544ms: Copy: unexpected EOF

Files:          21 new,   757 changed, 283460 unmodified
Dirs:            1 new,   375 changed, 58553 unmodified
Added to the repo: 421.056 MiB

processed 284238 files, 87.214 GiB in 34:08
snapshot ccb97647 saved
Warning: failed to read all source data during backup

Done!

Closing in 30 seconds, or press [Enter] key to exit now...

Full log: https://f002.backblazeb2.com/file/akrabu-365/restic.txt

Going to do a quick check and go from there. If there are errors, I’ll try a rebuild-index and another backup. If that doesn’t work, I’ll use the find command and delete whatever’s missing. I’ll check back in here when a check --read-data confirms I’ve fixed it. It will probably be a day or two, since my repo is 900GB. Luckily my connection speed to pCloud is about 40-60MB/s at least!

This sounds like it could be a defect in the pCloud API or in rclone’s implementation of it. Usually when uploading to object storage, you include the length in bytes and a checksum of the data. If the connection is interrupted for whatever reason, the server should notice that the content length and/or checksum do not match and discard the data instead of storing an incomplete file. And then I would expect the retry to overwrite the data.
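
For example, a B2-style upload declares the checksum up front (and the client sends the content length), which is what lets the server reject a truncated or corrupt body. A rough illustration, where the upload URL and auth token would come from a prior b2_get_upload_url call and the file name is a placeholder:

curl "$UPLOAD_URL" \
    -H "Authorization: $UPLOAD_AUTH_TOKEN" \
    -H "X-Bz-File-Name: data/f2/packfile" \
    -H "Content-Type: application/octet-stream" \
    -H "X-Bz-Content-Sha1: $(shasum -a 1 packfile | cut -d' ' -f1)" \
    --data-binary @packfile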

However, the following sequence of events could explain this, in combination with pCloud storing incomplete files.

  1. Upload attempt 1 is initiated.
  2. The connection to the server becomes closed/invalidated from the perspective of rclone, but not the pCloud server. This could happen by:
    • The client having a shorter TCP timeout than the server, and packet loss for a long enough period that the client times out its side of the connection but the server does not. (This is IMO the most likely scenario.)
    • An intermediate router sending a TCP RST to the client, but not to the server.
  3. Upload attempt 2 is initiated.
  4. Upload attempt 2 completes and the pCloud server commits the upload to storage.
  5. The server eventually times out its side of upload 1’s connection. Because it doesn’t verify the length/checksum of the data, it commits the bad (partially-uploaded) data to storage, overwriting the prior-committed good upload.

I don’t have any direct evidence that this is what is happening, but it would explain all of the symptoms you are seeing.


Just a side note: check without --read-data is also able to detect file size mismatches, which in particular lets it detect files that were saved but truncated. This check was added in the 0.12.0 release.
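
In practice that means a quick pass like the first command below should already flag truncated pack files, without having to download everything (sketch):

restic check                  # verifies repository structure and pack file sizes (since 0.12.0)
restic check --read-data      # additionally downloads and verifies the contents of every pack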


So I came up with an idea to test this. I made a test repo on B2, generated two 20GB random files, and am having Restic back up one using the native B2 backend, then back up the second one using Rclone as a B2 backend.
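
Roughly, the test setup looks like this (a sketch; the bucket and rclone remote names are placeholders, and both repository strings point at the same B2 bucket):

# two 20GB files of random data (macOS dd syntax)
dd if=/dev/urandom of=test1.bin bs=1m count=20480
dd if=/dev/urandom of=test2.bin bs=1m count=20480

# round 1: restic's native B2 backend (credentials via B2_ACCOUNT_ID / B2_ACCOUNT_KEY)
restic -r b2:my-test-bucket:restic-test backup test1.bin

# round 2: the same repository, reached through an rclone B2 remote instead
restic -r rclone:b2remote:my-test-bucket/restic-test backup test2.bin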

My home connection, which is where I’ve been backing up from, is fairly slow (stupid Xfinity). I run check --read-data on a machine with a fiber connection elsewhere. So far the first round of the test backup has been going for 9 hours with the native backend, without a single hiccup. It’s at 96% so far. I’ll be doing a regular check afterward, then backing up the second random file using the Rclone B2 backend to see if there are any connection hiccups, then do another check and report back.

I’m curious to see what check reports, whether I can manage to get the rclone B2 backup to hiccup as well, and which of the two (if not both) will report errors with check.


As far as my main repo goes, it’s at 52% with check --read-data and just has this so far:

Load(<data/e9403e1c4d>, 0, 0) returned error, retrying after 720.254544ms: <data/e9403e1c4d> does not exist
rclone: 2021/02/27 06:38:56 ERROR : data/e3/e37e41067262a58b32228aa4dd23a2f43e7ca20c36f9f3bccf130682134d082e: Didn't finish writing GET request (wrote 586592/4211386 bytes): unexpected EOF
Load(<data/e37e410672>, 0, 0) returned error, retrying after 582.280027ms: unexpected EOF

Which is an improvement over how many errors there were before, and I’m confident I can fix it. Depending on the results of my test repo, I may be moving back to B2 and filing some bug reports to pCloud. We shall see!

@alexweiss And thanks! That makes the test I’m doing much quicker. I did a regular check on my main repo too after a rebuild-index and a backup to fill in any missing blobs, and it checked out at first. Doing the --read-data now just to make doubly sure, and apparently that was a good idea haha

So the backup finished, not a SINGLE connection error. Check reported no errors. THEN the Rclone:B2 backup started, and right out the gate I have a TON of errors:

repository adfc72e8 opened successfully, password is correct
no parent snapshot found, will read all files
rclone: 2021/02/27 09:13:56 ERROR : data/26/267fa3d35857847c526fedb393286dd3694140ebb79151891aae170fc8f91a5a: Post request put error: Post "https://pod-000-1149-02.backblaze.com/b2api/v1/b2_upload_file/26a903e281e1323b6fb5081c/c002_v0001149_t0040": EOF
rclone: 2021/02/27 09:13:56 ERROR : data/c7/c764ce1cb76224ff2851cd0af2932374802fc00c99e7b5e54ba873f94ac66506: Post request put error: Post "https://pod-000-1139-04.backblaze.com/b2api/v1/b2_upload_file/26a903e281e1323b6fb5081c/c002_v0001139_t0059": EOF
rclone: 2021/02/27 09:13:57 ERROR : data/c7/c764ce1cb76224ff2851cd0af2932374802fc00c99e7b5e54ba873f94ac66506: Post request rcat error: Post "https://pod-000-1139-04.backblaze.com/b2api/v1/b2_upload_file/26a903e281e1323b6fb5081c/c002_v0001139_t0059": EOF
rclone: 2021/02/27 09:13:57 ERROR : data/26/267fa3d35857847c526fedb393286dd3694140ebb79151891aae170fc8f91a5a: Post request rcat error: Post "https://pod-000-1149-02.backblaze.com/b2api/v1/b2_upload_file/26a903e281e1323b6fb5081c/c002_v0001149_t0040": EOF
Save(<data/c764ce1cb7>) returned error, retrying after 720.254544ms: server response unexpected: 500 Internal Server Error (500)
Save(<data/267fa3d358>) returned error, retrying after 582.280027ms: server response unexpected: 500 Internal Server Error (500)

#truncated

Fatal: unable to save snapshot: server response unexpected: 500 Internal Server Error (500)

Full output: https://f002.backblazeb2.com/file/akrabu-365/restic-test.txt

It then terminated the backup. Check just reports unreferenced packs. So, no corruption at least.

Going to try a --read-data with the native B2 backend, and with the Rclone:B2 backend. Then will try a backup once more.

So far it looks like the problem is two-fold. There are a ton of EOF errors using both Rclone:B2 and Rclone:pCloud, and on top of that, pCloud saves the truncated files instead of ditching them like B2 does, which results in corruption. At least so far the Rclone:B2 backup just resulted in some orphaned pack files.


As far as I remember, pCloud stores multiple revisions of files. The last time I saw incomplete uploads on pCloud, it managed to store the correct upload as an old (!) file version.

The pCloud API unfortunately only provides file checksums after the upload, so these can’t be used to avoid incomplete uploads. restic, and in turn rclone, does provide the Content-Length header while uploading files, which makes it entirely pCloud’s fault that these incomplete uploads are kept, along with its confusing handling of old and new file versions. rclone actually tries to remove failed uploads.

This is yet another bug in pCloud. The server temporarily claims that a file doesn’t exist. And then a second later it does…

Here the download from pCloud seems to have been interrupted. rclone then notifies restic about the problem. As rclone can’t return a detailed error message to restic at that point, it just prints a warning on the console (see restic serve: Didn't finish writing GET request · Issue #2598 · rclone/rclone · GitHub). After the retry everything should be fine.

B2 often requires several retries to complete an upload. The 500 error indicates a problem on the server-side. So maybe you just tried to upload at the wrong point in time?

Any upload attempt which runs out of retries fails the whole backup. restic first writes pack files, then the index files and finally the snapshot. That way it is possible to interrupt a backup at an arbitrary point without causing repository corruption. That is true as long as the backend only reports an upload as successful when that really was the case.


Okay, the testing has finally completed.

So, to recap, I made two 20GB random files. The first one I backed up to B2 using restic’s native backend. The second one I attempted to back up using the Rclone:B2 backend. The latter did not go well. Both backup and check --read-data using the native backend worked flawlessly. Not a single error. I attempted the backup twice with the Rclone:B2 backend, with both ending in fatal errors. (Meanwhile, yes, had I been using pCloud, it would have allowed the backup to continue, saving partial files, and corrupting the database.)

After the first failed Rclone:B2 backup, I ran a check --read-data using both backends. Here’s the native backend:

using temporary cache in /var/folders/lj/z4b97q0n1wb64qrrb0rnnrbw0000gn/T/restic-check-cache-822438630
repository adfc72e8 opened successfully, password is correct
created new cache in /var/folders/lj/z4b97q0n1wb64qrrb0rnnrbw0000gn/T/restic-check-cache-822438630
create exclusive lock for repository
load indexes
check all packs
pack c934a697: not referenced in any index
pack 55297fe7: not referenced in any index
pack 48ddc81d: not referenced in any index
pack d0e4cf3c: not referenced in any index
pack 0d4777e1: not referenced in any index
pack d5cb63fd: not referenced in any index
pack 801484c8: not referenced in any index
pack edd3286d: not referenced in any index
8 additional files were found in the repo, which likely contain duplicate data.
You can run `restic prune` to correct this.
check snapshots, trees and blobs
no errors were found

Here’s the check --read-data using the Rclone:B2 backend:

using temporary cache in /var/folders/lj/z4b97q0n1wb64qrrb0rnnrbw0000gn/T/restic-check-cache-398257619
repository adfc72e8 opened successfully, password is correct
created new cache in /var/folders/lj/z4b97q0n1wb64qrrb0rnnrbw0000gn/T/restic-check-cache-398257619
create exclusive lock for repository
load indexes
check all packs
pack 801484c8: not referenced in any index
pack c934a697: not referenced in any index
pack 55297fe7: not referenced in any index
pack 0d4777e1: not referenced in any index
pack edd3286d: not referenced in any index
pack d5cb63fd: not referenced in any index
pack d0e4cf3c: not referenced in any index
pack 48ddc81d: not referenced in any index
8 additional files were found in the repo, which likely contain duplicate data.
You can run `restic prune` to correct this.
check snapshots, trees and blobs
read all data 0 / 1 snapshots
[0:00] 100.00%  1 / 1 snapshots
rclone: 2021/02/27 09:37:28 ERROR : data/28/2821f792815356b6ee703237e7348505b63e42a605a1c06fc7aad81f18ef552c: Didn't finish writing GET request (wrote 6335700/7306072 bytes): unexpected EOF
Load(<data/2821f79281>, 0, 0) returned error, retrying after 720.254544ms: unexpected EOF
[1:37] 1.31%  49 / 3747 packs
rclone: 2021/02/27 10:08:42 ERROR : data/bf/bf46c9e552224b1853af5ee4acc8d9cb24b3e27c7b9cb3f0487d12b3ba467cc8: Didn't finish writing GET request (wrote 4019540/7499688 bytes): read tcp 10.236.50.252:62214->206.190.215.16:443: read: operation timed out
Load(<data/bf46c9e552>, 0, 0) returned error, retrying after 582.280027ms: unexpected EOF
rclone: 2021/02/27 10:08:42 ERROR : data/80/80ad89758519457db65770bf592216c03d2aa3e3027ddc1dabfd09f2dc920746: Didn't finish writing GET request (wrote 992165/8014898 bytes): read tcp 10.236.50.252:62271->206.190.215.16:443: read: operation timed out
Load(<data/80ad897585>, 0, 0) returned error, retrying after 468.857094ms: unexpected EOF
rclone: 2021/02/27 10:08:43 ERROR : data/86/86ce8b02adb81b29c028a6cbc7e597aefcb1eb5620d3c26abdff41bb6d23af25: Didn't finish writing GET request (wrote 4367141/4940815 bytes): read tcp 10.236.50.252:62257->206.190.215.16:443: read: operation timed out
Load(<data/86ce8b02ad>, 0, 0) returned error, retrying after 462.318748ms: unexpected EOF
rclone: 2021/02/27 10:08:43 ERROR : data/4d/4db081ec4ae9695558e332e29e1fd7c4d6eb0e2f744b18f1ac3867b4d1340895: Didn't finish writing GET request (wrote 2431316/5142380 bytes): read tcp 10.236.50.252:62133->206.190.215.16:443: read: operation timed out
Load(<data/4db081ec4a>, 0, 0) returned error, retrying after 593.411537ms: unexpected EOF
rclone: 2021/02/27 10:08:43 ERROR : data/97/975d0fb57f341bef9390912bf65a84fa4e9d1f1fe7b479381bd6cab749fa2a74: Didn't finish writing GET request (wrote 6683301/8892480 bytes): read tcp 10.236.50.252:62248->206.190.215.16:443: read: operation timed out
Load(<data/975d0fb57f>, 0, 0) returned error, retrying after 282.818509ms: unexpected EOF
[40:36] 34.59%  1296 / 3747 packs
rclone: 2021/02/27 10:17:35 ERROR : data/d5/d51e730af8a7bd398101d365bc89bd5afc80cc1925eb2d23d70ba5f04f9dd5a9: Didn't finish writing GET request (wrote 3821012/4424979 bytes): unexpected EOF
Load(<data/d51e730af8>, 0, 0) returned error, retrying after 328.259627ms: unexpected EOF
rclone: 2021/02/27 10:26:52 ERROR : data/08/08e7f5f7a88e0777ebc851feb5ca44c757172c3e1ebe528e3b44cae8423160dc: Didn't finish writing GET request (wrote 2960724/4559876 bytes): unexpected EOF
Load(<data/08e7f5f7a8>, 0, 0) returned error, retrying after 298.484759ms: unexpected EOF
rclone: 2021/02/27 10:38:10 ERROR : data/e7/e7cdf9f4e8105e2572dc44cfd7445a524a5ec589f4918e23f90d0962d91405e7: Didn't finish writing GET request (wrote 1835732/5096181 bytes): unexpected EOF
Load(<data/e7cdf9f4e8>, 0, 0) returned error, retrying after 400.45593ms: unexpected EOF
rclone: 2021/02/27 10:39:17 ERROR : data/c2/c227101a7905c41349af53cd9cbc867a9e04678b96ce77e46266f820a8f98317: Didn't finish writing GET request (wrote 4946004/5872615 bytes): unexpected EOF
Load(<data/c227101a79>, 0, 0) returned error, retrying after 507.606314ms: unexpected EOF
rclone: 2021/02/27 10:42:54 ERROR : data/3e/3e415ff48de57a54437aa9d827d0d898ef4dd21ee55db1d094f830bd2b755475: Didn't finish writing GET request (wrote 3010533/4694518 bytes): unexpected EOF
Load(<data/3e415ff48d>, 0, 0) returned error, retrying after 656.819981ms: unexpected EOF
rclone: 2021/02/27 11:08:25 ERROR : data/8c/8cba80e4c70b44931291f4d2884cde268f67c1b5343e7705cfaf6265060bb134: Didn't finish writing GET request (wrote 942356/4618959 bytes): unexpected EOF
Load(<data/8cba80e4c7>, 0, 0) returned error, retrying after 357.131936ms: unexpected EOF
[1:50:43] 100.00%  3747 / 3747 packs
no errors were found

Afterward, I tried once more to back up using the Rclone:B2 backend. The second attempt failed as well.

repository adfc72e8 opened successfully, password is correct
no parent snapshot found, will read all files
rclone: 2021/02/27 13:17:55 ERROR : data/53/53cf89ca022d0c9cb12b22b36c522229879c0b11808b407771a045816dc36457: Post request put error: Post "https://pod-000-1148-00.backblaze.com/b2api/v1/b2_upload_file/26a903e281e1323b6fb5081c/c002_v0001148_t0020": EOF
rclone: 2021/02/27 13:17:55 ERROR : data/bc/bcd082248181b27b02bda634c8cad3182bb2eef61a635997a7a01a06d883fe8c: Post request put error: Post "https://pod-000-1147-00.backblaze.com/b2api/v1/b2_upload_file/26a903e281e1323b6fb5081c/c002_v0001147_t0004": EOF
rclone: 2021/02/27 13:17:55 ERROR : data/53/53cf89ca022d0c9cb12b22b36c522229879c0b11808b407771a045816dc36457: Post request rcat error: Post "https://pod-000-1148-00.backblaze.com/b2api/v1/b2_upload_file/26a903e281e1323b6fb5081c/c002_v0001148_t0020": EOF
rclone: 2021/02/27 13:17:55 ERROR : data/ee/ee341443b62b9d15a9ae724304a5e008dda23cf81506849b36393ab4f672207c: Post request put error: Post "https://pod-000-1124-05.backblaze.com/b2api/v1/b2_upload_file/26a903e281e1323b6fb5081c/c002_v0001124_t0052": EOF
rclone: 2021/02/27 13:17:55 ERROR : data/bc/bcd082248181b27b02bda634c8cad3182bb2eef61a635997a7a01a06d883fe8c: Post request rcat error: Post "https://pod-000-1147-00.backblaze.com/b2api/v1/b2_upload_file/26a903e281e1323b6fb5081c/c002_v0001147_t0004": EOF
rclone: 2021/02/27 13:17:55 ERROR : data/ee/ee341443b62b9d15a9ae724304a5e008dda23cf81506849b36393ab4f672207c: Post request rcat error: Post "https://pod-000-1124-05.backblaze.com/b2api/v1/b2_upload_file/26a903e281e1323b6fb5081c/c002_v0001124_t0052": EOF
Save(<data/53cf89ca02>) returned error, retrying after 720.254544ms: server response unexpected: 500 Internal Server Error (500)
Save(<data/bcd0822481>) returned error, retrying after 582.280027ms: server response unexpected: 500 Internal Server Error (500)
Save(<data/ee341443b6>) returned error, retrying after 468.857094ms: server response unexpected: 500 Internal Server Error (500)

#truncated

Fatal: unable to save snapshot: server response unexpected: 500 Internal Server Error (500)

Full log: https://f002.backblazeb2.com/file/akrabu-365/restic-rclone-b2-backup.txt

Then I decided to use the native backend to back up the same second random file.

repository adfc72e8 opened successfully, password is correct
no parent snapshot found, will read all files
[13:15] 2.84%  0 files 541.144 MiB, total 1 files 18.626 GiB, 0 errors ETA 7:33:46

Files:           1 new,     0 changed,     0 unmodified
Dirs:            4 new,     0 changed,     0 unmodified
Added to the repo: 18.569 GiB

processed 1 files, 18.626 GiB in 9:15:57
snapshot 6133b403 saved

Not a single hiccup or error. Lastly, I then used rclone by itself to copy the file:

rclone -P --stats-one-line copy /Users/akrabu/Desktop/test2/restic-rclone b2:akrabu-014
18.626G / 18.626 GBytes, 100%, 919.227 kBytes/s, ETA 0s

Not a single hiccup with Rclone alone, either - even though it took the same 9 hours to upload, over the same connection, to the same backend. I verified the checksum, as well.

So, I think the problem is two-fold. One, pCloud is prone to incomplete uploads as @cdhowie and @MichaelEischer pointed out. A backend like B2 will at least cause the backup to fail, instead of leading to a saved, corrupted snapshot as pCloud does. Two, something is going on with using Rclone as a backend with Restic. It seems much more unstable than either Restic or Rclone alone.

I think the moral of the story for me is that I’m going to need to switch back to a native backend. I wish Restic supported WebDAV; I’d be curious to see how pCloud would behave when handled natively instead.

There must be a third factor that comes into play here, like specific rclone backends or changes in newer rclone versions. For me, restic 0.10 (I’ve also successfully used some pre-0.12 experimental builds) + rclone 1.45 works in the following setup: restic connects via ssh to another server and spawns rclone there to save files in a local repository. That setup has worked flawlessly (except for a few timeouts when starting rclone) for several repositories, including a multi-TB one. On the restic side not much has changed regarding the rclone/rest backend, and I think I’ve had most of the recent changes already included in one of my experiments.
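
That setup corresponds roughly to the following (a sketch; the host name and repository path are placeholders, and rclone treats a bare path as its local filesystem on the machine it runs on):

restic -o rclone.program="ssh user@backupserver rclone" \
    -r rclone:/srv/restic-repo backup ~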

I’m running restic 0.12.0 and rclone 1.54.0. I posted back in December 2019 about similar issues, running whatever was the latest version of both back then (my script auto-updates both via Homebrew before backing up). Back then I thought it was just pCloud, and just decided to only prune every 6 months or so (that seemed to be the worst offender). I’ve only just realized it’s been corrupting my backups as well (a separate issue from restic+rclone, I now realize).

That said, testing restic+rclone with B2 as a backend results in the same connection issues and errors. The only difference is that B2 is smart enough to reject the partial uploads. But when I take rclone out of the picture, everything works fine.

I’ve moved three times and have used three different ISPs since December 2019. I’ve tried it on my office’s fiber connection with 80 MiB/s speeds. If Restic uses Rclone, it gets screwy. Native backends work flawlessly, even when I had a crappy DSL modem at one of my apartments and could only upload at 500-800 KiB/s.

I also know that if I back up to a local drive and rclone it to either pCloud or B2, it still works flawlessly (even on that crappy DSL connection, though it might take all week). It’s the specific combination of using restic+rclone as a backend that is error-prone for me.
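
In other words, something along these lines works fine for me (paths and the remote name are placeholders):

# back up to a repository on a local drive...
restic -r /Volumes/backup/restic-repo backup ~

# ...then mirror the finished repository to the cloud with plain rclone
rclone sync /Volumes/backup/restic-repo pcloud:restic-repo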

I’ve messed with the rclone transfers and the checkers, setting them both all the way down to 1 - didn’t fix the problem.

Wish I knew enough to pore through the code and figure out where the problem lies, but alas, I’d have no idea where to start. I can just tell you where it works for me, and where it doesn’t.

For now I’m rclone syncing my repo from pCloud to B2, so at least I won’t have broken snapshots anymore. It sucks because I have a 4TB lifetime account with pCloud, and have multiple users backing up to my repo from around the US (so local backup + rclone sync won’t work for me). But at the end of the day I need a rock-solid backup, and for me that’s restic with a native backend, and thus far my favorite is B2. Just hate those egress charges for the occasional check --read-data for peace of mind haha

I’ve just noticed that your previous test scenario of uploading a single large file exercises a different backend usage pattern than the one used by restic. Please create lots of 4-5MB files (this corresponds to the average pack file size) to get a more accurate comparison. Although, given that uploading the files with rclone alone worked, that probably won’t make much of a difference.
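
Something like this would generate a comparable data set (a sketch; file count and sizes are only meant to approximate restic’s average pack size):

mkdir -p packtest
for i in $(seq 1 1000); do
    # each file is 4 or 5 MB of random data (macOS dd syntax)
    dd if=/dev/urandom of=packtest/file$i.bin bs=1m count=$((4 + RANDOM % 2)) 2>/dev/null
done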

Did you set any restic options when using restic+rclone? Lowering -o rest.connections (default 5) to the number of connections you configured in rclone would be worth a try.
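
For example (a sketch; the remote name is a placeholder, the option name is taken from the suggestion above, and 2 matches the transfers=2 setting mentioned earlier in the thread):

restic -o rest.connections=2 -r rclone:pcloud:restic-repo backup ~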