Connection to restic rest-server gets cut after three hours

Hi everyone. I am trying out restic. Looked good so far. Now I’ve encountered an error and I don’t know how to solve it.

Problem: connection to restic rest-server gets cut after about 3h, so it won’t complete the initial backup.

Setup: Running rest-server, V. 0.10.0 on a raspberry pi 4, 4GB, wired connection, Running Ubuntu 20.04.1 LTS. The Rest-Server is started with this systemd, running under a different user rests. The destination is a USB-HDD (with its own power supply).

[Unit]
Description=Rest Server
After=syslog.target
After=network.target

[Service]
Type=simple
User=rests
Group=rests
ExecStart=/usr/local/bin/rest-server --private-repos --append-only --path /home/rests/hdd
Restart=always
RestartSec=5

[Install]
WantedBy=multi-user.target

Client: Windows 10,

Powershell Commannd:

.\restic_0.11.0_windows_amd64.exe -r rest:http://PRO:Password@192.168.1.3:8000/PRO -verbose backup D:\folder1 D:\folder2 D:\folder3 --exclude=‘Archive_OLD’ --password-file C:\pw

This setup works fine for a smaller dataset (250gb no trouble). However, i started with the real dataset now (~ 3TB) and the connection fails after 3h. I know that I can resume, but now I’ve reached a point where it takes almost 3hours to rescan the files, so there is no new data uploaded. I’m stuck.

This is what the client says:

Save(<data/2111ad2aee>) returned error, retrying after 552.330144ms: client.Post: Post “http://PRO:***@192.168.1.3:8000/PRO/data/2111ad2aee …”: readfrom tcp 192.168.1.4:55260->192.168.1.3:8000: write tcp 192.168.1.4:55260->192.168.1.3:8000: wsasend: An existing connection was forcibly closed by the remote host.

Save(<data/2111ad2aee >) returned error, retrying after 2.340488664s: server response unexpected: 403 Forbidden (403)

Save(<data/2111ad2aee >) returned error, retrying after 4.506218855s: server response unexpected: 403 Forbidden (403)

Save(<data/2111ad2aee >) returned error, retrying after 3.221479586s: server response unexpected: 403 Forbidden (403)

Save(<data/2111ad2aee >) returned error, retrying after 5.608623477s: server response unexpected: 403 Forbidden (403)

Save(<data/2111ad2aee >) returned error, retrying after 7.649837917s: server response unexpected: 403 Forbidden (403)

Save(<data/2111ad2aee >) returned error, retrying after 15.394871241s: server response unexpected: 403 Forbidden (403)

Fatal: unable to save snapshot: server response unexpected: 403 Forbidden (403) ETA 7:14:44

Any help is appreciated. Thanks!!

Does the rest-server report any warnings when the connection fails? The strange part here is that the rest-server seems to think the data file was successfully uploaded and the blocks later modifications. Could you check whether shasum -a256 data/21/2111ad2aee[...] reports the same hash as the filename? If that is the case then the upload was indeed complete, but restic never got a successful reply.

As a workaround for the initial backup you could temporarily remove the --append-only flag which should allow the retries to complete instead of failing with server response unexpected: 403 Forbidden (403).

Hi Michael,
thanks for replying!

Could you check whether shasum -a256 data/21/2111ad2aee[...] reports the same hash as the filename?

It actually didn’t create the file

Does the rest-server report any warnings when the connection fails?

Could you give me some pointers where to check that exactly?

As a workaround for the initial backup you could temporarily remove the --append-only flag

I will try and report back. Thank you

That’s rather unexpected. I don’t see how the rest-server could return a 403 error code when the file doesn’t exist. Is there really no file whose name starts with 2111ad2aee in the data/21 folder of the repository?

As you are runnning the rest-server with systemd, there could be some log entries in the syslog or just try journalctl -u rest-server.service (assuming the service is actually called rest-server.service) .

Hi Michael

That’s rather unexpected. I don’t see how the rest-server could return a 403 error code when the file doesn’t exist. Is there really no file whose name starts with 2111ad2aee in the data/21 folder of the repository?

I can confirm. 100% no such file in the appropriate folder.
Moreover, there was nothing suspicious in the logs.

I tried the process without --append-only and as you said, it worked. Thanks.
There was the same problem, but it continued after that:

Save(<data/b1200f8a58>) returned error, retrying.
This file b120… also doesnt extist in data/b1/

restic reports that it will retry the upload and as there are no further complaints that the upload has succeeded. So the file must be there or your repository is damaged. Please run restic check to check whether that is really the case. If check reports no error, then you’re looking in the wrong place.

Thanks for sticking with me!
I checked the repo and indeed there are errors

pack b1200f8a: does not exist
pack 6819d48b: does not exist
pack 71094213: does not exist
pack 211103dd: does not exist
pack fa1a1a17: does not exist
pack 3748c361: does not exist
pack afbd8aed: does not exist
pack 12022de6: does not exist
pack 3e30c7f3: does not exist
pack fd7b601d: does not exist
pack ee56a494: does not exist
pack 9934f9bf: does not exist
check snapshots, trees and blobs
Fatal: repository contains errors

That’s not good i guess. What would you recommend?

How can I find out what files are affected? or is this not possible? maybe i should active vss on windows restic client?

1 Like

pack b1200f8a: does not exist

That pack ID is also mentioned in one of the retries. Do you have any logs on the rest-server side which could provide some information on what is happening? Did restic report anything about the other pack IDs earlier on?

To repair the damage, the easiest way is probably to run rebuild-index (that will take quite some time), then run backup again and remove the old snapshot afterwards. However, I fear that this problem could resurface, so we should investigate anyways what’s going wrong here.

The rest-server supports a --log path-to-logfile option. Such a logfile would be very useful to further analyze this problem when it shows up again.

Thanks!
I will redo the entire backup with the new log file, then I’ll post the results.

The inital Backup Job has not completed yet. But the errors described previously persist:

Save(<data/7c6110b41c>) returned error, retrying after 552.330144ms: client.Post: Post "http://***:***@192.168.1.13:8000/USR1/data/7c6110b41cd62554398855675efbe668336ba41e54d70d2cd61c9f227fac865f": readfrom tcp 192.168.1.14:52331->192.168.1.13:8000: write tcp 192.168.1.14:52331->192.168.1.13:8000: wsasend: An existing connection was forcibly closed by the remote host.

The same error for:

(<data/6bd94ae58b>) 
(<data/0f85b5d035>) 
(<data/102f0023c2>)
(<data/5e7de1059c>)
(<data/27b70220a8>) 
(<data/80f43e05fd>)

ALL the logs that contains those hashes: For referece, the line number in brackets.

> 192.168.1.14 - - [30/Jan/2021:00:19:12 +0000] "DELETE /USR1/data/6bd94ae58b03256abf32a0165b7d25dd130a1c9b0e4ca852a9fd4003333eb358 HTTP/1.1" 200 0 "" "Go-http-client/1.1" (138698)
> 192.168.1.14 - - [30/Jan/2021:00:19:13 +0000] "POST /USR1/data/6bd94ae58b03256abf32a0165b7d25dd130a1c9b0e4ca852a9fd4003333eb358 HTTP/1.1" 200 0 "" "Go-http-client/1.1" (138711)
> 192.168.1.14 - - [30/Jan/2021:00:18:47 +0000] "POST /USR1/data/6bd94ae58b03256abf32a0165b7d25dd130a1c9b0e4ca852a9fd4003333eb358 HTTP/1.1" 400 12 "" "Go-http-client/1.1" (140141)

Interestingly, that’s the order in the rest.log. The Entry with timestamp 00:18:47 is on line 140141, compared to 138698(00:19:12) and 138711 (00:19:13)

> 192.168.1.14 - - [30/Jan/2021:00:59:11 +0000] "DELETE /USR1/data/0f85b5d03506df33d7c9c32182357c0500a83bab6e748bb27888ecdd3d4092d5 HTTP/1.1" 404 10 "" "Go-http-client/1.1" (159139)
> 192.168.1.14 - - [30/Jan/2021:00:59:12 +0000] "POST /USR1/data/0f85b5d03506df33d7c9c32182357c0500a83bab6e748bb27888ecdd3d4092d5 HTTP/1.1" 200 0 "" "Go-http-client/1.1" (159148)

> 192.168.1.14 - - [30/Jan/2021:02:19:21 +0000] "DELETE /USR1/data/102f0023c28774b4c5ad089a86ed3d52f27d69be71d11e91e4fb1a306a24c273 HTTP/1.1" 200 0 "" "Go-http-client/1.1" (197042)
> 192.168.1.14 - - [30/Jan/2021:02:19:21 +0000] "POST /USR1/data/102f0023c28774b4c5ad089a86ed3d52f27d69be71d11e91e4fb1a306a24c273 HTTP/1.1" 200 0 "" "Go-http-client/1.1" (197047)
> 192.168.1.14 - - [30/Jan/2021:02:19:00 +0000] "POST /USR1/data/102f0023c28774b4c5ad089a86ed3d52f27d69be71d11e91e4fb1a306a24c273 HTTP/1.1" 400 12 "" "Go-http-client/1.1" (198050)

> 192.168.1.14 - - [30/Jan/2021:02:25:00 +0000] "DELETE /USR1/data/5e7de1059c3b8327ad19da4cb66b19f5ec754398a9befca53af8f23f56e0638f HTTP/1.1" 200 0 "" "Go-http-client/1.1" (199539)
> 192.168.1.14 - - [30/Jan/2021:02:25:00 +0000] "POST /USR1/data/5e7de1059c3b8327ad19da4cb66b19f5ec754398a9befca53af8f23f56e0638f HTTP/1.1" 200 0 "" "Go-http-client/1.1" (199544)
> 192.168.1.14 - - [30/Jan/2021:02:24:27 +0000] "POST /USR1/data/5e7de1059c3b8327ad19da4cb66b19f5ec754398a9befca53af8f23f56e0638f HTTP/1.1" 400 12 "" "Go-http-client/1.1" (200360)

> 192.168.1.14 - - [30/Jan/2021:03:56:18 +0000] "DELETE /USR1/data/27b70220a8b03aacb0c9a6958a41bd7bfde10e8defd3dfbe8c7845b53d38f40f HTTP/1.1" 200 0 "" "Go-http-client/1.1" (244560)
> 192.168.1.14 - - [30/Jan/2021:03:56:19 +0000] "POST /USR1/data/27b70220a8b03aacb0c9a6958a41bd7bfde10e8defd3dfbe8c7845b53d38f40f HTTP/1.1" 200 0 "" "Go-http-client/1.1" (244566)
> 192.168.1.14 - - [30/Jan/2021:03:55:52 +0000] “POST /USR1/data/27b70220a8b03aacb0c9a6958a41bd7bfde10e8defd3dfbe8c7845b53d38f40f HTTP/1.1” 400 12 “” “Go-http-client/1.1” (245832)
>
> 192.168.1.14 - - [30/Jan/2021:03:56:18 +0000] "DELETE /USR1/data/80f43e05fd5365c4464cd4c33c39ec2633df6e3eff353bbe68083ef1dcc6546b HTTP/1.1" 200 0 "" "Go-http-client/1.1" (244561)
> 192.168.1.14 - - [30/Jan/2021:03:56:19 +0000] "POST /USR1/data/80f43e05fd5365c4464cd4c33c39ec2633df6e3eff353bbe68083ef1dcc6546b HTTP/1.1" 200 0 "" "Go-http-client/1.1" (244567)
> 192.168.1.14 - - [30/Jan/2021:03:55:53 +0000] "POST /USR1/data/80f43e05fd5365c4464cd4c33c39ec2633df6e3eff353bbe68083ef1dcc6546b HTTP/1.1" 400 12 "" "Go-http-client/1.1" (245831)

Thanks a lot for log snippets! This should be fixed by Atomic file upload and directory sync by MichaelEischer · Pull Request #142 · restic/rest-server · GitHub . Could you give that PR a try? (Feel free to ask if you need help with building the rest-server).

What we see here is the following: restic starts the first upload try at 03:55:53 (the last line) and notices the broken connection after approx. 25 seconds. Now restic deletes the failed upload (03:56:18) and reports wsasend: An existing connection was forcibly closed by the remote host.. That is immediately followed by the upload retry at 03:56:19. Notice that I’ve written “restic […] notices the broken connection”, I haven’t mentioned the rest-server! And that leads to what’s going wrong: The rest-server notices the broken connection only after a longer delay. So from the perspective of the rest-server things happened as follows: the first upload is started, and the nothing happens for some time. In parallel the rest-server receives a delete request and the second upload retry. Both operations are logged as soon as they were finished. And only now does the rest-server learn that connection of the first upload try is broken. Unfortunately it tries to clean up the remainders of the failed upload, but in that process it actually deletes the file uploaded by the successful second try. Then the failed upload is logged, which causes the out-of-order log entry. And now we’re missing a file :frowning: .

1 Like

@dasbackup As discussed in Data blobs seem to be missing, aborting prune to prevent further data loss! · Issue #3239 · restic/restic · GitHub and as demonstrated by your initial post, --append-only prevents the missing pack files problem and can only cause backup runs to fail.

As a workaround for now you could do the following: Complete the initial backup without --append-only, then run rebuild-index (using the current master branch / a beta build that should only take a few minutes: restic beta releases (/) ). Afterwards enable --append-only and start another backup run, restic should be able to reuse information from the previous run, but will still recover missing blobs. Then to complete delete the initial snapshot: either look up the snapshot name and manually delete the file on the server, or temporarily disable --append-only to delete the snapshots using forget.

That still leaves the question what causes the network connection between restic and the rest-server to break after a few hours. As the backup runs over night, could that be caused by e.g. the internet connection reconnecting at night?

Dear MichaelEischer,
I really appreciate your help. I indeed would need help to build your PR. I’m running Ubuntu 20.04.1 LTS on a Raspi 4, 4GB. If you could help me with the build commands to compile rest-server, i am happy to do so.

That still leaves the question what causes the network connection between restic and the rest-server to break after a few hours. As the backup runs over night, could that be caused by e.g. the internet connection reconnecting at night?

I can rule that out. It runs on a local network, wired connection. I suspect that the rPi4 might be not the right machine to carry out that task. But it’s the only thing available to me.

Build steps:

  • Get the go compiler from Downloads - The Go Programming Language or use sudo apt install golang-1.14 to install it using the package manager. Add it to the path: export PATH="$PATH:/usr/lib/go-1.14/bin"
  • Clone the PR: git clone -b atomic-upload https://github.com/MichaelEischer/rest-server.git (once the PR is merged you can get it using git clone https://github.com/restic/rest-server.git.
  • Build it using cd rest-server && go build ./cmd/rest-server. Then you should have a new rest-server binary in the current directory.

A rPI4 should be far more than capable enough to handle the rest-server. Do the interrupted network connections happen at totally different times or is it always roughly the same time?