Restic to backblaze stopped working here

freelsjd · August 25, 2022, 1:23am

I have a linux/debian/bullseye/11.4 system with restic 0.13.1 compiled with go1.18 on linux/amd64 installed. I am also using backblaze b2 to back up about 1 TB of data from this system to the cloud. I am set up to back up by filesystem at the rate of two per night, for a total of 14 backups over the week. Each filesystem backup is stored as a separate folder underneath a backblaze bucket. It has been working flawlessly for several weeks and really works well; much better than trying to back up to dropbox or google-drive, as I was doing earlier when I first started using restic.

Now, all of a sudden, in the middle of a backup last night (about 4am, 8/23/22), it stopped working. I can verify that none of my backup scripts are working at all. I have narrowed it down to the restic command itself. It just hangs, and nothing happens. I have the verbose level at the max of 3, but still nothing comes out to the screen. It appears to log in to backblaze, because if I change the backblaze key ID to something different, it gives me a message. I can also run a separate backblaze b2 script (the b2 program compiled from source for linux, provided by backblaze), and it works with the same keys. Indeed, I also run a daily home directory sync using this b2 program with the sync option, and it is still working fine. So, I know it is something going on with restic alone not connecting to backblaze at the moment.

I should also point out that I use similar scripts to back up to a local usb drive first. That is working fine and verifies with check fine afterwards. The restic-2-backblaze is just a copy of the usb local backup. So, restic locally is working fine to the usb drive as usual. This is uniquely an issue with restic to backblaze.

I have read that similar problems have occurred with ca certificates going out of date suddenly, and perhaps that is it. If so, maybe someone else has the same problem and a fix or workaround.

Otherwise, I am at a complete loss here at the moment. Any help appreciated.

Thanks.

rawtaz · August 25, 2022, 1:50am

Well, unless you changed the restic binary, it will do the same thing it’s done all the time, so presumably something outside of restic changed and started causing you this problem.

That said, there have been a few cases of “stalled” B2 connectivity lately, and we think that swapping the library used in restic for B2 connectivity might help, but it’s not something you do overnight and it’s not certain there’s much ROI in doing so (as per the below).

There’s the possibility of using B2’s S3 API instead of their regular API, would you consider giving that a shot? Restic fully supports S3 so it should be fine to use that, assuming B2 do what they should be doing.

If that’s of any interest, see this PR for comments on the matter by @MichaelEischer: doc: recommend usage of B2's S3 API by MichaelEischer · Pull Request #3886 · restic/restic · GitHub

The text here describes what you’d need to do to start using B2 with S3: doc: recommend usage of B2's S3 API by MichaelEischer · Pull Request #3886 · restic/restic · GitHub
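
As a rough sketch of what that switch could look like: restic's S3 backend reads its credentials from AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY and takes a repository of the form s3:serverAddress/bucketname/path. The bucket name, path and key values below are placeholders (assumptions, not values from this thread), and the endpoint is whatever S3 endpoint Backblaze lists for the bucket.

```shell
# Placeholder values: substitute the S3 endpoint shown for your bucket,
# your own bucket name, and the path of the repository inside it.
B2_S3_ENDPOINT="s3.us-west-004.backblazeb2.com"
B2_BUCKET="example-bucket"
B2_PATH="restic_backups/original"

# For B2's S3 API, the B2 keyID and applicationKey take the place of
# the AWS access key and secret key.
export AWS_ACCESS_KEY_ID="replace-with-keyID"
export AWS_SECRET_ACCESS_KEY="replace-with-applicationKey"

# restic picks the repository up from RESTIC_REPOSITORY when -r is not given.
export RESTIC_REPOSITORY="s3:${B2_S3_ENDPOINT}/${B2_BUCKET}/${B2_PATH}"

# Then, for example:
#   restic snapshots
```

The linked PR text remains the authoritative description; this only shows the shape of the settings.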

Loxley · August 25, 2022, 5:44am

My daily backup to B2 worked flawlessly 3 hours ago.

freelsjd · August 25, 2022, 5:57pm

Loxley, what linux distribution are you using and what version ? Thanks.

freelsjd · August 25, 2022, 6:00pm

rawtaz, thanks for your response. I also got some feedback from backblaze tech support. They did recommend verifying certs are synced. Unfortunately, that is one area where I am completely lacking. I just know they are ascii text files stored in a specific location. How can I verify and then update if necessary on the restic end ?

freelsjd · August 25, 2022, 6:41pm

I have confirmed now that it must be something wrong with my installation. I have a son who uses the same version of debian and also backs up to backblaze using restic, and he has no issues right now. It must have been something I installed or configured since the last time it worked correctly that is causing this problem. I probably broke restic somehow.

MichaelEischer · August 25, 2022, 7:14pm

To have correct certificates on debian it should be enough to have the package ca-certificates installed. We’ve also seen issues due to some DNS filter blocking some of the backblaze domains. What does curl -v https://f002.backblazeb2.com or host f002.backblazeb2.com return?

Other than that, you could update to restic 0.14.0 and generate a debug log by setting the environment variable DEBUG_LOG=logfile.log, then look for lines with debug.loggingRoundTripper.RoundTrip and probably RoundTrip() returned error.
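
A small sketch of that log search, assuming the debug log was written to logfile.log as described above. The function name roundtrip_errors is made up for illustration; it only filters on the two strings mentioned in the post.

```shell
# Print only the failed-request lines from a restic debug log.
# $1: path to the log file written via DEBUG_LOG.
roundtrip_errors() {
    grep -F 'debug.loggingRoundTripper.RoundTrip' "$1" |
        grep -F 'RoundTrip() returned error'
}

# Example session (repository and credentials assumed to be configured already):
#   DEBUG_LOG=logfile.log restic snapshots
#   roundtrip_errors logfile.log | tail -n 5
```

The request dump that follows each error line in the log shows which host restic was trying to reach.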

freelsjd · August 25, 2022, 11:43pm

Thank you MichaelEischer. I can only guess the interpretation of the output below. It seems OK for the most part, but there is a snippet message about

“ALPN, server did not agree to a protocol”

then later

“Mark bundle as not supporting multiuse”

What does this mean ?

curl -v https://f002.backblazeb2.com

Trying 206.190.215.16:443…
Connected to f002.backblazeb2.com (206.190.215.16) port 443 (#0)
ALPN, offering h2
ALPN, offering http/1.1
successfully set certificate verify locations:
CAfile: /etc/ssl/certs/ca-certificates.crt
CApath: /etc/ssl/certs
TLSv1.3 (OUT), TLS handshake, Client hello (1):
TLSv1.3 (IN), TLS handshake, Server hello (2):
TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
TLSv1.3 (IN), TLS handshake, Certificate (11):
TLSv1.3 (IN), TLS handshake, CERT verify (15):
TLSv1.3 (IN), TLS handshake, Finished (20):
TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
TLSv1.3 (OUT), TLS handshake, Finished (20):
SSL connection using TLSv1.3 / TLS_AES_128_GCM_SHA256
ALPN, server did not agree to a protocol
Server certificate:
subject: CN=backblazeb2.com
start date: Jul 12 21:18:01 2022 GMT
expire date: Oct 10 21:18:00 2022 GMT
subjectAltName: host “f002.backblazeb2.com” matched cert’s “*.backblazeb2.com”
issuer: C=US; O=Let’s Encrypt; CN=R3
SSL certificate verify ok.

> GET / HTTP/1.1
> Host: f002.backblazeb2.com
> User-Agent: curl/7.74.0
> Accept: */*

TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
Mark bundle as not supporting multiuse
< HTTP/1.1 301
< Location: http://www.backblaze.com/b2/cloud-storage.html?bznetid=1556129551661470760739
< Content-Length: 0
< Date: Thu, 25 Aug 2022 23:39:20 GMT
<
Connection #0 to host f002.backblazeb2.com left intact

host f002.backblazeb2.com
f002.backblazeb2.com has address 206.190.215.16

freelsjd · August 26, 2022, 1:05am

OK MichaelEischer, may be on to something now. Upgraded to 0.14. Tried compiling, installing and self-updating, several ways. Same outcome.

Created the logfile.log as you suggested. Error messages, same are repeated over and over:

2022/08/25 20:59:18 debug/round_tripper.go:101 debug.loggingRoundTripper.RoundTrip 1 RoundTrip() returned error: tls: first record does not $
2022/08/25 20:59:19 debug/round_tripper.go:94 debug.loggingRoundTripper.RoundTrip 1 ------------ HTTP REQUEST -----------
HEAD /file/fea-name-backup/restic_backups/original/config HTTP/1.1
Host: f004.backblazeb2.com
User-Agent: blazer/0.5.3
Authorization: redacted
X-Blazer-Method: b2_download_file_by_name
X-Blazer-Request-Id: 4

So, I noticed the backblaze server is different from the one you had asked me to perform the curl query on (f004 instead of f002). So, I repeated the command with f004 instead of f002:

curl -v https://f004.backblazeb2.com

Trying 18.204.152.241:443…
Connected to f004.backblazeb2.com (18.204.152.241) port 443 (#0)
ALPN, offering h2
ALPN, offering http/1.1
successfully set certificate verify locations:
CAfile: /etc/ssl/certs/ca-certificates.crt
CApath: /etc/ssl/certs
TLSv1.3 (OUT), TLS handshake, Client hello (1):
error:1408F10B:SSL routines:ssl3_get_record:wrong version number
Closing connection 0
curl: (35) error:1408F10B:SSL routines:ssl3_get_record:wrong version number

This seems to say I am connecting to the wrong server ?

Can I force restic to use f002 instead of f004 ?

MichaelEischer · August 26, 2022, 6:38am

f002 was just a guess as it shows up rather frequently. But I think restic is actually connecting to the wrong server, but in a different way than you’d expect. Here’s what I get when resolving the domain:

$ host 18.204.152.241
241.152.204.18.in-addr.arpa domain name pointer ec2-18-204-152-241.compute-1.amazonaws.com.
$ host f004.backblazeb2.com
f004.backblazeb2.com has address 149.137.128.16
$ host 149.137.128.16      
16.128.137.149.in-addr.arpa domain name pointer f004.backblazeb2.com.

So apparently your DNS is redirecting you to the wrong IP. This is usually caused by some DNS filtering, either in a DNS proxy you’ve set up or at your ISP.
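
The forward and reverse lookups above can be strung together in a small sketch. The function names are hypothetical, and the parsing assumes the output format of the host command exactly as it appears in this thread. A mismatch is only a hint: many legitimate services have no matching reverse record.

```shell
# Extract IPv4 addresses from `host NAME` output read on stdin.
host_addresses() {
    awk '/ has address / { print $NF }'
}

# Extract reverse-lookup names from `host IP` output read on stdin,
# dropping the trailing dot.
host_pointers() {
    awk '/ domain name pointer / { sub(/\.$/, "", $NF); print $NF }'
}

# For each address NAME resolves to, check whether the reverse lookup
# points back at NAME.
check_reverse() {
    name=$1
    for ip in $(host "$name" | host_addresses); do
        ptr=$(host "$ip" | host_pointers | head -n 1)
        if [ "$ptr" = "$name" ]; then
            echo "$ip OK"
        else
            echo "$ip MISMATCH (reverse: ${ptr:-none})"
        fi
    done
}

# Example:
#   check_reverse f004.backblazeb2.com
```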

freelsjd · August 26, 2022, 2:00pm

Wow ! That is interesting. My DNS is setup to access Google DNS as shown here:

root@fea-home:/etc# systemd-resolve --status
Global
Protocols: +LLMNR +mDNS -DNSOverTLS DNSSEC=no/unsupported
resolv.conf mode: foreign
Current DNS Server: 8.8.8.8
DNS Servers: 8.8.8.8 8.8.4.4
DNS Domain: lan

Now, I go to see what the DNS retrieves for the backblaze server giving me fits:

resolvectl query f004.backblazeb2.com
f004.backblazeb2.com: 18.204.152.241 -- link: enp4s0

-- Information acquired via protocol DNS in 36.7ms.
-- Data is authenticated: no

which we know is the wrong IP address.

It will be later today, but I can change my DNS and see if this fixes things. To whom should I report the wrong address returned by Google ?

MichaelEischer · August 26, 2022, 5:01pm

Hmm, what’s even stranger is that dig @8.8.8.8 f004.backblazeb2.com returns the correct address for me (but I’m very likely talking to a completely different google server). Do you have a different host in the same network to check whether it also receives the same address?

Loxley · August 27, 2022, 12:29pm

It looks like you are not the only one with this problem and this specific ip:

freelsjd · August 27, 2022, 4:16pm

I cleared the DNS cache, and also tried different DNS servers. I still get the following inconsistency no matter what I try:

root@fea-home:/etc# host s3.us-west-004.backblazeb2.com
s3.us-west-004.backblazeb2.com has address 149.137.129.254
root@fea-home:/etc# host s3.us-west-002.backblazeb2.com
s3.us-west-002.backblazeb2.com has address 206.190.215.254
root@fea-home:/etc# host f002.backblazeb2.com
f002.backblazeb2.com has address 206.190.215.16
root@fea-home:/etc# host f004.backblazeb2.com
f004.backblazeb2.com has address 18.204.152.241

The longer backblaze server name is what is shown on my bucket list for my account. There seems to be an inconsistency somewhere. Not sure what to do at this point, but this sure seems like a place to start debugging.

rawtaz · August 27, 2022, 5:05pm

Did you ever try what @MichaelEischer suggested - trying other hosts on the same network? If yes, what was the result? If no, can you please try that?

There’s clearly something interfering with your DNS, this is not something restic does. Just to clear that up.

On the affected host, you should try to use host f004.backblazeb2.com 8.8.8.8 and/or dig @8.8.8.8 f004.backblazeb2.com (or with some other DNS server than 8.8.8.8, e.g. 1.1.1.1) to see what DNS replies you get when you have the host specifically asking to use a different DNS server.
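
A minimal sketch of that comparison, using dig +short against each server in turn. The function names are made up for illustration. Differing answers are not always a problem (some names legitimately resolve to several addresses), and identical answers are not proof of correctness if something on the network rewrites every query, so comparing with a host on a different network is still worthwhile.

```shell
# Ask each listed DNS server for the A records of a name and print
# one "server<TAB>answers" line per server.
compare_resolvers() {
    name=$1
    shift
    for server in "$@"; do
        answer=$(dig +short "@$server" "$name" A | sort | paste -sd, -)
        printf '%s\t%s\n' "$server" "${answer:-no answer}"
    done
}

# Succeeds (exit 0) only if every line on stdin carries the same answer column.
answers_agree() {
    [ "$(cut -f 2- | sort -u | wc -l | tr -d ' ')" -le 1 ]
}

# Example:
#   compare_resolvers f004.backblazeb2.com 8.8.8.8 1.1.1.1
#   compare_resolvers f004.backblazeb2.com 8.8.8.8 1.1.1.1 | answers_agree || echo "resolvers disagree"
```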

In short, it’s a matter of isolating this down to what component or part of your network and systems it is that’s doing the “hijacking”.

freelsjd · August 28, 2022, 12:12am

Yes. I did try using host and dig commands with several different DNS servers, and got the same incorrect result. I also have access to another Debian/11 machine entirely away from my home, and got the correct results. Then I tried on a chromebook I have here at my home, entered linux debian/11 and also tested the DNS there, and got the correct result. So, I know it is not something my ISP or router to the house is doing. It must be confined only to my main linux box. I have probably installed something that is causing this. Perhaps I should look at all the packages installed and updated since the last time it worked.

freelsjd · August 28, 2022, 12:18am

Hah ! I think I discovered the potential problem ! @MichaelEischer clued me in to the problem in his referenced article by phsource on serverfault.com. He apparently also uses plume in his house. I also use plume, and it is not just a coincidence that both of us have almost exactly the same issue and are being diverted to the same server IP address !

freelsjd · August 28, 2022, 12:38am

Yep. That’s it ! Plume has a simple way to turn it on/off, and restic is working normally again. Sorry for all the issues guys, but thanks so much for all your help. I learned a lot in this exercise. Restic is great software, and very talented people on here are using it !

rawtaz · August 28, 2022, 12:41am

But how is it that your Chromebook, being on the same network, did not have the DNS problems, considering that on the computer that did have the problems you experienced them even when you told it to use a different DNS server? If the Plume did not intercept and modify DNS requests/responses, you should get the right answers on that computer, and if the Plume did intercept and modify, then the Chromebook should have been affected too.

freelsjd · August 28, 2022, 12:51am

That is an excellent question. My chromebook is on the wifi, whereas my linux box is branching off the plume pod through an ethernet port. Also, I noticed that the chromebook, when it created the VM to start linux, was using an entirely different DNS that wasn’t any of the ones I tried.

I am going to contact the Plume tech support people, who have been excellent in diagnosing problems I had early on understanding how to set up the router configuration of the main plume pod. It may take a while to fully diagnose what is going on here, but at least I can catch up on the backups. I certainly plan to re-enable my plume firewall.