I have a Debian 11.4 (bullseye) Linux system with restic 0.13.1 (compiled with go1.18 on linux/amd64) installed. I also use Backblaze B2 to back up about 1 TB of data from this system to the cloud. My backups run per filesystem, two filesystems per night, for a total of 14 backups over the week. Each filesystem backup is stored as a separate folder inside a Backblaze bucket. It has been working flawlessly for several weeks and really works well; much better than backing up to Dropbox or Google Drive, which is what I was doing when I first started using restic.
Now, all of a sudden, in the middle of a backup last night (about 4am, 8/23/22), it stopped working. I can verify that none of my backup scripts are working at all, and I have narrowed it down to the restic command itself. It just hangs, and nothing happens. I have the verbosity at the maximum of 3, but still nothing is printed to the screen. It appears to log in to Backblaze, because if I change the Backblaze key ID to something different, it gives me an error message. I can also run a separate Backblaze script with the same keys (the b2 command-line program provided by Backblaze, compiled from source for Linux), and it works. Indeed, I also run a daily home directory sync using this b2 program with the sync option, and it is still working fine. So I know the problem is with restic alone, not with connecting to Backblaze in general.
I should also point out that I use similar scripts to back up to a local USB drive first. That is working fine and verifies fine with check afterwards. The restic-2-backblaze backup is just a copy of the local USB backup. So restic is working fine locally to the USB drive as usual; this issue is unique to restic with Backblaze.
I have read that similar problems have occurred when CA certificates suddenly went out of date, and perhaps that is it. If so, maybe someone else has had the same problem and has a fix or workaround.
Otherwise, I am at a complete loss at the moment. Any help appreciated.
Well, unless you changed the restic binary, it will do the same thing it’s done all the time, so presumably something outside of restic changed and started causing you this problem.
That said, there have been a few cases of “stalled” B2 connectivity lately, and we think that swapping the library restic uses for B2 connectivity might help, but it’s not something you do overnight, and it’s not certain there’s much ROI in doing so (given the alternative below).
There’s the possibility of using B2’s S3-compatible API instead of their native API; would you consider giving that a shot? Restic fully supports S3, so that should work fine, assuming B2 behaves as it should.
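If you want to try the S3 route, the main changes are the repository string and the credential variables. Below is a minimal Python sketch, assuming the usual restic formats (b2:bucket:path for the native backend, s3:https://endpoint/bucket/path for S3) and B2's s3.<region>.backblazeb2.com endpoint naming. The helper names and the us-west-004 region are hypothetical placeholders, so look up your bucket's actual S3 endpoint in the Backblaze web UI.

```python
# Minimal sketch: translate a restic B2 repository spec into an S3-API spec.
# ASSUMPTIONS: helper names are hypothetical; the region string is a placeholder
# that must be replaced with the bucket's real S3 endpoint region from Backblaze.


def b2_to_s3_repo(b2_repo: str, region: str) -> str:
    """Turn 'b2:bucket:path' into 's3:https://s3.<region>.backblazeb2.com/bucket/path'."""
    scheme, sep, rest = b2_repo.partition(":")
    if scheme != "b2" or not sep:
        raise ValueError(f"not a b2 repository spec: {b2_repo!r}")
    bucket, _, path = rest.partition(":")
    if not bucket:
        raise ValueError("missing bucket name")
    endpoint = f"https://s3.{region}.backblazeb2.com"
    path = path.strip("/")
    return f"s3:{endpoint}/{bucket}/{path}" if path else f"s3:{endpoint}/{bucket}"


def s3_env(key_id: str, application_key: str) -> dict:
    """restic's S3 backend reads AWS_* variables instead of B2_ACCOUNT_ID/B2_ACCOUNT_KEY."""
    return {"AWS_ACCESS_KEY_ID": key_id, "AWS_SECRET_ACCESS_KEY": application_key}


if __name__ == "__main__":
    # Example values only; the region is a placeholder.
    print(b2_to_s3_repo("b2:fea-name-backup:restic_backups/original", "us-west-004"))
```

You would then pass the returned string to restic with -r and set the two AWS_* variables to your B2 key ID and application key, in place of the B2_* variables used by the native backend.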
rawtaz, thanks for your response. I also got some feedback from Backblaze tech support. They recommended verifying that the certificates are in sync. Unfortunately, that is one area where I am completely lacking; I just know they are ASCII text files stored in a specific location. How can I verify them, and update them if necessary, on the restic end?
I have now confirmed that it must be something wrong with my installation. My son uses the same version of Debian and also backs up to Backblaze using restic, and he has no issues right now. Something I installed or configured since the last time it worked correctly must be causing this problem. I probably broke restic somehow.
To have correct certificates on Debian, it should be enough to have the ca-certificates package installed. We’ve also seen issues caused by DNS filters blocking some of the Backblaze domains. What do curl -v https://f002.backblazeb2.com and host f002.backblazeb2.com return?
Other than that, you could update to restic 0.14.0 and generate a debug log by setting the environment variable DEBUG_LOG=logfile.log, then look for lines containing debug.loggingRoundTripper.RoundTrip and, most likely, RoundTrip() returned error.
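With a busy repository the debug log gets large quickly, so it can help to filter it. A minimal Python sketch that pulls out the failed round trips and counts identical messages, assuming the log line layout shown later in this thread (date, time, source location, function name, then the message); the function names and the script name are illustrative only.

```python
import sys
from collections import Counter

# Marker strings from the suggestion above; the parsing around them is an assumption
# based on the log lines shown in this thread.
ROUNDTRIP = "debug.loggingRoundTripper.RoundTrip"
ERROR_MARK = "RoundTrip() returned error"


def roundtrip_errors(lines):
    """Yield (timestamp, message) for each failed HTTP round trip in a restic DEBUG_LOG."""
    for line in lines:
        if ROUNDTRIP in line and ERROR_MARK in line:
            timestamp = " ".join(line.split()[:2])  # e.g. "2022/08/25 20:59:18"
            message = line.split(ERROR_MARK, 1)[1].lstrip(": ").strip()
            yield timestamp, message


def summarize(lines):
    """Count identical error messages so a repeating failure stands out."""
    return Counter(message for _, message in roundtrip_errors(lines))


if __name__ == "__main__":
    # usage: python3 scan_debug_log.py logfile.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        for message, count in summarize(log).most_common():
            print(f"{count:6d}  {message}")
```

A single message with a large count usually points at one persistent cause rather than flaky connectivity.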
OK MichaelEischer, you may be on to something now. I upgraded to 0.14, trying several ways (compiling, installing, and self-updating), with the same outcome.
I created logfile.log as you suggested. The same error messages are repeated over and over:
2022/08/25 20:59:18 debug/round_tripper.go:101 debug.loggingRoundTripper.RoundTrip 1 RoundTrip() returned error: tls: first record does not $
2022/08/25 20:59:19 debug/round_tripper.go:94 debug.loggingRoundTripper.RoundTrip 1 ------------ HTTP REQUEST -----------
HEAD /file/fea-name-backup/restic_backups/original/config HTTP/1.1
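The error line above is cut off by the terminal. In Go's crypto/tls, which restic relies on for HTTPS, the error beginning "tls: first record does not" comes from a sanity check on the very first record the server sends back: its content type byte must be 22 (handshake) or 21 (alert), and its two version bytes must be below 0x1000. Anything else suggests that whatever answered on port 443 is not speaking TLS. Here is a rough Python sketch of that check, plus a probe that sends a real ClientHello using the ssl module and inspects the first bytes of the reply. The function names are made up for illustration, and the check mirrors Go's logic as I understand it rather than being copied from it.

```python
import socket
import ssl
import sys

TLS_ALERT = 21      # TLS record content type for alerts
TLS_HANDSHAKE = 22  # TLS record content type for handshake messages


def looks_like_tls(first_bytes: bytes) -> bool:
    """Approximate the check Go's crypto/tls applies to the first record it receives."""
    if len(first_bytes) < 3:
        return False
    content_type = first_bytes[0]
    version = (first_bytes[1] << 8) | first_bytes[2]
    return content_type in (TLS_HANDSHAKE, TLS_ALERT) and version < 0x1000


def probe(host: str, port: int = 443, timeout: float = 10.0) -> bytes:
    """Send a genuine ClientHello and return (up to) the first 5 bytes of the reply."""
    ctx = ssl.create_default_context()
    incoming, outgoing = ssl.MemoryBIO(), ssl.MemoryBIO()
    tls = ctx.wrap_bio(incoming, outgoing, server_hostname=host)
    try:
        tls.do_handshake()
    except ssl.SSLWantReadError:
        pass  # expected: the ClientHello is now waiting in `outgoing`
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(outgoing.read())
        return sock.recv(5)


if __name__ == "__main__":
    head = probe(sys.argv[1])
    print(head.hex(), "looks like TLS" if looks_like_tls(head) else "NOT a TLS reply")
```

Running it against the Backblaze hostname from the affected machine and from a machine that works would show whether the two are even reaching the same kind of server.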
I noticed that the Backblaze server is different from the one you asked me to query with curl (f004 instead of f002), so I repeated the command with f004 instead of f002:
curl -v https://f004.backblazeb2.com
f002 was just a guess, as it shows up rather frequently. But I think restic is actually connecting to the wrong server, though not in the way you’d expect. Here’s what I get when resolving the domain:
$ host 18.204.152.241
241.152.204.18.in-addr.arpa domain name pointer ec2-18-204-152-241.compute-1.amazonaws.com.
$ host f004.backblazeb2.com
f004.backblazeb2.com has address 18.104.22.168
$ host 22.214.171.124
126.96.36.199.in-addr.arpa domain name pointer f004.backblazeb2.com.
So apparently your DNS is redirecting you to the wrong IP. This is usually caused by DNS filtering, either in a DNS proxy you’ve set up or at your ISP.
Wow! That is interesting. My DNS is set up to use Google DNS, as shown here:
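A quick way to spot this kind of redirection is to compare a forward lookup with the reverse (PTR) lookup of the address it returns: in the output above, the genuine f004 address reverse-resolves to f004.backblazeb2.com, while the address the affected machine was connecting to points at an amazonaws.com EC2 host. A rough Python sketch of that comparison, using only the socket module; the function names are hypothetical, and a mismatch should be treated as a hint only, since plenty of legitimate services (CDNs, for one) have PTR records that don't match the name you looked up.

```python
import socket
import sys


def _normalize(name: str) -> str:
    """Compare DNS names case-insensitively and ignore a trailing root dot."""
    return name.rstrip(".").lower()


def ptr_matches(hostname: str, ptr_name: str) -> bool:
    """True if the PTR name of the resolved address is the host we asked for."""
    return bool(ptr_name) and _normalize(hostname) == _normalize(ptr_name)


def check(hostname: str):
    """Forward-resolve hostname with the system resolver, then reverse-resolve the answer."""
    ip = socket.gethostbyname(hostname)
    try:
        ptr_name = socket.gethostbyaddr(ip)[0]
    except socket.herror:
        ptr_name = ""  # the address has no PTR record at all
    return ip, ptr_name, ptr_matches(hostname, ptr_name)


if __name__ == "__main__":
    ip, ptr_name, ok = check(sys.argv[1])
    verdict = "consistent" if ok else "MISMATCH"
    print(f"{sys.argv[1]} -> {ip} -> {ptr_name or '(no PTR)'}: {verdict}")
```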
root@fea-home:/etc# systemd-resolve --status
Protocols: +LLMNR +mDNS -DNSOverTLS DNSSEC=no/unsupported
resolv.conf mode: foreign
Current DNS Server: 188.8.131.52
DNS Servers: 184.108.40.206 220.127.116.11
DNS Domain: lan
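Note the "resolv.conf mode: foreign" line: it indicates that /etc/resolv.conf is not managed by systemd-resolved, so tools that read that file directly (host and dig do, and Go programs such as restic normally do as well) may be querying different servers from the ones listed by systemd-resolve --status. A small Python sketch to list what /etc/resolv.conf actually says, assuming the standard "nameserver <address>" syntax with # or ; comment lines; the function name is illustrative only.

```python
def nameservers(resolv_conf_text: str) -> list:
    """Return the 'nameserver' addresses from resolv.conf-formatted text, in file order."""
    servers = []
    for raw in resolv_conf_text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#;":
            continue  # blank line or comment
        fields = line.split()
        if fields[0] == "nameserver" and len(fields) >= 2:
            servers.append(fields[1])
    return servers


if __name__ == "__main__":
    with open("/etc/resolv.conf", encoding="utf-8") as conf:
        for server in nameservers(conf.read()):
            print(server)
```

If these addresses differ from the "DNS Servers" reported above, the resolv.conf entries are the ones worth testing first.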
Hmm, what’s even stranger is that dig @18.104.22.168 f004.backblazeb2.com returns the correct address for me (but I’m very likely talking to a completely different google server). Do you have a different host in the same network to check whether it also receives the same address?
The longer Backblaze server name is what is shown in the bucket list for my account. There seems to be an inconsistency somewhere. I’m not sure what to do at this point, but this sure seems like a place to start debugging.
Did you ever try what @MichaelEischer suggested - trying other hosts on the same network? If yes, what was the result? If no, can you please try that?
Something is clearly interfering with your DNS; this is not something restic does, just to clear that up.
On the affected host, you should try host f004.backblazeb2.com 22.214.171.124 and/or dig @126.96.36.199 f004.backblazeb2.com (or with some DNS server other than 188.8.131.52, e.g. 184.108.40.206) to see what DNS replies you get when the host explicitly asks a different DNS server.
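If you'd rather not depend on which DNS tools are installed, the same experiment can be done by hand: build a DNS query and send it over UDP port 53 to exactly the server you choose. A minimal Python sketch under these assumptions: IPv4 only, A records only, no retries, and no TCP fallback for truncated replies; the function names and script name are hypothetical. One caveat worth keeping in mind: if a device on the path transparently intercepts all port-53 traffic, every server you ask will appear to return the same (wrong) answer.

```python
import random
import socket
import struct
import sys


def build_query(name: str, txid: int) -> bytes:
    """Build a minimal DNS query for an A record with recursion desired."""
    header = struct.pack("!HHHHHH", txid, 0x0100, 1, 0, 0, 0)  # RD flag, one question
    qname = b"".join(bytes([len(label)]) + label.encode("ascii")
                     for label in name.rstrip(".").split(".")) + b"\x00"
    return header + qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN


def _skip_name(msg: bytes, offset: int) -> int:
    """Return the offset just past a (possibly compressed) domain name."""
    while True:
        length = msg[offset]
        if length & 0xC0 == 0xC0:  # compression pointer: 2 bytes, ends the name
            return offset + 2
        if length == 0:
            return offset + 1
        offset += 1 + length


def parse_a_records(msg: bytes, txid: int) -> list:
    """Return the IPv4 addresses found in the answer section of a DNS response."""
    rid, _flags, qdcount, ancount, _, _ = struct.unpack("!HHHHHH", msg[:12])
    if rid != txid:
        raise ValueError("transaction ID mismatch")
    offset = 12
    for _ in range(qdcount):
        offset = _skip_name(msg, offset) + 4  # skip QTYPE and QCLASS
    addrs = []
    for _ in range(ancount):
        offset = _skip_name(msg, offset)
        rtype, rclass, _ttl, rdlen = struct.unpack("!HHIH", msg[offset:offset + 10])
        offset += 10
        if rtype == 1 and rclass == 1 and rdlen == 4:  # skip CNAMEs and others
            addrs.append(socket.inet_ntoa(msg[offset:offset + 4]))
        offset += rdlen
    return addrs


def query(server: str, name: str, timeout: float = 5.0) -> list:
    """Ask one specific DNS server (UDP port 53) for the A records of name."""
    txid = random.randrange(0x10000)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_query(name, txid), (server, 53))
        reply, _ = sock.recvfrom(4096)
    return parse_a_records(reply, txid)


if __name__ == "__main__":
    # usage: python3 dnsq.py <server-ip> <hostname>
    print(query(sys.argv[1], sys.argv[2]))
```

Running it once per DNS server you want to compare, for the same hostname, gives answers that bypass the local resolver configuration entirely.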
In short, it’s a matter of isolating this down to what component or part of your network and systems it is that’s doing the “hijacking”.
Yes. I did try the host and dig commands with several different DNS servers, and got the same incorrect result. I also have access to another Debian 11 machine entirely away from my home, and got the correct results there. Then I tried on a Chromebook here at my home, started its Debian 11 Linux environment, tested DNS there, and got the correct result. So I know it is not something my ISP or the router into the house is doing; it must be confined to my main Linux box. I have probably installed something that is causing this. Perhaps I should go through all the packages installed and updated since the time the problem did not exist.
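Debian records every package installation, upgrade, and removal in /var/log/dpkg.log (older entries end up in rotated, sometimes gzipped, dpkg.log.N files), so going through the changes since the last good backup can be scripted. A minimal Python sketch, assuming dpkg's usual "date time action package old-version new-version" line layout; the function names and script name are made up for illustration.

```python
import glob
import gzip
import sys

ACTIONS = {"install", "upgrade", "remove", "purge"}


def changes_since(lines, since: str):
    """Yield (timestamp, action, package, version) for dpkg.log entries on or after `since`.

    `since` is "YYYY-MM-DD"; dpkg's ISO-style dates compare correctly as plain strings.
    """
    for line in lines:
        fields = line.split()
        if len(fields) < 6 or fields[2] not in ACTIONS or fields[0] < since:
            continue
        date, clock, action, package, old_version, new_version = fields[:6]
        # For removals the new version is "<none>", so report the old one instead.
        version = old_version if new_version == "<none>" else new_version
        yield f"{date} {clock}", action, package, version


def read_logs(pattern: str = "/var/log/dpkg.log*"):
    """Yield lines from the current and rotated (possibly gzipped) dpkg logs."""
    for path in sorted(glob.glob(pattern)):
        opener = gzip.open if path.endswith(".gz") else open
        with opener(path, "rt", encoding="utf-8", errors="replace") as log:
            yield from log


if __name__ == "__main__":
    # usage: python3 dpkg_changes.py YYYY-MM-DD   (e.g. the last day backups still worked)
    for entry in sorted(changes_since(read_logs(), sys.argv[1])):
        print(*entry)
```

Sorting the combined output by timestamp merges the rotated files into one timeline, which makes it easier to line up package changes with the night the backups stopped.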
Hah! I think I discovered the problem! @MichaelEischer clued me in with the serverfault.com post by phsource that he referenced. phsource apparently also uses Plume at home. I also use Plume, and it is not just a coincidence that we both have almost exactly the same issue, with traffic diverted to the same server IP address!
Yep, that’s it! Plume has a simple way to turn it on and off, and restic is working normally again. Sorry for all the trouble, guys, but thanks so much for all your help. I learned a lot in this exercise. Restic is great software, and very talented people on here are using it!
But how is it that your Chromebook, on the same network, did not have the DNS problems, considering that the affected computer had them even when you told it to use a different DNS server? If the Plume did not intercept and modify DNS requests/responses, you should have gotten the right answers on that computer, and if the Plume did intercept and modify them, the Chromebook should have been affected too.
That is an excellent question. My Chromebook is on Wi-Fi, whereas my Linux box is connected to the Plume pod through an Ethernet port. I also noticed that the Chromebook, when it created the VM to start Linux, was using an entirely different DNS server that wasn’t any of the ones I tried.
I am going to contact Plume tech support, who were excellent early on in helping me understand how to set up the router configuration of the main Plume pod. It may take a while to fully diagnose what is going on here, but at least I can catch up on the backups. I certainly plan to re-enable my Plume firewall.