Restic stuck on restore (B2)

Hi there,

I just wanted to start this issue in the forums, as I cannot reproduce the issue every time. A Restic restore command seems to be stuck and does not load any new files. The data are a lot of files (small and big), around 1TB+, where it has only restored 20MB of (realistic) data.
Looking at networking tracing, it seems to infinitely tries to connect to Backblaze backend. It has been doing this for over 12 hours now.

This same behaviour happenend on other restores of that same day, one of them being a database. Before shutting these processes down, I thought it might be interesting to see what is going on with these stale processes.

My setup
Environment: Docker
Restic: 0.16.4
Backend: B2 (using Backblaze connection, not S3’s compatible interface)

Process:

Load average: 0.88 1.02 0.99 4/593 57
  PID  PPID USER     STAT   VSZ %VSZ CPU %CPU COMMAND
   30    26 root     S    2857m  18%   3   0% restic restore --tag <my-tag> latest --target /tmp/restore_<uuid>

Strace (believe to be no confidential data in here, let me know if there is):

sudo strace -p 1283735 -f -e trace=network -s 10000
strace: Process 1283735 attached with 11 threads
[pid 1283742] --- SIGURG {si_signo=SIGURG, si_code=SI_TKILL, si_pid=30, si_uid=0} ---
[pid 1283742] socket(AF_INET, SOCK_DGRAM|SOCK_CLOEXEC|SOCK_NONBLOCK, IPPROTO_IP <unfinished ...>
[pid 1283918] socket(AF_INET, SOCK_DGRAM|SOCK_CLOEXEC|SOCK_NONBLOCK, IPPROTO_IP) = 7
[pid 1283742] <... socket resumed>)     = 3
[pid 1283918] setsockopt(7, SOL_SOCKET, SO_BROADCAST, [1], 4 <unfinished ...>
[pid 1283742] setsockopt(3, SOL_SOCKET, SO_BROADCAST, [1], 4 <unfinished ...>
[pid 1283918] <... setsockopt resumed>) = 0
[pid 1283918] connect(7, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("127.0.0.11")}, 16 <unfinished ...>
[pid 1283742] <... setsockopt resumed>) = 0
[pid 1283742] connect(3, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("127.0.0.11")}, 16) = 0
[pid 1283918] <... connect resumed>)    = 0
[pid 1283742] getsockname(3, {sa_family=AF_INET, sin_port=htons(48227), sin_addr=inet_addr("127.0.0.1")}, [112->16]) = 0
[pid 1283742] getpeername(3,  <unfinished ...>
[pid 1283918] getsockname(7,  <unfinished ...>
[pid 1283742] <... getpeername resumed>{sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("127.0.0.11")}, [112->16]) = 0
[pid 1283918] <... getsockname resumed>{sa_family=AF_INET, sin_port=htons(35282), sin_addr=inet_addr("127.0.0.1")}, [112->16]) = 0
[pid 1283918] getpeername(7, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("127.0.0.11")}, [112->16]) = 0
[pid 1283742] socket(AF_INET, SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK, IPPROTO_IP) = 3
[pid 1283742] connect(3, {sa_family=AF_INET, sin_port=htons(443), sin_addr=inet_addr("45.11.36.16")}, 16) = -1 EINPROGRESS (Operation now in progress)
[pid 1283918] getsockopt(3, SOL_SOCKET, SO_ERROR, [0], [4]) = 0
[pid 1283918] getpeername(3, {sa_family=AF_INET, sin_port=htons(443), sin_addr=inet_addr("45.11.36.16")}, [112->16]) = 0
[pid 1283918] getsockname(3, {sa_family=AF_INET, sin_port=htons(35476), sin_addr=inet_addr("192.168.0.4")}, [112->16]) = 0
[pid 1283918] setsockopt(3, SOL_TCP, TCP_NODELAY, [1], 4) = 0
[pid 1283918] setsockopt(3, SOL_SOCKET, SO_KEEPALIVE, [1], 4) = 0
[pid 1283918] setsockopt(3, SOL_TCP, TCP_KEEPINTVL, [30], 4) = 0
[pid 1283918] setsockopt(3, SOL_TCP, TCP_KEEPIDLE, [30], 4) = 0
[pid 1283744] --- SIGURG {si_signo=SIGURG, si_code=SI_TKILL, si_pid=30, si_uid=0} ---
[pid 1283738] --- SIGURG {si_signo=SIGURG, si_code=SI_TKILL, si_pid=30, si_uid=0} ---
[pid 1283738] socket(AF_INET, SOCK_DGRAM|SOCK_CLOEXEC|SOCK_NONBLOCK, IPPROTO_IP <unfinished ...>
[pid 1283744] socket(AF_INET, SOCK_DGRAM|SOCK_CLOEXEC|SOCK_NONBLOCK, IPPROTO_IP <unfinished ...>
[pid 1283738] <... socket resumed>)     = 7
[pid 1283744] <... socket resumed>)     = 8
[pid 1283738] setsockopt(7, SOL_SOCKET, SO_BROADCAST, [1], 4) = 0
[pid 1283744] setsockopt(8, SOL_SOCKET, SO_BROADCAST, [1], 4 <unfinished ...>
[pid 1283738] connect(7, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("127.0.0.11")}, 16 <unfinished ...>
[pid 1283744] <... setsockopt resumed>) = 0
[pid 1283738] <... connect resumed>)    = 0
[pid 1283744] connect(8, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("127.0.0.11")}, 16) = 0
[pid 1283738] getsockname(7, {sa_family=AF_INET, sin_port=htons(44144), sin_addr=inet_addr("127.0.0.1")}, [112->16]) = 0
[pid 1283744] getsockname(8,  <unfinished ...>
[pid 1283738] getpeername(7,  <unfinished ...>
[pid 1283744] <... getsockname resumed>{sa_family=AF_INET, sin_port=htons(57798), sin_addr=inet_addr("127.0.0.1")}, [112->16]) = 0
[pid 1283744] getpeername(8, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("127.0.0.11")}, [112->16]) = 0
[pid 1283738] <... getpeername resumed>{sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("127.0.0.11")}, [112->16]) = 0
[pid 1283738] --- SIGURG {si_signo=SIGURG, si_code=SI_TKILL, si_pid=30, si_uid=0} ---
[pid 1283744] socket(AF_INET, SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK, IPPROTO_IP) = 7
[pid 1283744] connect(7, {sa_family=AF_INET, sin_port=htons(443), sin_addr=inet_addr("45.11.36.16")}, 16) = -1 EINPROGRESS (Operation now in progress)
[pid 1283742] getsockopt(7, SOL_SOCKET, SO_ERROR, [0], [4]) = 0
[pid 1283742] getpeername(7, {sa_family=AF_INET, sin_port=htons(443), sin_addr=inet_addr("45.11.36.16")}, [112->16]) = 0
[pid 1283742] getsockname(7, {sa_family=AF_INET, sin_port=htons(35480), sin_addr=inet_addr("192.168.0.4")}, [112->16]) = 0
[pid 1283742] setsockopt(7, SOL_TCP, TCP_NODELAY, [1], 4) = 0
[pid 1283742] setsockopt(7, SOL_SOCKET, SO_KEEPALIVE, [1], 4) = 0
[pid 1283742] setsockopt(7, SOL_TCP, TCP_KEEPINTVL, [30], 4) = 0
[pid 1283742] setsockopt(7, SOL_TCP, TCP_KEEPIDLE, [30], 4) = 0
[pid 1283918] --- SIGURG {si_signo=SIGURG, si_code=SI_TKILL, si_pid=30, si_uid=0} ---
[pid 1283918] socket(AF_INET, SOCK_DGRAM|SOCK_CLOEXEC|SOCK_NONBLOCK, IPPROTO_IP) = 8
[pid 1283918] setsockopt(8, SOL_SOCKET, SO_BROADCAST, [1], 4) = 0
[pid 1283918] connect(8, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("127.0.0.11")}, 16) = 0
[pid 1283918] getsockname(8, {sa_family=AF_INET, sin_port=htons(42336), sin_addr=inet_addr("127.0.0.1")}, [112->16]) = 0
[pid 1283918] getpeername(8, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("127.0.0.11")}, [112->16]) = 0
[pid 1283742] socket(AF_INET, SOCK_DGRAM|SOCK_CLOEXEC|SOCK_NONBLOCK, IPPROTO_IP) = 9
[pid 1283742] setsockopt(9, SOL_SOCKET, SO_BROADCAST, [1], 4) = 0
[pid 1283742] connect(9, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("127.0.0.11")}, 16) = 0
[pid 1283742] getsockname(9, {sa_family=AF_INET, sin_port=htons(48456), sin_addr=inet_addr("127.0.0.1")}, [112->16]) = 0
[pid 1283742] getpeername(9, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("127.0.0.11")}, [112->16]) = 0
[pid 1283918] socket(AF_INET, SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK, IPPROTO_IP) = 8
[pid 1283918] connect(8, {sa_family=AF_INET, sin_port=htons(443), sin_addr=inet_addr("45.11.36.16")}, 16) = -1 EINPROGRESS (Operation now in progress)
[pid 1283742] getsockopt(8, SOL_SOCKET, SO_ERROR, [0], [4]) = 0
[pid 1283742] getpeername(8, {sa_family=AF_INET, sin_port=htons(443), sin_addr=inet_addr("45.11.36.16")}, [112->16]) = 0
[pid 1283742] getsockname(8, {sa_family=AF_INET, sin_port=htons(35482), sin_addr=inet_addr("192.168.0.4")}, [112->16]) = 0
[pid 1283742] setsockopt(8, SOL_TCP, TCP_NODELAY, [1], 4) = 0
[pid 1283742] setsockopt(8, SOL_SOCKET, SO_KEEPALIVE, [1], 4) = 0
[pid 1283742] setsockopt(8, SOL_TCP, TCP_KEEPINTVL, [30], 4) = 0
[pid 1283742] setsockopt(8, SOL_TCP, TCP_KEEPIDLE, [30], 4) = 0
[pid 1283738] --- SIGURG {si_signo=SIGURG, si_code=SI_TKILL, si_pid=30, si_uid=0} ---
[pid 1283738] --- SIGURG {si_signo=SIGURG, si_code=SI_TKILL, si_pid=30, si_uid=0} ---
[pid 1283918] --- SIGURG {si_signo=SIGURG, si_code=SI_TKILL, si_pid=30, si_uid=0} ---
[pid 1283742] --- SIGURG {si_signo=SIGURG, si_code=SI_TKILL, si_pid=30, si_uid=0} ---
[pid 1283742] socket(AF_INET, SOCK_DGRAM|SOCK_CLOEXEC|SOCK_NONBLOCK, IPPROTO_IP) = 3
[pid 1283742] setsockopt(3, SOL_SOCKET, SO_BROADCAST, [1], 4 <unfinished ...>
[pid 1283918] socket(AF_INET, SOCK_DGRAM|SOCK_CLOEXEC|SOCK_NONBLOCK, IPPROTO_IP <unfinished ...>
[pid 1283742] <... setsockopt resumed>) = 0
[pid 1283918] <... socket resumed>)     = 8
[pid 1283742] connect(3, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("127.0.0.11")}, 16 <unfinished ...>
[pid 1283918] setsockopt(8, SOL_SOCKET, SO_BROADCAST, [1], 4 <unfinished ...>
[pid 1283742] <... connect resumed>)    = 0
[pid 1283918] <... setsockopt resumed>) = 0
[pid 1283742] getsockname(3,  <unfinished ...>
[pid 1283918] connect(8, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("127.0.0.11")}, 16 <unfinished ...>
[pid 1283742] <... getsockname resumed>{sa_family=AF_INET, sin_port=htons(58789), sin_addr=inet_addr("127.0.0.1")}, [112->16]) = 0
[pid 1283742] getpeername(3,  <unfinished ...>
[pid 1283918] <... connect resumed>)    = 0
[pid 1283742] <... getpeername resumed>{sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("127.0.0.11")}, [112->16]) = 0
[pid 1283918] getsockname(8, {sa_family=AF_INET, sin_port=htons(41904), sin_addr=inet_addr("127.0.0.1")}, [112->16]) = 0
[pid 1283918] getpeername(8, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("127.0.0.11")}, [112->16]) = 0
[pid 1283738] socket(AF_INET, SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK, IPPROTO_IP) = 3
[pid 1283738] connect(3, {sa_family=AF_INET, sin_port=htons(443), sin_addr=inet_addr("45.11.36.16")}, 16) = -1 EINPROGRESS (Operation now in progress)
[pid 1283918] getsockopt(3, SOL_SOCKET, SO_ERROR, [0], [4]) = 0
[pid 1283918] getpeername(3, {sa_family=AF_INET, sin_port=htons(443), sin_addr=inet_addr("45.11.36.16")}, [112->16]) = 0
[pid 1283918] getsockname(3, {sa_family=AF_INET, sin_port=htons(35558), sin_addr=inet_addr("192.168.0.4")}, [112->16]) = 0
[pid 1283918] setsockopt(3, SOL_TCP, TCP_NODELAY, [1], 4) = 0
[pid 1283918] setsockopt(3, SOL_SOCKET, SO_KEEPALIVE, [1], 4) = 0
[pid 1283918] setsockopt(3, SOL_TCP, TCP_KEEPINTVL, [30], 4) = 0
[pid 1283918] setsockopt(3, SOL_TCP, TCP_KEEPIDLE, [30], 4) = 0
...

Have you considered using their S3 interface instead? It’s recommended for Backblaze.

1 Like

Ye, it is in the pipeline, but I wanted to confirm if this was the issue. I never encountered this on older Restic versions, but hope to migrate to S3 interface soon.

Update: I see MichaelEischer exactly describes this “hanging” behaviour with the direct B2 interface (doc: recommend usage of B2's S3 API by MichaelEischer · Pull Request #3886 · restic/restic · GitHub):

The error handling in the library we use to access B2 has been a constant source of problems. Usually it result in hard to debug cases of restic appearing to just “hang”.

Will up the priority to switch to the S3 interface :slight_smile: Thanks for the response!

1 Like

Good news =)

You can generate a debug log by setting the environment variable DEBUG_LOG=debug.log. Then restic logs the http requests. Those should tell what’s going on.

2 Likes