Restic v0.16.4 freezing?

Hmm since updating to restic 0.16.4 compiled with go1.21.6 on darwin/arm64 I’m having random freezes on long operations. Trying to copy down a ~500GB snapshot from SharePoint to a local SSD. It’s trucking along at about 70MB/s. However it’ll just randomly… stop. Seems completely unresponsive. Nothing being transferred according to Activity Monitor (MacOS). So then I try to SIGINT and… it won’t. Just sits there. I have to SIGTERM before I can try again. Haven’t seen that happen. It never throws an error, I’ve let it sit for 2 hours and it just does… nothing.

Anyway, just thought I’d see if anyone else was experiencing this. Thanks!

Please send SIGQUIT to restic. Then restic will exit but also print a stackdump that should tell where it got stuck.

1 Like

See here!

https://hastebin.skyra.pw/isesahasus.swift


And another one:

https://hastebin.skyra.pw/utaxavayus.swift


And another…

https://hastebin.skyra.pw/fugeluwamu.swift


This one happened before it could even load all the index files. The frequency seems to growing, the closer I get to finally getting this snapshot out of the cloud. I don’t know if this is a timeout that Restic isn’t handling very gracefully, or if the app itself is actually freezing…

https://hastebin.skyra.pw/esudawogep.swift


Well… at this point, I can’t get it to copy anything more. Just sits there after loading the index. Guess I’ll let it sit a few hours and see if it crashes on its own maybe…

https://hastebin.skyra.pw/imefesezay.bash


Last update: Yeah, I’m too impatient for that haha. I used 0.16.2, and was able to successfully finish copying the snapshot. Seems like there’s a bug in 0.16.4, unfortunately!

Possibly pertinent information about the snapshot I was trying to copy:

Stats in restore-size mode:
     Snapshots processed:  1
        Total File Count:  8800
              Total Size:  552.300 GiB

Stats in raw-data mode:
     Snapshots processed:  1
        Total Blob Count:  377144
 Total Uncompressed Size:  550.418 GiB
              Total Size:  408.900 GiB
    Compression Progress:  100.00%
       Compression Ratio:  1.35x
Compression Space Saving:  25.71%
1 Like

Judging from the stacktraces, the connection to rclone is somehow stuck somewhere in the http2 code. But I didn’t notice any smoking guns. A quick glance over the http2 changes in golang.org/x/net between restic 0.16.2 and 0.16.4 also didn’t turn up anything paricularly suspicious.

Which rclone version are you using?

Could you try use restic 0.16.4 but bump golang.org/x/net in restic/go.mod at master · restic/restic · GitHub to v0.23.0, followed by running go mod tidy and go build ./cmd/restic? Does that work any better?

rclone v1.66.0
- os/version: darwin 14.3.1 (64 bit)
- os/kernel: 23.3.0 (arm64)
- os/type: darwin
- os/arch: arm64 (ARMv8 compatible)
- go/version: go1.22.1
- go/linking: dynamic
- go/tags: cmount

When you say bump golang.org/x/net to v0.23.0 - what version would it have been? When I cloned the project, and looked inside of go.mod, it already said v0.23.0? I’m assuming I misunderstood or am doing something wrong. I haven’t really programmed anything since the VS.NET 2003 days in college haha. I keep meaning to learn Go, cause it seems fun, and I could help Restic more, but haven’t gotten around to it just yet!

1 Like

@akrabu my idea was to check out the code for restic 0.16.4, GitHub - restic/restic at v0.16.4 . That version still uses golang.org/x/net 0.19.0.

1 Like

Ah. I screwed up and downloaded the very latest codebase, nevermind lol

I got the right version, ran go mod tidy, go build ./cmd/restic, then make.

I was running restic 0.16.4 compiled with go1.21.6 on darwin/arm64.
And now it is restic 0.16.4 compiled with go1.22.2 on darwin/arm64.

Not sure why setting golang.org/x/net to v0.23.0 makes restic report as compiled with go1.22.2 when I do restic version but I’m going to assume since it’s different that I did it right and this makes sense to you haha

I’ll download the exact same snapshot as before and report back!

So it’s sitting at [1:08:31] 23.27% 826 / 3549 packs copied and hasn’t changed for a good 15 minutes, other than the time counting up. No network activity. Doing the same thing as before. I’ll let it sit longer and see if it does anything, and if not I’ll do another SIGQUIT before I leave work today.


Now it’s at [1:38:01] 23.30% 827 / 3549 packs copied. It’s doing something, but incredibly slowly. Going to SIGQUIT, post that here, then start again and see how fast it goes for comparison.

https://hastebin.skyra.pw/ahujivixuj.swift


Starting it back up, it’s transferring at 60MB/s. Earlier it copied 826 packs and froze, only copying a single pack in 30 minutes. Now in just 2 minutes I’m at [2:01] 1.10% 30 / 2725 packs copied

Definitely something still going on. I’ll try to keep an eye on it and note when it stops, but my ADHD promises nothing lol


It froze at roughly [8:44] 3.93% 107 / 2725 packs copied. Going to SIGQUIT, try again, and see if it doesn’t freeze sooner like it did last time.

https://hastebin.skyra.pw/dukoqeyage.bash


It did not freeze sooner, but I decided to restart the test then record a video while I was running around at work.

It’s about 3.39GB, and a couple hours long - but you can see what it does. It freezes up for a long time, then starts again, then stops. If I downgrade to v0.16.2, it’s a constant stream and then it finishes as expected - no long pauses without an apparent reason.

https://dropover.cloud/8ef8ed#42cd92a4-5a5c-4370-9192-ad258be7d011

I noted it stopping at 00:33:06, starting at 01:05:34, stopping at 02:01:22, starting once more at 02:27:21 (and then I stopped it).

If you want to be able to download the video for quicker scrubbing, there’s three dots at the top right that will offer it. The video will be up for 3 months.

1 Like

Thanks for testing and the additional information. Unfortunately, I still don’t have the slightest idea what’s going on.

To recap: restic 0.16.2 seems to work. The official binaries use go.1.21.3. restic 0.16.4 occasionally gets stuck. It uses go 1.21.6. golang.org/x/net is upgraded from 0.17.0 to 0.19.0.

I’ve looked at the full diff between both restic versions (including relevant dependencies) as well as the Go compiler changes between them. And I didn’t notice anything particularly suspicious. There are next to no changes to the http2 code and nothing obviously problematic at the restic side.

To bisect this a bit further, please try the following:

What is the version string from the restic 0.16.2 version that you’re using? Please check out the restic 0.16.2 source code but compile it again with the go compiler you’ve used for the custom 0.16.4 binary.
That binary should then report restic 0.16.2 compiled with go1.22.2 on darwin/arm64 as version string.

Is the 0.16.2+go1.22.2 version affected by those intermittent hangs? I think you’ve already mentioned it, but are you absolutely sure that 0.16.2+go1.21.3 is not affected at all by these hangs? (I’m nearly out of ideas and want to make sure, I’m looking at the right thing.)

[Edit]You set the environment variable GODEBUG=http2debug=1 to get more insights on what is happening in the http2 protocol.[/Edit]