Restic 0.7.3 causing kernel panic on macOS Sierra (fully up to date)

Hi,

I tried a couple of backups to Minio (running on a Windows 2012 R2 server), and at some point the Mac kernel panicked. This happened twice (I only have the details of the most recent kernel panic). This is on a Mac mini that has been ROCK SOLID until now (it’s the first time I’ve ever seen a kernel panic on a Mac).

Anonymous UUID:       72A2DB74-5BE6-0759-EAAF-962BC14E3A29

Sat Sep 23 20:43:12 2017

*** Panic Report ***
panic(cpu 3 caller 0xffffff8021ffe39d): Kernel trap at 0xffffff8021fa4431, type 14=page fault, registers:
CR0: 0x000000008001003b, CR2: 0xffffff7f868ce884, CR3: 0x00000003c9b1e033, CR4: 0x00000000001626e0
RAX: 0xffffff802ba4d438, RBX: 0xffffff7f80000000, RCX: 0xffffff7f868ce880, RDX: 0x000000008013e781
RSP: 0xffffff9220e33ab0, RBP: 0xffffff9220e33af0, RSI: 0x00000000008b5487, RDI: 0x0000000080000000
R8:  0xffffff8038fc6880, R9:  0x0000000000000000, R10: 0x00000000ffffff9f, R11: 0x00000000001f1ac4
R12: 0xffffff80509fe900, R13: 0xffffff802ced8738, R14: 0x0000003fffffffc0, R15: 0x0000000000000000
RFL: 0x0000000000010246, RIP: 0xffffff8021fa4431, CS:  0x0000000000000008, SS:  0x0000000000000010
Fault CR2: 0xffffff7f868ce884, Error code: 0x0000000000000002, Fault CPU: 0x3, PL: 1, VF: 5

Backtrace (CPU 3), Frame : Return Address
0xffffff9220e33740 : 0xffffff8021ee953c 
0xffffff9220e337c0 : 0xffffff8021ffe39d 
0xffffff9220e339a0 : 0xffffff8021e9a593 
0xffffff9220e339c0 : 0xffffff8021fa4431 
0xffffff9220e33af0 : 0xffffff8021faa4e5 
0xffffff9220e33b40 : 0xffffff8021f89d31 
0xffffff9220e33e40 : 0xffffff8021f80d2c 
0xffffff9220e33f30 : 0xffffff8022374685 
0xffffff9220e33f50 : 0xffffff80224240f5 
0xffffff9220e33fb0 : 0xffffff8021e9ad96 

BSD process name corresponding to current thread: restic_0.7.3_dar

Mac OS version:
16G29

Kernel version:
Darwin Kernel Version 16.7.0: Thu Jun 15 17:36:27 PDT 2017; root:xnu-3789.70.16~2/RELEASE_X86_64
Kernel UUID: D3314D98-5D40-3CD8-98A4-F1DD46C20E03
Kernel slide:     0x0000000021c00000
Kernel text base: 0xffffff8021e00000
__HIB  text base: 0xffffff8021d00000
System model name: Macmini6,1 (Mac-031AEE4D24BFF0B1)

Would you mind creating an issue on GitHub? This may be a bug, but I’m not sure where yet. With the issue we can escalate it…

Hm, is this machine suitable for high CPU load? Maybe sub-optimal cooling? restic is quite resource intensive…

An issue over at GitHub would be great.

FWIW I’ve been running restic at the add-cache branch (commit 60538650), which is beyond the 0.7.3 release, and it works fine. But I’m using the SFTP backend, in case it has anything to do with that.

Hi, I haven’t had a chance to sit down and collate all the info for creating the issue on GitHub.

As far as cooling is concerned, I never heard the fan speed up (it always runs, but very quietly). I have heard it ramp up before, e.g. when setting up macOS initially, when it does a full index of everything.

I was testing against Minio, so I’ll do another test against an external USB3 disk to see if that is any different. I must admit I’m pretty frustrated with my Restic testing so far… (especially because when it works it is amazing).

I’ll create an issue in the Go repository, let’s see if they have any idea what’s going on here. No user-space program should be able to crash the kernel…

I’ve created a new issue in the Go issue tracker, this is not specific to restic (at least I think it isn’t): https://github.com/golang/go/issues/22016

@rotor can you please tell us which binary you used exactly? And what does restic version report?

@rotor can you reproduce the issue? If yes, then we can try to find out which syscall caused it. This may be a bug in Go after all…

I would tend to suspect bad RAM or similar hardware problem if it happens under load. Can you run a hardware test?

I had a MacBook that was solid for 2 years and then suddenly started having random panics. The built-in hardware test showed a problem with the disk. Apple replaced the logic board & SSD, and it’s been solid ever since.

Sorry for the delay:

./restic_0.7.3_darwin_amd64 version
restic 0.7.3
compiled with go1.9 on darwin/amd64

I will run one this evening. Does restic use an enormous amount of RAM? I’m not a super heavy computer user, but I routinely have 20-30 Safari tabs open, with 5-10 other apps running simultaneously (i.e. what I would consider fairly typical computer use).

I successfully ran a backup of the same source to an external USB 3 drive. So could there be a bug in the Minio implementation?

It happened twice, so it seems reproducible. It didn’t kernel panic when backing up to the external USB 3 drive. Do you think memory utilisation differs enough between backing up to Minio and to a local disk to support the faulty-memory explanation?

There was a bug in the s3 backend that caused it to use a lot of RAM. It has been fixed (see #1267), but the fix is not yet in a released version of restic, so that could be it. The issue is described in #1256.

I’ve built a binary for you from the latest master branch (so you’ll also get the local cache feature): https://fd0.me/tmp/restic_v0.7.3-66-g801dbb6d_darwin_amd64.bz2

Thanks! I’ll test it tonight.

Hi, sorry for the slow progress. I suspect @armhold might be right. Both the Apple Hardware Test and memtest86 cause the machine to freeze at some point (AHT froze at around 11 minutes, memtest86 after almost two hours), so I need to do further testing by removing each of the memory DIMMs and testing individually.

In 30 years of using computers and working in IT, it’s the first time I’ve seen a hardware problem (still just an assumption) that is this subtle. Further testing to be done, and a further update will follow. If I can narrow it down to a single DIMM, I will re-check restic 0.7.3 against Minio to see if I can make it fail again. (I will also test the binary @fd0 has given me, but I suspect it will be much quicker and therefore less likely to trigger the kernel panic.)

Just to make things a little more complicated, in the meantime I’ve also upgraded to High Sierra. =)


I’m sorry to say that I’m glad to hear that. It’s not the first time restic uncovered a hardware problem… I’ve closed the issue for the Go project over at GitHub.

Please keep us informed how this goes on!