Restic prune runtime: cannot allocate memory

First off, v12 is fantastic, and --max-repack-size has proved to be exactly what was needed for a pair of instances.

The one I’m currently trying to remediate is the instance where I had attempted to resolve the problem of the restic cache filling up root, and I’ve only made it worse. Its sister instance is pruning as expected with --max-repack-size limited to 5g; trying to do the same on this one, however, results in a “cannot allocate memory” error after getting to 6/21 snapshots.

I’m not sure how to proceed from here. I know this thread from 2018 had similar issues, but I only see “out of memory” when trying to run a check, which also faults around snapshot 6 or 7.
I’d like to avoid restarting from nothing if possible.

OS: Oracle Linux 6.10
Free space on root: 20g (cache removed)
Free memory: 8g (average)

The command run and the full stack trace are posted separately, so if this proves useless it won’t crowd the topic.

Seems you are running out of memory. prune still needs to keep the complete index in memory plus the list of used blobs. Your prune run aborted while collecting this list of used blobs.

Did you try setting the environment variable GOGC to make the Go garbage collector run more aggressively, e.g. GOGC=20 restic ....?

However, if playing around with the garbage collector options does not help, your only possibilities are to prune from a machine with more RAM or forget more snapshots before pruning…
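
For the latter, a forget run with a tighter keep policy before the prune would do the job; the policy below is just a placeholder, use whatever fits your retention needs:

restic -r $REPO forget --keep-last 10
GOGC=20 restic -r $REPO prune --max-repack-size 5g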

Ah, there could be another issue you should rule out before you start forgetting snapshots: in edge cases, the index can be too large (containing non-existent entries) and therefore need too much memory. To rule this out, you could try running a rebuild-index before pruning.

I have not tried altering the value of GOGC. To be honest, I could not find much detail about it outside of the other ticket linked, and I had already attempted so much troubleshooting on this particular instance that I was concerned about making the situation worse.

I will try a rebuild-index, set GOGC to 20, and see what results.

The rebuild ran successfully, and setting the Go garbage collection to 20 did the trick for prune!

I’m pretty green in this stuff still; if there’s a good doc on GOGC I’ll just take a link and explore myself, as I know this could be a lot to ask…
What is the default value for GOGC, and what would be the maximum?
Is that stored in a file somewhere, or is it only set in the environment?
20 is mentioned as the aggressive collection value and a general starting point for troubleshooting. I’m guessing I can let my scripting resume without it? Or is there another value I need to look up and update before letting prune run as it did about a month back?

# restic rebuild-index
repository c4eba27b opened successfully, password is correct
loading indexes...
getting pack files to read...
rebuilding index
[0:11] 100.00%  48859 / 48859 packs processed
deleting obsolete index files
[0:00] 100.00%  30 / 30 files deleted
done

Hurray!

# GOGC=20 /usr/local/bin/restic -r $REPO prune --max-repack-size 5g
repository c4eba27b opened successfully, password is correct
loading indexes...
loading all snapshots...
finding data that is still in use for 21 snapshots
[3:02] 19.05%  4 / 21 snapshots
[3:40] 23.81%  5 / 21 snapshots
[3:43] 28.57%  6 / 21 snapshots
[4:22] 33.33%  7 / 21 snapshots
[5:40] 38.10%  8 / 21 snapshots    #passed the problem hurdle!!!
[6:34] 42.86%  9 / 21 snapshots
[7:09] 47.62%  10 / 21 snapshots
[8:07] 52.38%  11 / 21 snapshots
[8:23] 57.14%  12 / 21 snapshots
[8:47] 61.90%  13 / 21 snapshots
[9:44] 66.67%  14 / 21 snapshots
[9:56] 71.43%  15 / 21 snapshots
[10:00] 76.19%  16 / 21 snapshots
[10:09] 80.95%  17 / 21 snapshots
[10:47] 85.71%  18 / 21 snapshots
[10:56] 90.48%  19 / 21 snapshots
[11:07] 95.24%  20 / 21 snapshots
[11:29] 100.00%  21 / 21 snapshots
[11:29] 100.00%  21 / 21 snapshots
searching used packs...
collecting packs for deletion and repacking
[0:19] 100.00%  48859 / 48859 packs processed

to repack:        41511 blobs / 1.676 GiB
this removes      11459 blobs / 1.562 GiB
to delete:       135923 blobs / 110.460 GiB
total prune:     147382 blobs / 112.023 GiB
remaining:      2472846 blobs / 190.128 GiB
unused size after prune: 9.504 GiB (5.00% of remaining size)

repacking packs
[1:09] 100.00%  97 / 97 packs repacked
rebuilding index
[0:13] 100.00%  33992 / 33992 packs processed
deleting obsolete index files
[0:00] 100.00%  53 / 53 files deleted
removing 14896 old packs
[0:39] 100.00%  14896 / 14896 files deleted
done

From the official Go documentation:

The GOGC variable sets the initial garbage collection target percentage. A collection is triggered when the ratio of freshly allocated data to live data remaining after the previous collection reaches this percentage. The default is GOGC=100. Setting GOGC=off disables the garbage collector entirely.
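
So the default is 100 and there is no hard maximum; a higher value only means the collector runs less often (and restic uses more memory before collecting). It is not stored in any file, it only exists as an environment variable of the process you start. If you want your scripted prune to keep using it, just set it in the script, e.g. (a sketch using the same command as above; 20 is only a starting point):

export GOGC=20    # default is 100; lower values collect garbage more aggressively
/usr/local/bin/restic -r $REPO prune --max-repack-size 5g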

I also opened an issue on GitHub; maybe in the future restic can handle this GC tuning without the need to tweak parameters by hand.

@Lenski Even though your problem was solved using a lower GOGC value, be warned: This still means that restic’s memory requirements run close to your available memory. You could run into trouble if you have more data saved in your repository with the same memory available…
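
If you want to see how close you actually get, one way (assuming GNU time is installed) is to wrap the prune in /usr/bin/time -v and check the “Maximum resident set size” it reports at the end:

export GOGC=20
/usr/bin/time -v /usr/local/bin/restic -r $REPO prune --max-repack-size 5g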

What backend are you using? 11 minutes to find data in 21 snapshots and 300GiB is a lot. For me it takes 20 seconds to scan 1TiB with 150 snapshots on local HDD.

11 minutes is quite a lot, but this should not depend too much on the backend, at least if you are using a cache which contains all the needed data. The main effort here is traversing all trees for all snapshots, so this mainly depends on the directory structure of the backed-up data as well as the access speed of the cached data and CPU power (mainly for the encryption)…
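
To check whether a cache is actually being used, where it lives and how big it is, something like this works (on Linux the default location is ~/.cache/restic unless you point restic elsewhere):

restic cache              # lists local cache directories and their sizes
du -sh ~/.cache/restic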

I’ll need to do a bit more testing to see if my script is able to execute correctly on the sister instance; I suspect there is still something off, but that’d be another topic.

Alex, I’m not sure I’m following your previous statement; memory usage by restic is still a grey area for me. How could I check this or try to correct it?

You could run into trouble if you have more data saved in your repository with the same memory available…

The backend is an S3 bucket in a separate network from the instance; latency plus the quantity of files (since it’s database binaries and tables that are being backed up) are contributing.
As for the cache, I have been considering that a separate issue. Currently I cannot retain the full cache, as it’s causing the root directory on both the primary and sister instances to fill up. If I understand the use case correctly, --max-cache-size should help keep the cache to the maximum specified at the time of running prune?
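
For reference, the cache location itself can also be moved off the root filesystem, either with the global --cache-dir option or the RESTIC_CACHE_DIR environment variable, and stale caches can be removed with restic cache --cleanup. A sketch, assuming a larger mount such as /data exists (the path is only an example):

export RESTIC_CACHE_DIR=/data/restic-cache    # example path, anywhere with free space works
/usr/local/bin/restic -r $REPO prune --max-repack-size 5g
restic cache --cleanup                        # remove old/unused cache directories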