Restic uses more than 4 GB RAM on a 14 GB backup, then crashes

On a few CentOS systems I get a runtime out-of-memory error when running a restic backup every night. One of these systems is very busy and restic sometimes gets killed there, but the other system does almost nothing and sits at around 200 MB of RAM used when idle.
When I run a restic backup, restic takes over 90% of the machine's memory (the machine has 4 GB of RAM in total) and ultimately crashes from running out of memory. The whole backup set is just 14 GB, and a successful first backup has already been completed.

I tried the following:
Upgraded from 2 GB to 4 GB of RAM (this solved the problem temporarily)
Upgraded restic from 0.9.6 to 0.12.1 (did nothing)

Try adding GOGC=20 as an environment variable for restic; it should make Go's garbage collection a bit more aggressive and hence use less memory in total, I think.
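For example, something like this (the repository URL and paths here are just placeholders, adjust them to your own setup):

```shell
# GOGC controls how much the Go heap may grow before a garbage
# collection runs: 20 means "collect once the heap is 20% larger
# than the live data" (the Go default is 100). Lower values trade
# extra CPU time for a smaller peak memory footprint.
GOGC=20 restic -r s3:http://localbucket backup /home /etc
```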

Can you please show the command and output of a backup run? Are there many small files in those 14 GB?

2 Likes

By adding GOGC=20 the backup did complete. The server is a web hosting server with support for multiple PHP versions per hosting account. I checked one of the accounts: it had 21200 files spread out over 210 MB, and there are a lot of images, so the other files are likely to be small. This server has around 10-15 of these accounts, some with more files and some with fewer.

Backup command:
restic -r s3:http://localbucket backup /home /etc /var /chroot --exclude='/var/lib/mysql' --exclude='/chroot/home/cagefs-skeleton' --no-cache

Successful backup run output:

Files:         385 new, 318589 changed, 114474 unmodified
Dirs:           65 new, 70868 changed, 18975 unmodified
Added to the repo: 257.292 MiB
processed 433448 files, 5.226 GiB in 18:00

Unsuccessful output:

fatal error: runtime: out of memory

runtime stack:
runtime.throw(0xfb493c, 0x16)
	/usr/local/go/src/runtime/panic.go:1117 +0x72
runtime.sysMap(0xc160000000, 0x8000000, 0x16be8d0)
	/usr/local/go/src/runtime/mem_linux.go:169 +0xc6
runtime.(*mheap).sysAlloc(0x16a4b80, 0x8000000, 0x42c037, 0x16a4b88)
	/usr/local/go/src/runtime/malloc.go:729 +0x1e5
runtime.(*mheap).grow(0x16a4b80, 0x4000, 0x0)
	/usr/local/go/src/runtime/mheap.go:1346 +0x85
runtime.(*mheap).allocSpan(0x16a4b80, 0x4000, 0x440000, 0x7f71dc404db8)
	/usr/local/go/src/runtime/mheap.go:1173 +0x609
runtime.(*mheap).alloc.func1()
	/usr/local/go/src/runtime/mheap.go:910 +0x59
runtime.systemstack(0x46b114)
	/usr/local/go/src/runtime/asm_amd64.s:379 +0x66
runtime.mstart()
	/usr/local/go/src/runtime/proc.go:1246

goroutine 1 [running]:
runtime.systemstack_switch()
	/usr/local/go/src/runtime/asm_amd64.s:339 fp=0xc002c08c90 sp=0xc002c08c88 pc=0x46b240
runtime.(*mheap).alloc(0x16a4b80, 0x4000, 0x7f71dd010100, 0x0)
	/usr/local/go/src/runtime/mheap.go:904 +0x85 fp=0xc002c08ce0 sp=0xc002c08c90 pc=0x427ce5
runtime.(*mcache).allocLarge(0x7f72049b05b8, 0x8000000, 0x200001, 0x20300a)
	/usr/local/go/src/runtime/mcache.go:224 +0x97 fp=0xc002c08d38 sp=0xc002c08ce0 pc=0x418357
runtime.mallocgc(0x8000000, 0xdc79c0, 0x1, 0x0)
	/usr/local/go/src/runtime/malloc.go:1078 +0x925 fp=0xc002c08dc0 sp=0xc002c08d38 pc=0x40dee5
runtime.makeslice(0xdc79c0, 0x1000000, 0x1000000, 0x6fb3b215e550cc58)
	/usr/local/go/src/runtime/slice.go:98 +0x6c fp=0xc002c08df0 sp=0xc002c08dc0 pc=0x44e20c
github.com/restic/restic/internal/repository.(*indexMap).grow(0xc0000e16f8)
	/restic/internal/repository/indexmap.go:102 +0x5c fp=0xc002c08e70 sp=0xc002c08df0 pc=0x7eecdc
github.com/restic/restic/internal/repository.(*indexMap).add(0xc0000e16f8, 0x4d02f7132c27716b, 0xd98fd53df0e02274, 0xffd0622919c3ece5, 0x6fb3b215e550cc58, 0x188e5, 0x70700049940)
	/restic/internal/repository/indexmap.go:43 +0x165 fp=0xc002c08ed8 sp=0xc002c08e70 pc=0x7ee8a5
github.com/restic/restic/internal/repository.(*Index).merge.func2(0xc0b4fcf928, 0xa001)
	/restic/internal/repository/index.go:555 +0xb2 fp=0xc002c08f50 sp=0xc002c08ed8 pc=0x7fb7f2
github.com/restic/restic/internal/repository.(*indexMap).foreach(0xc095fc3838, 0xc002c09000)
	/restic/internal/repository/indexmap.go:62 +0x76 fp=0xc002c08f90 sp=0xc002c08f50 pc=0x7ee976
github.com/restic/restic/internal/repository.(*Index).merge(0xc0000e1680, 0xc095fc37c0, 0x0, 0x0)
	/restic/internal/repository/index.go:552 +0x238 fp=0xc002c09078 sp=0xc002c08f90 pc=0x7ed0f8
github.com/restic/restic/internal/repository.(*MasterIndex).MergeFinalIndexes(0xc0000a1480, 0x0, 0x0)
	/restic/internal/repository/master_index.go:288 +0x1df fp=0xc002c09120 sp=0xc002c09078 pc=0x7f277f
github.com/restic/restic/internal/repository.(*Repository).LoadIndex(0xc0001f00e0, 0x10f8128, 0xc0000a0ac0, 0x0, 0x0)
	/restic/internal/repository/repository.go:463 +0xcd fp=0xc002c09170 sp=0xc002c09120 pc=0x7f76cd
main.runBackup(0x0, 0x0, 0x0, 0xc0000c0cc0, 0x2, 0x2, 0x0, 0x0, 0x0, 0x0, ...)
	/restic/cmd/restic/cmd_backup.go:592 +0x865 fp=0xc002c099d0 sp=0xc002c09170 pc=0xd1b8a5
main.glob..func2(0x167dd60, 0xc00018e2d0, 0x4, 0x9, 0x0, 0x0)
	/restic/cmd/restic/cmd_backup.go:62 +0x198 fp=0xc002c09cd0 sp=0xc002c099d0 pc=0xd4fab8
github.com/spf13/cobra.(*Command).execute(0x167dd60, 0xc00018e240, 0x9, 0x9, 0x167dd60, 0xc00018e240)
	/home/build/go/pkg/mod/github.com/spf13/cobra@v1.2.1/command.go:856 +0x472 fp=0xc002c09d90 sp=0xc002c09cd0 pc=0x73be92
github.com/spf13/cobra.(*Command).ExecuteC(0x167cbe0, 0x40d42a, 0x1f, 0x42)
	/home/build/go/pkg/mod/github.com/spf13/cobra@v1.2.1/command.go:974 +0x375 fp=0xc002c09e70 sp=0xc002c09d90 pc=0x73cab5
github.com/spf13/cobra.(*Command).Execute(...)
	/home/build/go/pkg/mod/github.com/spf13/cobra@v1.2.1/command.go:902
main.main()
	/restic/cmd/restic/main.go:98 +0x45 fp=0xc002c09f88 sp=0xc002c09e70 pc=0xd4e665
runtime.main()
	/usr/local/go/src/runtime/proc.go:225 +0x256 fp=0xc002c09fe0 sp=0xc002c09f88 pc=0x438bb6
runtime.goexit()
	/usr/local/go/src/runtime/asm_amd64.s:1371 +0x1 fp=0xc002c09fe8 sp=0xc002c09fe0 pc=0x46cf41

goroutine 18 [chan receive, 8 minutes]:
github.com/restic/restic/internal/restic.init.0.func1.1()
	/restic/internal/restic/lock.go:254 +0xb3
created by github.com/restic/restic/internal/restic.init.0.func1
	/restic/internal/restic/lock.go:251 +0x35

goroutine 6 [syscall, 8 minutes]:
os/signal.signal_recv(0x0)
	/usr/local/go/src/runtime/sigqueue.go:168 +0xa5
os/signal.loop()
	/usr/local/go/src/os/signal/signal_unix.go:23 +0x25
created by os/signal.Notify.func1.1
	/usr/local/go/src/os/signal/signal.go:151 +0x45

goroutine 20 [select, 8 minutes]:
github.com/restic/restic/internal/ui/termstatus.(*Terminal).run(0xc00009d080, 0x10f8128, 0xc00011df80)
	/restic/internal/ui/termstatus/status.go:102 +0x129
github.com/restic/restic/internal/ui/termstatus.(*Terminal).Run(0xc00009d080, 0x10f8128, 0xc00011df80)
	/restic/internal/ui/termstatus/status.go:91 +0x70
main.glob..func2.1(0xc0000821d8, 0x8)
	/restic/cmd/restic/cmd_backup.go:60 +0x6d
gopkg.in/tomb%2ev2.(*Tomb).run(0xc0000806e0, 0xc0000ae4e0)
	/home/build/go/pkg/mod/gopkg.in/tomb.v2@v2.0.0-20161208151619-d5d1b5820637/tomb.go:163 +0x38
created by gopkg.in/tomb%2ev2.(*Tomb).Go
	/home/build/go/pkg/mod/gopkg.in/tomb.v2@v2.0.0-20161208151619-d5d1b5820637/tomb.go:159 +0xbc

goroutine 7 [select]:
go.opencensus.io/stats/view.(*worker).start(0xc00011e500)
	/home/build/go/pkg/mod/go.opencensus.io@v0.23.0/stats/view/worker.go:276 +0xcd
created by go.opencensus.io/stats/view.init.0
	/home/build/go/pkg/mod/go.opencensus.io@v0.23.0/stats/view/worker.go:34 +0x68

goroutine 8 [chan receive, 8 minutes]:
main.CleanupHandler(0xc000075d40)
	/restic/cmd/restic/cleanup.go:59 +0x4c
created by main.init.0
	/restic/cmd/restic/cleanup.go:21 +0x67

goroutine 28 [select]:
github.com/restic/restic/internal/ui.(*Backup).Run(0xc00035c1a0, 0x10f8128, 0xc0000a0380, 0x0, 0x0)
	/restic/internal/ui/backup.go:102 +0x2fc
main.runBackup.func2(0x0, 0x0)
	/restic/cmd/restic/cmd_backup.go:566 +0x84
gopkg.in/tomb%2ev2.(*Tomb).run(0xc000080730, 0xc000394810)
	/home/build/go/pkg/mod/gopkg.in/tomb.v2@v2.0.0-20161208151619-d5d1b5820637/tomb.go:163 +0x38
created by gopkg.in/tomb%2ev2.(*Tomb).Go
	/home/build/go/pkg/mod/gopkg.in/tomb.v2@v2.0.0-20161208151619-d5d1b5820637/tomb.go:159 +0xbc

goroutine 174 [select, 3 minutes]:
main.refreshLocks(0x168bc20, 0xc00251d5c0)
	/restic/cmd/restic/lock.go:70 +0x1b3
created by main.lockRepository
	/restic/cmd/restic/lock.go:47 +0x26c

FYI: @nickvh I just edited your post to have the command output in a code box.
This makes it easier to read and find out what the problem might be :slight_smile:

1 Like

If a more aggressive GC seems to have fixed the issue, does that mean this system has no swap configured? Or was it filled up?

It just means that less memory in total is needed when restic runs. Whether the system has swap or not isn't necessarily answered by this. There could be swap that was already full before restic started, and adding GOGC=20 may have brought memory usage down to within the available swap. We don't know; one has to look at the memory and swap stats to tell.
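On a typical Linux host, a quick way to see those stats while the backup runs is to read them straight from /proc (the field names below assume Linux):

```shell
# Snapshot of total/available RAM and swap; watching SwapFree shrink
# during a backup run tells you whether swap was the limiting factor.
grep -E '^(MemTotal|MemAvailable|SwapTotal|SwapFree):' /proc/meminfo
```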

For now I will mark this as solved with your suggestion of setting GOGC=20. However, I still find it quite remarkable that restic runs out of memory when starting a backup with 3.8 GB of RAM and 2 GB of swap available, uses all of that, and then crashes.

How many files does the backup include? Is there a folder with hundreds of thousands of files?

I have a similar issue when running Restic on my RPI4 backing up a folder with 200k files.

4GB, so what?

Cheap laptops have 16 GB of RAM nowadays.

Restic has become fast with version 0.12, so unless RAM usage is 10+ GB over several hours, I wouldn't bother.

The backup contains 433448 files.

It has 350 directories and the files are pretty evenly spread out across them.
It might also be worth mentioning that I have the cache turned off (--no-cache) for the restic backup.

So that’s roughly 1000 files per directory? Or is there a directory that is much larger than the others, like 100k files?

How large is the index directory of the repository? Are there any particularly large files, that is >50MB, in the data folder?

1 Like

Just FYI: I ran into the same problem with 8 GB of memory on my host.

The GOGC=20 setting solved it for me too, but I also had to use about 6 GB of swap. The backup is still running; it is now going through the files, whereas previously it was crashing while loading the index files.

I have 87TB of compressed video data, slightly over 17M individual files.

I can report later how long the process took if anyone is interested.

The index folder contains 1315 files, ranging in size from 5.5 MB to 6.1 MB.
The biggest file I could quickly find in the data dir was 12 MB.
The whole data dir is 725 GB, with every subdirectory inside it at around 3 GB.
EDIT: there is one domain that has 435000 files in its home directory, and around 10 others that range between 5000 and 25000 files.

The memory usage of restic is roughly equivalent to the size of the index folder. At an index size of 8 GB there are roughly 60 million blobs stored in the repository.

So with the numbers reported so far, either nearly every file must change on each backup run, there are thousands of snapshots, or you've massively underreported the number of files backed up in the repository. Note that what matters is not how many files the largest snapshot contains, but how many different files are stored across all snapshots in the same repository.
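As a rough sanity check, the index size reported earlier in this thread (1315 index files at roughly 6 MiB each) can be tallied up, and it lands close to the 8 GB figure mentioned above:

```shell
# Back-of-the-envelope tally of the index size from the numbers
# reported in this thread: 1315 files at ~6 MiB each.
echo "$((1315 * 6)) MiB"        # total index size in MiB
echo "$((1315 * 6 / 1024)) GiB" # roughly 7 GiB
```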

To resolve the memory usage problems, you will have to split up your backup tasks so that they back up to different repositories.

2 Likes

There are more similar servers backing up to one restic repository. So to get the memory usage down, more repositories would be the answer?

This decreases the index size and therefore the memory consumption. However, you lose deduplication between the repositories, i.e. identical data that is saved to two different repositories will be stored twice.