Very deep structures cause high CPU load and sluggish scans

I have a bit of an edge case here, so I’m not expecting a “well yeah, just do ___” or “we’ll optimize that case” response, but I wanted to document that I ran into this issue to potentially help others out. And, if it’s a worthy enough case to be fixed, great. If not, I understand, because the edge case is a dumb edge case.

We have a system where my goal is to generate a backup such that the machine can be restored entirely from the restic repository, with minimal work afterwards (standard fare such as reinstalling the bootloader). (Incidentally, a proper way to keep the directories under a path while excluding the files inside them would also help towards that end. But I digress.)

There’s a third-party process that runs on this server that pulls data from an external source, and we’re not at liberty to change how that process organizes its data – which is admittedly a silly structure. There’s a separate directory for each individual file, and between that top-level directory and the file itself there are ~130 intermediate directory levels. Yes, 130. That’s not a typo. Each file has about 130 ancestor directories that contain nothing else.

Anyway… restic utterly chokes on this during the scanning phase and I’m not entirely sure why. There were about 3,000 of these files, so we’re talking somewhere in the neighborhood of 390,000 directories to store 3,000 files. (Have I said yet that this structure is dumb? If not… it’s dumb.)
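If anyone wants to poke at a similar structure locally, here’s a minimal Python sketch that generates one. The root path and naming scheme are made up; the depth and file count are just the rough figures from above:

```python
import os

ROOT = "/tmp/deep-structure"  # hypothetical location, pick anything disposable
NUM_FILES = 3000              # ~3,000 files, per the figures above
DEPTH = 130                   # ~130 otherwise-empty ancestor directories each

for i in range(NUM_FILES):
    # e.g. /tmp/deep-structure/file-0000/d000/d001/.../d129/payload
    parts = [ROOT, f"file-{i:04d}"] + [f"d{level:03d}" for level in range(DEPTH)]
    leaf = os.path.join(*parts)
    os.makedirs(leaf, exist_ok=True)
    with open(os.path.join(leaf, "payload"), "w") as fh:
        fh.write("x")
```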

Once the scan reaches this directory, restic starts using every bit of available CPU the box has, yet discovers only about five of these deep files per minute!

What I don’t get is why it burns so much CPU during the scanning phase. During the actual backup I can understand it, as we presumably have to generate 131 or so objects in the repository for each file. But that’s still not… a ridiculous number of objects… right? Why would the scanner be the part of the process that chokes? The structure is ridiculous, but surely descending 130 directories deep cannot be that CPU-intensive?
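Just to spell out the arithmetic (assuming one tree object per directory plus one data blob per file, which is my rough mental model, not a statement about restic internals):

```python
files = 3_000
dirs_per_file = 130

total_dirs = files * dirs_per_file        # 390,000 directories on disk
objects_per_file = dirs_per_file + 1      # ~130 tree objects + 1 data blob
total_objects = files * objects_per_file  # 393,000 repository objects

print(f"{total_dirs:,} dirs -> ~{total_objects:,} objects")
```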

Or is the scanner throttling its results until the actual backup process clears out some of the queue, and the rampant CPU use is just restic trying to spit out all of the encrypted subtree objects?

For now, we are just ignoring this entire tree of files. If there is a workaround, I would love to hear it. Right now, all I can think to do is have the backup script tar up the directory structure first, then just include that tarball in the backup.
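For reference, that tarball workaround would look roughly like this. It’s only a sketch, and the paths are placeholders; the real script would also need a matching `--exclude` on the main restic invocation:

```python
import subprocess
import tarfile

DEEP_TREE = "/srv/thirdparty/deep-structure"  # placeholder path
TARBALL = "/var/backups/deep-structure.tar"   # placeholder path

# Flatten the pathological tree into one file, so restic sees a
# single object instead of ~390,000 directories.
with tarfile.open(TARBALL, "w") as tar:
    tar.add(DEEP_TREE, arcname="deep-structure")

# Back up just the tarball; the main backup run would exclude
# DEEP_TREE itself with --exclude. Assumes RESTIC_REPOSITORY and
# RESTIC_PASSWORD are set in the environment.
subprocess.run(["restic", "backup", TARBALL], check=True)
```

Leaving the tar uncompressed is deliberate: restic’s content-defined chunking should still be able to deduplicate unchanged stretches of the archive between runs, whereas compressing it first would scramble the byte stream.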


Huh, very interesting edge case. I did not expect restic to spend so much CPU… If you suspect the scanner, you could try commenting out this line to disable it: