FileReadConcurrency=1 prevents fragmentation

Hi all,
I’m trying to use restic to back up a directory of relatively big files (~1GB each), and I noticed that the default behavior is to read two files at the same time (B2 backend, if it matters).
This might be fine performance-wise, but when I later delete one of the files, a lot of repacking is needed to free its contents.

For example, say we have a folder with three 1GB files.
Files 1 and 2 are read simultaneously during the backup, and file 3 is read on its own.
If I then delete file 2 and “forget” the snapshot, the output reports that about 1GB of packs needs to be repacked.

I rebuilt restic with FileReadConcurrency=1, cleared the repo, and repeated the test. This time only about 50MB of packs needed to be repacked.
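
To make the effect concrete, here is a toy model of what I think is happening. This is not restic’s actual code: the blob and pack sizes are made up, and the strict round-robin interleaving is only an approximation of what two concurrent readers produce.

```go
package main

import "fmt"

// Toy model of how concurrent file reads interleave blobs into pack files.
// NOT restic's code: blob and pack counts are arbitrary, and the round-robin
// interleaving is a simplification of real concurrent readers.
func main() {
	const files = 3        // number of equally sized source files
	const blobsPerFile = 9 // blobs per file (arbitrary)
	const packSize = 4     // blobs per pack (arbitrary)

	for _, readers := range []int{2, 1} {
		packs := buildPacks(readers, files, blobsPerFile, packSize)
		fmt.Printf("read concurrency %d: %d of %d packs must be repacked after deleting file 2\n",
			readers, repackCount(packs, 2), len(packs))
	}
}

// buildPacks distributes blobs into packs, reading `readers` files at a time
// and interleaving their blobs round-robin, the way concurrent readers would.
func buildPacks(readers, files, blobsPerFile, packSize int) [][]int {
	var packs [][]int
	var cur []int
	add := func(file int) {
		cur = append(cur, file)
		if len(cur) == packSize {
			packs = append(packs, cur)
			cur = nil
		}
	}
	for first := 1; first <= files; first += readers {
		last := first + readers - 1
		if last > files {
			last = files
		}
		for i := 0; i < blobsPerFile; i++ {
			for f := first; f <= last; f++ {
				add(f)
			}
		}
	}
	if len(cur) > 0 {
		packs = append(packs, cur)
	}
	return packs
}

// repackCount counts packs that hold blobs of the deleted file mixed with
// blobs that are still referenced; those are the packs prune must rewrite.
func repackCount(packs [][]int, deleted int) int {
	n := 0
	for _, p := range packs {
		hasDeleted, hasLive := false, false
		for _, f := range p {
			if f == deleted {
				hasDeleted = true
			} else {
				hasLive = true
			}
		}
		if hasDeleted && hasLive {
			n++
		}
	}
	return n
}
```

With two readers, nearly every pack holding file 2 data also holds file 1 data, while with one reader only the packs at the file boundaries mix files, which matches the ~1GB vs ~50MB difference I’m seeing.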

Is this the expected behavior? Is there a way to set this parameter via the command line? (I didn’t see one.)

Thanks!


This is somewhat expected; so far we do not group blobs per file reader.

Regarding a command-line option, take a look at https://github.com/restic/restic/pull/2750 .


Thanks for the reply, Michael!
This setting only affects source file I/O, right? The backup data is still uploaded with an independent number of threads, which in B2’s case is controlled by b2.connections?

Exactly. The upload concurrency is controlled by the backend setting, whereas the file read concurrency controls the number of files read at the same time.
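
A minimal sketch of that pipeline shape (an assumed structure for illustration, not restic’s actual implementation; the names fileReadConcurrency and uploadConnections are just placeholders for the two knobs, the latter standing in for something like b2.connections):

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// Sketch: one pool of goroutines reads files, a separate pool uploads the
// resulting chunks, and the two pool sizes are tuned independently.
func main() {
	const fileReadConcurrency = 1 // how many files are read at once
	const uploadConnections = 5   // how many uploads run in parallel

	files := []string{"file1", "file2", "file3"}
	fileQueue := make(chan string)
	chunks := make(chan string, 16)

	// Reader pool: its size limits how many files are open concurrently.
	var readers sync.WaitGroup
	for i := 0; i < fileReadConcurrency; i++ {
		readers.Add(1)
		go func() {
			defer readers.Done()
			for f := range fileQueue {
				for c := 0; c < 3; c++ { // pretend each file yields 3 chunks
					chunks <- fmt.Sprintf("%s-chunk%d", f, c)
				}
			}
		}()
	}

	// Uploader pool: its size is independent of the reader pool.
	var uploaders sync.WaitGroup
	for i := 0; i < uploadConnections; i++ {
		uploaders.Add(1)
		go func(id int) {
			defer uploaders.Done()
			for c := range chunks {
				time.Sleep(10 * time.Millisecond) // pretend upload
				fmt.Printf("uploader %d sent %s\n", id, c)
			}
		}(i)
	}

	for _, f := range files {
		fileQueue <- f
	}
	close(fileQueue)
	readers.Wait()
	close(chunks)
	uploaders.Wait()
}
```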

Interesting … would using a custom pack size change anything? Say you only have 1GB files: would a pack size of 256MB help with this too, or would it be counterproductive? Hm.

The larger the pack size, the larger the probability that a pack will contain blobs from different files. So for the use case here, increasing the pack size is counterproductive.
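
For intuition, here is a rough back-of-the-envelope model (assumed numbers, single-reader case, not a restic measurement) of why bigger packs mix files more often:

```go
package main

import "fmt"

// Rough model: with a single reader, files land in packs sequentially, so a
// pack mixes two files only if it straddles a file boundary. For packs of
// size p inside files of size s, roughly a fraction p/s of the packs do so,
// which grows with the pack size. The sizes below are hypothetical examples.
func main() {
	const fileMiB = 1024.0
	for _, packMiB := range []float64{16, 64, 256} {
		fmt.Printf("pack size %3.0f MiB, file size %4.0f MiB -> ~%4.1f%% of packs mix two files\n",
			packMiB, fileMiB, 100*packMiB/fileMiB)
	}
}
```

With interleaved readers the situation is worse still, since almost every pack holding one file also holds the other regardless of the pack size.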