How restic’s block size works (performance issue with e.g. OneDrive?)


#1

I’m new to restic, but I’m considering it for offsite backups to my OneDrive (1 TB, included with my Office 365 plan).

I’m not sure whether restic would work reliably with OneDrive, since OneDrive seems to have some kind of file count limit of about 80k to 100k. I’m concerned that a small, static block size would lead to an exploding file count on my OneDrive, whereas block sizes that are too large would quickly use up the available space (e.g. compiling something sometimes produces hundreds or thousands of small sub-kilobyte files, which would then inflate to hundreds of GB at a block size of, say, 10 MB).

Does restic use small blocks that are always the same size, no matter how large the original file was before the backup? If so, how do you guys cope with this “issue”?


#2

I don’t really know the full answers to the question(s), but I do want to point out something for consideration anyway:

Keep in mind that I’m not sure it’s restic’s goal (or obligation) to handle situations like this. OneDrive, Google Drive, and similar services are not really meant for backup so much as for file sync. A file limit below 100k is a really unfortunate, arbitrary limitation that seems to be in place to prevent this kind of use case, probably for business reasons.

I’m not opposed to a working solution for this, but it does seem like whatever solution restic implements would end up being both:

  1. A maintenance burden
  2. A bandage that doesn’t actually solve the problem; i.e. maybe it would help backup sets that are a little larger, but not a lot larger; i.e. it doesn’t scale

That’s my guess. I might be wrong, but I thought I’d put these points here for consideration.


#3

Restic splits files into blocks (blobs), and afterwards combines those blobs into larger files in the repository. Those larger files are what you’re interested in, so it’s not about the number of blobs, but the number of files.
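To illustrate why many tiny source files do not turn into many repository files, here is a toy sketch of the "combine blobs into larger files" idea. It is not restic’s actual packing code: the function `pack_blobs`, the 4 MiB target, and the rule "close a pack once it reaches the target" are assumptions made purely for illustration.

```python
def pack_blobs(blob_sizes, target_pack_size=4 * 1024 * 1024):
    """Group blob sizes into packs, closing a pack once it reaches the target.

    Toy model only: the target size and the closing rule are assumptions,
    not restic's real behaviour.
    """
    packs = []
    current = []
    current_size = 0
    for size in blob_sizes:
        current.append(size)
        current_size += size
        if current_size >= target_pack_size:
            packs.append(current)
            current = []
            current_size = 0
    if current:
        packs.append(current)  # final, possibly undersized pack
    return packs


# 10,000 tiny 512-byte files, each treated as a single blob
tiny_blobs = [512] * 10_000
packs = pack_blobs(tiny_blobs)
print(len(tiny_blobs), "blobs ->", len(packs), "pack file(s)")
```

In this model the number of files on the remote depends on the total amount of data, not on how many small files the data came from.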

The default parameters for restic are set so that you’ll end up with most files in the repository between 4 and 16 MiB. Assuming no duplication at all in your 1 TiB of data, and taking the 4 MiB lower end, this would mean up to 1024 * 1024 / 4 = 262144 files in the repository.
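The estimate above can be reproduced with a few lines of Python. This is only a back-of-the-envelope sketch: it assumes no deduplication and that every repository file sits exactly at the 4 MiB or 16 MiB end of the range, and the helper name `pack_file_count` is made up for this example.

```python
MIB = 1024 ** 2
TIB = 1024 ** 4


def pack_file_count(data_bytes, pack_bytes):
    """Number of files needed if every file has exactly pack_bytes (ceiling division)."""
    return -(-data_bytes // pack_bytes)


data = 1 * TIB
most_files = pack_file_count(data, 4 * MIB)    # every file at the 4 MiB end
fewest_files = pack_file_count(data, 16 * MIB)  # every file at the 16 MiB end

print("files at 4 MiB each: ", most_files)
print("files at 16 MiB each:", fewest_files)
```

Under these assumptions the count lands somewhere between 65536 and 262144 files for a full 1 TiB, so a limit of around 100k files could be reached before the storage quota is.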

This is the first time (as far as I know) that somebody has reported a limit on the number of files, so it seems that not many people run into this limitation in practice.


#4

Where did you see info on a OneDrive 100K file limit? FWIW, I use restic with OneDrive and have not noticed any performance issues, at least no OneDrive-specific ones. There are currently 100329 “pack” files in the backup, totalling a little over 475 GB. Disclosure: I use a custom restic build with direct OneDrive support (while standard restic uses rclone to talk to OneDrive). Hope this helps.
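As a quick sanity check on those numbers, the average pack size can be worked out as below. The post does not say whether "GB" means decimal gigabytes or GiB, so this sketch computes both; the variable names are just for illustration.

```python
MIB = 1024 ** 2

pack_files = 100329   # pack file count reported above
total_gb = 475        # "a little over 475 GB" reported above

# Interpret the total either as decimal GB or as binary GiB.
avg_if_decimal = total_gb * 10 ** 9 / pack_files
avg_if_binary = total_gb * 1024 ** 3 / pack_files

print(f"average pack size (GB read as 10^9 bytes): {avg_if_decimal / MIB:.2f} MiB")
print(f"average pack size (GB read as 2^30 bytes): {avg_if_binary / MIB:.2f} MiB")
```

Either way the average comes out between 4 and 5 MiB per pack file, i.e. near the lower end of the 4 to 16 MiB range mentioned earlier in the thread.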


#5

Yes, there is a limit of 5K files per directory in OneDrive.


#6

Do you have a link to documentation for the 5K-per-directory limit? From what I recall, there was a limit on individual file size (a few gigabytes, which is more than enough for restic), but I do not recall a limit on the total number of items in a directory.

Also, restic spreads pack files among 256 directories, which with 5K files per directory and 4M per file gives a little over 5 TB of total storage, well above the OneDrive 1 TB cap.
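Spelled out as a small calculation (a sketch only: the 5K-per-directory limit is the unconfirmed figure from this thread, and 4 MiB per file is the lower end of the pack size range quoted earlier):

```python
MIB = 1024 ** 2

directories = 256          # directories restic spreads pack files over
files_per_dir = 5_000      # per-directory limit claimed in this thread (unconfirmed)
pack_bytes = 4 * MIB       # assumed size of every pack file

max_files = directories * files_per_dir
max_bytes = max_files * pack_bytes

print("max pack files:", max_files)
print(f"max capacity: {max_bytes / 10 ** 12:.2f} TB ({max_bytes / 1024 ** 4:.2f} TiB)")

# Conversely: how full would each directory get with 1 TiB of 4 MiB packs?
files_for_1tib = 1024 ** 4 // pack_bytes
print("files per directory at 1 TiB:", files_for_1tib // directories)
```

So even with the smallest pack size, a full 1 TiB repository would put roughly a thousand files in each directory, well under a 5K-per-directory limit if one exists.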


#7

I can’t find the link that I read earlier. I have run into this limitation myself several times with Duplicati backups on OneDrive for Business.


#8

@ifedorenko I read about it here for example.

And here is a knowledge base article by Microsoft itself, in which Microsoft mentions the “limitation”.

Though it sounds more like a soft limit (not a deliberately set hard limit), so it reads more like performance-related advice not to keep more than 100k files on OneDrive. On the other hand, MS recommends avoiding these issues by using the native client instead of a web browser to upload files. But I assume the rclone-based OneDrive connection for restic uses a browser-like access API, so this issue could affect restic too.

Anyway… I actually wanted to know how restic’s block system works, and whether anybody has had any issues so far using OneDrive or other cloud services together with restic.

“I use custom restic build with direct OneDrive support”

Great to read about this! Is it complicated to integrate OneDrive support natively into restic? Perhaps you could contribute your changes to the original restic, so that the support is there out of the box?


#9

I believe they are trying to avoid adding more native backends to restic, because of the maintenance burden and code bloat, and also because restic + rclone already work great together and work with all of rclone’s backends. And it looks like rclone is slowly making its way toward being usable as a library, so eventually it won’t require running a separate binary.


#10

I am not sure my OneDrive backend provides much over rclone, and I keep mine mostly because I already had it set up before rclone was integrated. I only mentioned it in the interest of full disclosure.


#11

Restic + rclone od union :thought_balloon:

Will this help until it’s time to prune?