Group files by age - Like a generational garbage collector

Thanks for your suggestions on how to improve restic! While this is a nice idea in theory, I think it’s not as good in practice. Several things come to mind:

  • Sorting the files by modification date before processing them during backup is hard to do, since restic walks the paths given on the command line in depth-first order. Ordering only the files within a single directory seems pointless.
  • Restic splits the files into one or more chunks, and afterwards the chunks are completely independent from the files and restic doesn’t care any more where a particular blob came from. Tracking this information would be expensive (in terms of memory).
  • A blob may be referenced by many different files, which may be completely different in terms of age
  • The first run of prune will probably mix the blobs together anyway, unless we take special care (which will be memory-intensive again).
  • For the very first backup, this strategy seems a nice idea, but what about subsequent backups? Most blobs will already be in the repo, and all files newly added will have a recent modification date, so then we can just carry on as usual?
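
To make the points about blobs being independent from files a bit more concrete, here is a toy sketch in Go. It is not restic's code: the `store`, `file` and `blobAges` names are made up for illustration, and it uses tiny fixed-size chunks where restic uses content-defined chunking. It only tries to show that a deduplicated blob store has no notion of files, and that grouping blobs by age would need an extra table holding one entry per reference.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"time"
)

// blobID is the SHA-256 hash of a chunk's content.
type blobID [sha256.Size]byte

// chunkSize is tiny and fixed to keep the example readable (an assumption
// for this sketch, not how restic chooses chunk boundaries).
const chunkSize = 4

// file is a hypothetical stand-in for a file seen during backup.
type file struct {
	name    string
	modTime time.Time
	data    []byte
}

// store maps blob IDs to blob contents. It knows nothing about files.
type store map[blobID][]byte

// add splits data into chunks, stores each chunk that is not yet present,
// and returns the IDs of all chunks in order.
func (s store) add(data []byte) []blobID {
	var ids []blobID
	for start := 0; start < len(data); start += chunkSize {
		end := start + chunkSize
		if end > len(data) {
			end = len(data)
		}
		chunk := data[start:end]
		id := blobID(sha256.Sum256(chunk))
		if _, ok := s[id]; !ok {
			s[id] = append([]byte(nil), chunk...) // copy so the store owns its data
		}
		ids = append(ids, id)
	}
	return ids
}

// blobAges is the extra bookkeeping that age-based grouping would need:
// for every blob, the modification times of all files referencing it.
func blobAges(s store, files []file) map[blobID][]time.Time {
	ages := make(map[blobID][]time.Time)
	for _, f := range files {
		for _, id := range s.add(f.data) {
			ages[id] = append(ages[id], f.modTime)
		}
	}
	return ages
}

func main() {
	files := []file{
		{name: "old.txt", modTime: time.Date(2010, 1, 1, 0, 0, 0, 0, time.UTC), data: []byte("AAAABBBB")},
		{name: "new.txt", modTime: time.Date(2018, 1, 1, 0, 0, 0, 0, time.UTC), data: []byte("AAAACCCC")},
	}
	s := store{}
	ages := blobAges(s, files)
	fmt.Println("unique blobs:", len(s))
	for id, times := range ages {
		fmt.Printf("blob %x is referenced by %d file(s)\n", id[:4], len(times))
	}
}
```

In this example the chunk "AAAA" is stored once but referenced by both files, whose modification dates are eight years apart, so there is no single age to sort it by. The `ages` map also grows with the number of references rather than the number of blobs, which is where the memory cost comes from.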

This does sound very negative (I’m sorry for that), but I don’t think it’s a good idea to improve the packing algorithm at this point, before we do some major cleanups and restructurings :slight_smile:

FYI, I have a restic repo containing the extracted Linux kernel sources for each revision starting at 2.2 or so in a snapshot, it’s about 50GiB in size.