Hi, I was planning on building a database to hold metadata for the files in my backup. Rather than traversing the filesystem myself, if I used restic as a library (in a Go program I create), could I use the restic cache or index instead? I would rather store metadata associated with blobs (is this the correct terminology?) than with filenames/paths, which seems less brittle, as long as I can quickly turn blobs back into their associated filepaths.
It looks like almost everything is in the “internal” dir, so this may not be possible without forking the repo. I think all I would need to do is iterate over the blobs, find their associated filepaths, do analysis on those files, then “tag” the blobs with the resulting information. I would use a completely separate key/value store for this metadata, with the fingerprint of the blob as the “key”.
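The tagging side of this plan can be sketched without touching restic at all. Restic blob IDs are the SHA-256 hash of the (plaintext) chunk content, so the same hash works as the key for a separate store. Here an in-memory map stands in for a real key/value store (e.g. bbolt or SQLite), which is an assumption of this sketch:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// tagStore maps a blob fingerprint (hex-encoded SHA-256 of the chunk
// content) to a list of tags. A real implementation would persist this
// in a key/value database; the map is just for illustration.
type tagStore map[string][]string

// fingerprint returns the hex SHA-256 of the given content, matching
// how restic derives blob IDs from plaintext chunk data.
func fingerprint(data []byte) string {
	sum := sha256.Sum256(data)
	return hex.EncodeToString(sum[:])
}

// tag records one or more tags against the content's fingerprint and
// returns the key used.
func (s tagStore) tag(data []byte, tags ...string) string {
	key := fingerprint(data)
	s[key] = append(s[key], tags...)
	return key
}

func main() {
	store := tagStore{}
	key := store.tag([]byte("chunk contents"), "scanned", "pii:none")
	fmt.Println(key, store[key])
}
```

Because the key is derived purely from content, duplicated files naturally share one entry, which matches the dedup behavior discussed below.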
Note that within restic it is currently not easy to find the “associated filepath” of a given blob. A backup/restore usually needs the opposite direction: save/find the blobs for a given file.
Also, due to deduplication, there may be many filepaths “associated” with one blob (i.e. files with the same content, or even identical parts that result in identical chunks), and the same filepath may also appear in many snapshots.
So you really have only one way to find the filepaths: loop over all snapshots and, for each snapshot, loop over its trees…
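That traversal can be sketched as follows. Note these types are *not* restic's real API (its snapshot/tree model lives in `internal` packages); they are simplified stand-ins that only illustrate the shape of the loop: walk each snapshot's root tree, and for every file node record its blob IDs against its path, building the inverse blob-to-path index that restic itself does not maintain.

```go
package main

import "fmt"

// Simplified stand-ins for restic's data model (assumption: real restic
// nodes carry a Content list of blob IDs for files and a subtree ID for
// directories; here the subtree is an in-memory pointer for brevity).
type Node struct {
	Path    string
	Type    string   // "file" or "dir"
	Content []string // blob IDs referenced by a file
	Subtree *Tree    // child tree of a directory
}

type Tree struct{ Nodes []Node }

type Snapshot struct{ Root *Tree }

// blobToPaths walks every snapshot's tree and records, for each blob ID,
// all file paths that reference it. Deduplicated content means one blob
// can map to many paths, and the same path can recur across snapshots.
func blobToPaths(snaps []Snapshot) map[string][]string {
	index := map[string][]string{}
	var walk func(t *Tree)
	walk = func(t *Tree) {
		for _, n := range t.Nodes {
			switch n.Type {
			case "file":
				for _, blob := range n.Content {
					index[blob] = append(index[blob], n.Path)
				}
			case "dir":
				if n.Subtree != nil {
					walk(n.Subtree)
				}
			}
		}
	}
	for _, s := range snaps {
		walk(s.Root)
	}
	return index
}

func main() {
	snaps := []Snapshot{{Root: &Tree{Nodes: []Node{
		{Path: "/a.txt", Type: "file", Content: []string{"blob1"}},
		{Path: "/sub", Type: "dir", Subtree: &Tree{Nodes: []Node{
			{Path: "/sub/b.txt", Type: "file", Content: []string{"blob1", "blob2"}},
		}}},
	}}}}
	fmt.Println(blobToPaths(snaps))
}
```

The example deliberately shows one blob (`blob1`) shared by two files, which is exactly the dedup situation described above.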
@alexweiss thanks. I think I am OK with the duplication you are talking about, since if a file is duplicated, I would also want it to be tagged the same.
So it looks like if I use `restic ls [id]` I can list all the files in the snapshot, and it appears to be faster than actually looping over the backed-up files. Should this be true, or should looping over the actual files via the filesystem be faster?
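One way to consume that listing from a Go program without using restic's `internal` packages is to parse the machine-readable output of `restic ls <id> --json`, which emits one JSON object per line (the snapshot first, then one object per node). The exact field names below are an assumption based on that line-per-object format — check the output of your restic version (some versions label the discriminator `struct_type`, others `message_type`):

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"strings"
)

// lsNode approximates one line of `restic ls --json` output. Field names
// are an assumption of this sketch; verify against your restic version.
type lsNode struct {
	StructType string `json:"struct_type"` // "snapshot" or "node"
	Type       string `json:"type"`        // "file" or "dir" for nodes
	Path       string `json:"path"`
	Size       int64  `json:"size"`
}

// filePaths extracts the paths of regular files from newline-delimited
// JSON, skipping the leading snapshot line and any directory nodes.
func filePaths(output string) []string {
	var paths []string
	sc := bufio.NewScanner(strings.NewReader(output))
	for sc.Scan() {
		var n lsNode
		if err := json.Unmarshal(sc.Bytes(), &n); err != nil {
			continue // ignore lines that don't parse
		}
		if n.StructType == "node" && n.Type == "file" {
			paths = append(paths, n.Path)
		}
	}
	return paths
}

func main() {
	// Hypothetical sample output, for illustration only.
	sample := `{"struct_type":"snapshot","id":"abc123"}
{"struct_type":"node","type":"dir","path":"/home"}
{"struct_type":"node","type":"file","path":"/home/a.txt","size":42}`
	fmt.Println(filePaths(sample))
}
```

In a real program you would run the command via `os/exec` and feed its stdout to the scanner instead of a string. As for speed: `restic ls` reads only the (cached) tree metadata from the repository, so it avoids stat-ing every file on disk, which is why it can be faster than a filesystem walk.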