Build a directory tree with file counts based on output of `restic diff`

n8henrie · August 3, 2022, 3:04pm

Hi all,

Before I embark on this project, I wanted to see whether anyone else has already done something similar.

I’m hoping to beef up my exclude lists. I’ve written a small script that takes a list of ~10 recent snapshots and runs `restic diff` on them, and after a little `awk` / `sed` I’m left with a 500,000-line file. I can run it through `sort | uniq -c | sort` to see that a few files are updated frequently and could be considered for exclusion, but I think I’d get much better bang for the buck by excluding the culprit parent directories (instead of exact file matches).
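
For anyone following along, the per-file frequency step might look roughly like this. It is only a sketch: `top_changed` is a made-up helper name, and it assumes the changed paths have already been extracted to one path per line.

```shell
#!/bin/sh
# Hypothetical helper (name is illustrative): reads one path per line on stdin
# and prints the N most frequently listed paths as "count path", highest first.
top_changed() {
  sort | uniq -c | sort -rn | head -n "${1:-20}"
}

# Example usage, assuming changes.txt holds one changed path per line:
#   top_changed 20 < changes.txt
```

The first `sort` groups identical paths so `uniq -c` can count them, and `sort -rn` then orders by that count.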

The best idea I’ve had so far is to build a tree structure that, when printed, includes the count of children for each branch (sorted by that count) and perhaps lets me truncate the depth.

I could probably manage something just by sorting the file and using `awk -F/ { stuff }` to determine the depth, but I usually end up regretting it when I start using bash scripts for something not entirely trivial.
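
A rough sketch of that `awk -F/` idea, under some assumptions: the input is absolute paths, one per line, and the last component of each path is the file name. `count_by_depth` is a hypothetical name. It prints a flat list rather than a drawn tree.

```shell
#!/bin/sh
# Hypothetical helper: reads absolute paths (one per line) on stdin and counts
# how many fall under each directory, truncated to at most DEPTH levels.
count_by_depth() {
  awk -F/ -v depth="${1:-2}" '
    NF >= 2 {                       # skip blank lines
      last = NF - 1                 # last directory field; $NF is the file name
      if (last > depth + 1) last = depth + 1   # stop at the requested depth
      key = ""
      for (i = 2; i <= last; i++) key = key "/" $i
      if (key == "") key = "/"      # entries sitting directly under the root
      count[key]++
    }
    END { for (k in count) printf "%d\t%s\n", count[k], k }
  ' | sort -rn
}

# Example usage, assuming changes.txt holds one changed path per line:
#   count_by_depth 3 < changes.txt | head
```

Splitting on `/` makes field 1 empty for absolute paths, so the directory components are fields 2 through NF-1.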

`head -n 100 changes.txt | tree -a --fromfile`

^^ Something like that but with file counts for each directory would work.

Some of these might be adaptable, but by default they count actual files in a directory tree rather than taking a list of filenames on stdin:

n8henrie · August 3, 2022, 8:04pm

Well, this ended up being easier to implement with coreutils than I’d expected.

I wrapped it up into a little script that sorts by count and removes anything with a count of only 1 (like individual files).
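
One way such a roll-up could work is sketched below. This is not the linked script, just an illustration under the assumption that the input is absolute paths, one per line; `rollup_counts` is a hypothetical name. Each path is credited to itself and to every ancestor directory, then anything seen only once is dropped.

```shell
#!/bin/sh
# Hypothetical helper, not the linked script: for each path on stdin, credit
# the path itself and every ancestor directory, then keep entries with a
# count above 1, highest count first.
rollup_counts() {
  awk -F/ '
    NF >= 2 {
      dir = ""
      for (i = 2; i <= NF; i++) {
        if ($i == "") continue      # ignore empty parts such as a trailing slash
        dir = dir "/" $i
        count[dir]++
      }
    }
    END { for (d in count) if (count[d] > 1) printf "%d\t%s\n", count[d], d }
  ' | sort -rn
}

# Example usage, assuming changes.txt holds one changed path per line:
#   rollup_counts < changes.txt | head -n 40
```

Because a file that changed in only one diff ends up with a count of 1, the filter removes most individual files and leaves the busy directories.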

It should be pretty easy to also add in a `du -sh` to get sizes if one wanted. Currently it runs in under 2 seconds on that 500,000-line file on my M1 Mac. Sharing in case it’s useful for anyone else: treecount.sh · GitHub
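
A possible sketch of the `du -sh` addition, assuming the input is tab-separated "count, directory" lines and that the directories still exist on the machine where this runs. `with_sizes` is a hypothetical name. Note that `du` reports the current on-disk size of each directory, not the amount of data that changed between snapshots.

```shell
#!/bin/sh
# Hypothetical helper: reads "count<TAB>directory" lines on stdin and prints
# "size<TAB>count<TAB>directory" for each directory that exists locally.
with_sizes() {
  while IFS="$(printf '\t')" read -r count dir; do
    [ -d "$dir" ] || continue       # skip files and paths that no longer exist
    size=$(du -sh -- "$dir" 2>/dev/null | cut -f1)
    printf '%s\t%s\t%s\n' "$size" "$count" "$dir"
  done
}

# Example usage, assuming counts.tsv holds "count<TAB>directory" lines:
#   head -n 20 counts.tsv | with_sizes
```

Running `du` once per directory is slow on large trees, so it is probably best applied only to the top few results.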