Apologies for the reply spaghetti and the deleted posts. I think I know why this has been so hard to debug.
Instead of creating a repo per project, I naively created a single repo at the root level of the bucket, into which I’m dumping all the projects. This may or may not come back to haunt me [gut check: I feel that it will]. It likely means that I can’t determine the data footprint of the packs transferred so far for any one project. I will probably need to reorganize this at some point.
[edit: with this new info I’ve decided to go back and adjust the backup script to first attempt to create a project-specific repo in the bucket. I’ve created new buckets for this purpose, so in theory, once my new upload has completed (of all the projects, ugh!), I should be able to simply delete the old buckets. It feels better organized. @cdhowie, thank you for the tip.]
Having said that, is there anything else worth considering as to why restic is looking at these (large?) files at all? Is it safe to assume that this is the likely cause of the slowdown?
[post-post edit: this experiment definitely highlighted the fact that there is something amiss. The first project being archived shows as follows:
I think if you pass -v to the backup command, it will give you a list of the files it’s uploading. You can use this to troubleshoot your exclude patterns. (Maybe try backing up to a local filesystem repository to test, until you have the patterns working the way you want.)
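As a minimal sketch of that advice when restic is driven from a Python script, the hypothetical helper below builds the argument list for a verbose test run against a throwaway local repository. The repository path /tmp/restic-test-repo and the helper name build_backup_argv are assumptions; only the -r, -v and --exclude options come from restic itself. Each exclude pattern is its own list element, with no quotes around it.

```python
import shlex


def build_backup_argv(repo, paths, excludes, verbose=True):
    """Return an argv list for `restic backup` (hypothetical helper).

    Patterns are added without any shell-style quoting, because a list
    passed to subprocess reaches restic exactly as written.
    """
    argv = ["restic", "-r", repo, "backup"]
    if verbose:
        argv.append("-v")  # ask restic to list the files it processes
    for pattern in excludes:
        argv.append(f"--exclude={pattern}")
    argv.extend(paths)
    return argv


# Assumed test repository path; point it at a scratch local directory.
argv = build_backup_argv("/tmp/restic-test-repo", ["cmn"], ["footage"])
print(shlex.join(argv))  # readable form, for logging only
# To actually run it (requires restic and an initialised repository):
# subprocess.run(argv, check=True)
```

Once the verbose output shows the exclusions behaving, the same argv can be pointed at the real repository.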
I set up a similar configuration with all but one small directory excluded, and then removed the exclusions progressively as I became ready to back up additional chunks of data.
While I didn’t monitor the file system directly, restic definitely didn’t read the entire contents of the excluded directories: the initial backup finished faster than would otherwise have been possible. I can’t say whether restic still walked the directory structure.
I’m hoping to exclude all the files inside/under the footage directory, but I’m clearly mangling the exclude string. Any insight would be much appreciated.
… which is NOT what you want. This excludes the first directory, cmn/footage/From_client/footage/graded, but then passes the other two to backup as paths to back up, so they are included.
I would suggest that you stick with --exclude footage.
@fd0 Are you aware of any issue with leading ** in an exclude pattern?
ps shouldn’t show any quotes there, so it looks like you’re excluding files that actually have double quotes in their names. You would usually get something like this if you used \"**/footage/**\" or '"**/footage/**"'. You briefly mentioned a backup script; can you paste it (with credentials redacted, of course)?
Right, but we’re saying that --exclude="**/footage/**" should work because the shell removes the quotes. However, based on the ps output, it looks like the quotes were actually passed to restic.
Were you doing something like --exclude='"**/footage/**"'?
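Python’s shlex.split applies POSIX shell quoting rules (it does no globbing or variable expansion), so it offers a rough way to check what argument list a shell would hand to restic for each of these two spellings. This only illustrates shell quote removal; it is not something restic does itself.

```python
import shlex

# Double quotes are removed by the shell; restic sees the bare pattern.
plain = shlex.split('restic backup --exclude="**/footage/**" cmn')
print(plain)

# Double quotes nested inside single quotes survive as literal characters,
# so restic would receive a pattern that starts and ends with ".
nested = shlex.split("restic backup --exclude='\"**/footage/**\"' cmn")
print(nested)
```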
I don’t think the shell erases the quotes. I think the shell makes substitutions when the pattern falls into a category it knows about, for example:
if I call du -sh *c*, the shell replaces *c* with the names that match it. I say this because invoking ps on the du process shows:
[gene@tws09 ~]$ ps -ef | grep du
gene 41185 38406 6 11:49 pts/4 00:00:00 du -sh cal cmn
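The same expansion can be reproduced with Python’s glob module in a scratch directory. The names cal and cmn come from the ps output above; other is a made-up name that should not match. The last part is a sketch of why an unquoted pattern after --exclude goes wrong: only the first match becomes the option’s value, and the rest end up as ordinary paths to back up.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # The directories from the ps output, plus one that should not match.
    for name in ("cal", "cmn", "other"):
        os.mkdir(os.path.join(tmp, name))
    # Expand *c* like the shell does; bash sorts its matches, while
    # glob.glob returns them in arbitrary order, so sort explicitly.
    matches = sorted(os.path.basename(p) for p in glob.glob(os.path.join(tmp, "*c*")))

# What du actually receives after expansion.
du_argv = ["du", "-sh", *matches]
print(du_argv)

# Same mechanism with --exclude: the first match is the option's value,
# the remaining matches become extra paths for restic to back up.
restic_argv = ["restic", "backup", "--exclude", *matches]
print(restic_argv)
```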
Instinctively (and wrongly in this case), I put the expression in quotes, hoping to prevent the shell from getting involved in the globbing process.
That’s the strange thing – what you did was actually right. The quotes should be erased by the shell.
The only guess I have is that you are calling restic from a script, and the shebang line in the script invokes a shell that doesn’t erase quotes, and that’s a long shot.
Ha. That’s awesome – I did not know that the shell behaves this way with quotes. Thanks for the schooling. I will watch out for that in the future.
And yes, you are 100% correct about the script. I’m invoking restic from a Python subprocess, which probably explains this behaviour.
Ah, yup, there it is! subprocess.Popen() doesn’t invoke a shell, it takes your arguments array and passes them to the target program as-is. This means that quotes and * characters alike are untouched, so using **/footage/** without quotes here is safe.
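To see this without restic, the sketch below swaps in a tiny Python program as a stand-in that only prints the arguments it receives, and runs it through subprocess.run with a list argument (no shell, just like Popen). The helper name received_args is hypothetical.

```python
import json
import subprocess
import sys

# Stand-in for restic: prints the arguments it was given, as JSON.
ECHO_ARGV = "import json, sys; print(json.dumps(sys.argv[1:]))"


def received_args(*args):
    """Run the stand-in with a list argv (no shell) and return what it saw."""
    result = subprocess.run(
        [sys.executable, "-c", ECHO_ARGV, *args],
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(result.stdout)


print(received_args('--exclude="**/footage/**"'))  # the quotes arrive intact
print(received_args("--exclude=**/footage/**"))    # what restic should get
```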
And, of course, using quotes means they get passed to restic, which will dutifully ignore any paths that start and end with a " character, and have /footage/ somewhere in them.
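Python’s fnmatch is not restic’s matcher (restic has its own rules for ** and for path separators), but as a rough illustration it shows the same effect: once the quotes are part of the pattern, a path only matches if it literally starts and ends with a double quote. The file name clip.mov is made up.

```python
from fnmatch import fnmatchcase

# Hypothetical file under the footage tree from this thread.
path = "cmn/footage/From_client/footage/graded/clip.mov"

good = fnmatchcase(path, "**/footage/**")      # pattern as intended
quoted = fnmatchcase(path, '"**/footage/**"')  # quotes passed through verbatim
# Only a path that itself begins and ends with " matches the quoted pattern.
odd = fnmatchcase('"cmn/footage/clip.mov"', '"**/footage/**"')
print(good, quoted, odd)
```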
Thanks for satisfying my curiosity.
Oh and look, you aren’t the first person to trip up here. I knew this sounded familiar… I helped someone else out with this exact issue!