Ignore file management


#1

The following thoughts are probably not for restic directly, but for frontends built on top of restic to make it more user-friendly. Sorry, this is somewhat of a dump of thoughts.

The problem

After setting up nightly backups on a lot of different machines with limited disk space or network bandwidth, I find that the ignore file is one of the more difficult things to get right. I don’t want to back up something that changes frequently if that data is easily recreated when I need to rebuild my machine. For example:

  • Google Chrome writes a lot of different files all the time, but when you have sync enabled then everything can be restored to a fresh install of Chrome on a new machine.
  • Email clients often store a lot of redundant data
  • Downloads directory
  • Trash/Recycling folders
  • Lots of other little programs that keep local state that is trivially recreated

Backing up these folders is harmless, and a backup is still perfectly correct if everything is included, but the nightly backups will take more disk space and network bandwidth than needed.

Helping the user

What I want to know as a user is: what am I backing up, and why is it taking so long? Personally, I run backups at night and expect them to be finished in the morning when I want to start using my computers. So if a backup process is still running, I will check to see what it happens to be working on at that moment (lsof on Linux and Mac, FileMon on Windows). Often it is something like “Oh, I ripped that DVD and then left it in my home directory”. Then I move the file to a place not included in my critical file backups and want to tell the backup program to look again. Or “That new cryptocoin is generating tons of data”. Here I add the new directory to my ignore list and need to restart the backup to skip that directory.

Summary: I need to know what is happening and be able to restart a backup in progress with changes.
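
The manual lsof check described above could be scripted roughly as follows. This is a minimal sketch, assuming lsof is installed and that the backup process ID is known; the helper names `parse_lsof_names` and `open_files_of` are made up for illustration and are not part of restic.

```python
import subprocess


def parse_lsof_names(field_output: str) -> list[str]:
    """Extract absolute file paths from `lsof -Fn` field output."""
    paths = []
    for line in field_output.splitlines():
        # In -F output each line starts with a one-character field id;
        # 'n' marks a file name. Keep only names that look like absolute paths.
        if line.startswith("n/"):
            paths.append(line[1:])
    return paths


def open_files_of(pid: int) -> list[str]:
    """Ask lsof which files the given process currently has open."""
    result = subprocess.run(
        ["lsof", "-p", str(pid), "-Fn"],
        capture_output=True,
        text=True,
        check=False,  # lsof exits non-zero if the process is gone; return [] then
    )
    return parse_lsof_names(result.stdout)
```

Calling `open_files_of(pid)` every few seconds would give a rough "currently working on" list without any cooperation from the backup program, at the cost of only working locally and only on systems with lsof.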

The ideal GUI

If, for example, someone were working on a web frontend for restic, I would like to have the following:

  • If a backup is currently running let me know what is being backed up right at the moment. This makes it more entertaining to watch and is surprisingly useful.
  • When setting up a new backup have the ability to prescan the source files and report how much data is included. Include a directory browser with sizes and a disk space visualization. Sorta like a web version of WinDirStat. (philesight is one example I found with a quick search)
  • When showing old snapshots please include the size of that snapshot (GB of new data) and the time spent making that backup. Have a way to click on the snapshot and visualize which files were included in the backup.
  • When a backup is running it would be really cool to be able to visualize which files are going to be backed up next. I assume that the file scanner in restic runs in a different thread and is allowed to get way ahead of the backup process so restic has this information if it has a way to tell the frontend about it.
  • Allow me to edit the ignore file directly. I don’t need a frontend here, but provide a textarea to make tweaks.
  • Allow me to restart a backup in progress so that it picks up files that have been moved or changes to the ignore file.
  • Sometimes a snapshot includes a whole ton of junk files I didn’t mean to include. It would be handy to delete certain snapshots so they don’t end up being the snapshot that is kept around for 6 months.
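
The prescan idea from the list above could be prototyped in a frontend along these lines. This is only a sketch: it uses Python's `fnmatch` globbing as a stand-in, which does not reproduce restic's own exclude-pattern semantics, and the function names `is_ignored` and `prescan` are hypothetical.

```python
import fnmatch
import os


def is_ignored(path: str, patterns: list[str]) -> bool:
    """Return True if the full path or its base name matches any glob pattern."""
    name = os.path.basename(path)
    return any(fnmatch.fnmatch(path, p) or fnmatch.fnmatch(name, p) for p in patterns)


def prescan(root: str, patterns: list[str]) -> dict[str, int]:
    """Map each top-level entry under root to the bytes that would be backed up."""
    sizes: dict[str, int] = {}
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune ignored directories in place so os.walk never descends into them.
        dirnames[:] = [
            d for d in dirnames if not is_ignored(os.path.join(dirpath, d), patterns)
        ]
        for fname in filenames:
            full = os.path.join(dirpath, fname)
            if is_ignored(full, patterns):
                continue
            try:
                size = os.lstat(full).st_size
            except OSError:
                continue  # file vanished or is unreadable; skip it
            # Attribute the size to the first path component below root.
            top = os.path.relpath(full, root).split(os.sep, 1)[0]
            sizes[top] = sizes.get(top, 0) + size
    return sizes
```

Sorting the result by size gives the "what is taking all the space" view, and re-running it after editing the pattern list shows the effect of a change to the ignore file before any backup starts.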

#2

Great thoughts – thanks.

Fortunately, restic makes this easy when running interactively from the terminal. Unfortunately, restic doesn’t make this information available (the “knowing what is happening” part) in background or automated environments. @fd0 Since I am writing a backup app that adds a nice UI over restic, would you be opposed to a PR for the backup (and possibly restore) command(s) to honor the --json flag and write a line of JSON output every time the status is updated, instead of making it interactive-terminal-only? I could look into doing that at my next opportunity.
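
To make the proposal concrete, here is a rough sketch of how a frontend might consume one JSON object per line. The field names `message_type`, `percent_done` and `current_files` are assumptions chosen for illustration; nothing in this thread fixes the actual output format.

```python
import json
from typing import Iterable, Iterator


def iter_status(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per JSON status line, skipping blank or non-JSON lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            message = json.loads(line)
        except json.JSONDecodeError:
            continue  # tolerate stray non-JSON output
        # "message_type" is an assumed field name, not a confirmed format.
        if isinstance(message, dict) and message.get("message_type") == "status":
            yield message


def describe(message: dict) -> str:
    """Format an (assumed) status message as a one-line progress summary."""
    percent = 100.0 * message.get("percent_done", 0.0)
    current = ", ".join(message.get("current_files", [])) or "(idle)"
    return f"{percent:.0f}% done, working on: {current}"
```

A UI would pass the standard output of the backup subprocess to `iter_status` and redraw on every message, which works the same way in background and automated environments as in a terminal.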

I don’t know how the new archiver works in this much detail, but I presume this wouldn’t be as accurate as just scanning as you go. What prevents your scan from becoming obsolete when files change between the scan and the backup?

This pull request adds a way to get this kind of information about snapshots (except for the time it took to make the snapshot – however my UI shows that anyway, for what it’s worth – maybe there’s a way to get it from restic too, I dunno).

Interesting; why’s this needed? What if the file is removed or something before the backup gets to it? Why not just report what is currently being backed up?

The other points I think are pretty straightforward; for example, removing certain snapshots or restarting backups.


#3

Not at all, that feature could be implemented similarly to find, which already has “streaming” JSON output…