Rewrite source path on backup?

dkebler · August 29, 2021, 6:19pm

Is it possible to rewrite the source path recorded in the repository when backing up?

I’m backing up an sshfs-mounted directory, but I want the snapshot to use the actual path on that machine.

so my command is

RESTIC_PASSWORD=xxx /opt/bin/restic -r rest:https://backup.xxr.net/238gate/opt backup /mnt/238/gate/opt --iexclude-file /mnt/238/gate/opt/exclude.bac

ID        Time                 Host        Tags        Paths
------------------------------------------------------------------------
e96b1e6f  2021-08-29 09:26:27  giskard                 /mnt/238/gate/opt
c8acf043  2021-08-29 09:46:03  router                  /opt
d68ad362  2021-08-29 10:30:57  giskard                 /mnt/238/gate/opt
------------------------------------------------------------------------

the first and third snaps were made from the sshfs mounted directory whereas the second was done from a ssh session on that machine like this

RESTIC_PASSWORD=xxx /opt/bin/restic -r rest:https://backup.xxr.net/238gate/opt backup /opt --iexclude-file /opt/exclude.bac

So in the first and third case I wanted the path to be just /opt and NOT include the sshfs mount point /mnt/238/gate/opt

I don’t see any flag for this in the help.

rawtaz · August 29, 2021, 7:02pm

No, there’s no such option currently. What actual problem are you trying to solve by wanting the paths to be the same even though you back up from different systems? The deduplication in restic works the same regardless of what paths are recorded for a snapshot, as restic looks at blocks when backing up.

dkebler · August 29, 2021, 8:16pm

I’d want to have a single dedicated machine for backup with all backups done by pulling. In other words no restic binary on the source machine, no cron jobs on source machine.

I guess I’m not the first to desire this kind of setup.

To avoid this path artifact I just need a way to access a remote source but afaik there is no equivalent sftp for source like there is for target.

So that’s not a problem if I use sshfs (or another method that involves a local mount), but then I end up with this path artifact (the mount point) in the repo snapshot. That is not great: it is setup dependent, would be confusing, and could make restoring from another path problematic.

Note: There is already a way to rewrite/replace the hostname

-H, --host hostname set the hostname for the snapshot manually. To prevent an expensive rescan use the "parent" flag

That I would need to use in the remote mount case since the host will be the backup server not the host of the source (see my snapshot output).

So looks like -P isn’t taken and that could be used for this path rewrite
maybe like so?

-P /new/base/path#/path/to/match

so in my case -P /#/mnt/238/gate/

would rewrite all /mnt/238/gate/opt to /opt
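To make the proposal concrete, here is a minimal sketch of the prefix rewrite that the suggested `-P /new/base/path#/path/to/match` mapping describes. Note that `-P` is only a proposal in this thread, not an existing restic flag, and `rewrite_path` is a hypothetical helper name; the sketch shows just the string transformation, not anything restic itself does.

```bash
# Hypothetical helper: apply a "new#match" mapping to a single path.
# Assumption: the part before the first '#' is the new base path and the
# part after it is the prefix to replace, as in the proposal above.
rewrite_path() {
    local mapping="$1" path="$2"
    local new="${mapping%%#*}"     # text before the first '#'
    local match="${mapping#*#}"    # text after the first '#'
    local rest
    case "$path" in
        "$match"*)
            rest=${path#"$match"}  # strip the matched prefix literally
            printf '%s\n' "${new}${rest}"
            ;;
        *)
            printf '%s\n' "$path"  # no match: leave the path untouched
            ;;
    esac
}

rewrite_path '/#/mnt/238/gate/' /mnt/238/gate/opt   # prints /opt
```

Paths that do not start with the match prefix pass through unchanged, which seems like the least surprising behaviour for a backup that mixes mounted and local sources.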

Having this option doesn’t involve restic knowing where/how the source lives but does allow one (for whatever reasons) to keep a consistent base path in the repo no matter how it is backed up. So maybe that is within the scope of this project?

rawtaz · August 29, 2021, 9:11pm

This is what I don’t see the problem in. How is it confusing, considering that you already have your snapshots separated by the hostname?

And how is restoring the files a problem - you can just restore it to a temporary folder and then move the folders wherever you want, e.g. instead of getting /my-restore-point/opt/ you get /my-restore-point/mnt/238/gate/opt/ and can simply move that opt/ folder to / or where you want it, after restoring. It’s just one extra mv command (this is the main reason I asked what the actual or concrete problem is/was).
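The restore-then-move workaround described here could be sketched roughly as follows. `move_restored` is a hypothetical helper name, and the restic invocation is shown only as a comment with placeholder repository and snapshot values.

```bash
# Step 1 (not run here): restore the snapshot into a temporary folder, e.g.
#   restic -r <repo> restore <snapshot-id> --target /my-restore-point
# The files then sit under /my-restore-point/mnt/238/gate/opt.

# Step 2: move the restored tree out from under the recorded mount prefix.
# Hypothetical helper; arguments are the restore target, the path recorded
# in the snapshot, and the directory that should receive the tree.
move_restored() {
    local restore_root="$1" recorded_path="$2" dest="$3"
    mkdir -p -- "$dest"
    mv -- "${restore_root}${recorded_path}" "$dest/"
}

# Example (commented out because it touches real paths):
#   move_restored /my-restore-point /mnt/238/gate/opt /
```

If the restore target and the final destination are on the same filesystem the `mv` is a rename; across filesystems it turns into a copy and delete.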

On a related note, if you’re using a separate system to restore as well, then presumably you already want the path to be /mnt/238/gate/opt instead of /opt, no? Or are you not restoring using this separate system, only backing up?

There are already discussions about this feature, but it’s not ~~implemented~~ merged yet. I think it’s rare to see actual concrete use cases/needs for it, if I may say so. But feel free to try that PR, it sounds like it would do what you want it to

dkebler · August 29, 2021, 9:43pm

thanks @rawtaz for pointing out this PR. Looks like it’s indeed waiting to be merged so the answer to my question is yes, soon. In the meantime I’ll build with the pr and try it out.

github.com/restic/restic

backup: Add options --set-path and --set-paths-from

restic:master ← aawsome:backup-set-path

opened 07:59PM - 29 Dec 20 UTC

aawsome

+50 -6

What does this PR change? What problem does it solve?
-------------------------…----------------------------

Adds an option `--set-path` to `backup` which allows to manually set the path(s) saved in the snapshot and used for finding the parent snapshot. Also the option `--set-paths-from` is added to read the paths from a file. Both options are useful e.g. if the files to backup are selected by an external tool.

Was the change discussed in an issue or in the forum before?
------------------------------------------------------------

closes #2714
closes #3198
allows users to use an easy workaround for #1514 by using `--files-from-raw` in combination with `fd` (or similar find tools) and `--set-path`
maybe also closes #2246
closes #1376
closes #2092

Checklist
---------

- [x] I have read the [Contribution Guidelines](https://github.com/restic/restic/blob/master/CONTRIBUTING.md#providing-patches)
- [x] I have enabled [maintainer edits for this PR](https://help.github.com/en/github/collaborating-with-issues-and-pull-requests/allowing-changes-to-a-pull-request-branch-created-from-a-fork)
- [ ] I have added tests for all changes in this PR
- [x] I have added documentation for the changes (in the manual)
- [x] There's a new file in `changelog/unreleased/` that describes the changes for our users (template [here](https://github.com/restic/restic/blob/master/changelog/TEMPLATE))
- [x] I have run `gofmt` on the code in all commits
- [x] All commit messages are formatted in the same style as [the other commits in the repo](https://github.com/restic/restic/blob/master/CONTRIBUTING.md#git-commits)
- [x] I'm done, this Pull Request is ready for review

rawtaz · August 29, 2021, 9:54pm

Please do not assume that things will be merged just because they’re requested. The answer to your question is that restic does not have a way to rewrite paths. That PR will undergo review and considerations like every other piece of code that made or didn’t make it into restic. For good reasons.

Can you please answer my questions earlier? In what practical way is it a problem that you have to write one additional mv command after restoring your files (if this is even the case, considering that on that “external” backup server the paths you restore could easily match the ones you backed up)? And how is it confusing when you clearly see which host the /opt and similar paths belong to? In other words, how is this an actual practical concrete problem rather than just a cosmetic annoyance?

dkebler · September 1, 2021, 9:22pm

I cloned the pr repo, merged in all your commits since April and built restic. Then I tried it out with a bash script I’ve been building (which will eventually work into a nodejs/javascript app), and it works fine for me.

I don’t really have anything to add in terms of my use case over the several comments in the issue. I agree with them.

I think the operative word used was “you”, i.e. me. Yes, it’s all clear what happened if I did the backup and I do the restoring. If I maintain a database or a yaml file (like I am using in the script) then a program can know where it really came from/goes, but the snap IMO needs to reflect the actual path on the actual machine, which is not an issue unless one is trying to set up a pull backup server.

Also, say I write some code to grab the json from an existing repo of snaps: a program can then recreate exactly where that snap came from. Otherwise, without an external database associated with that repo, there would be no way.

I have plans for restic. I’ve tried out several potential backup backends for a backup pull server (web) application and restic is clearly the best and I like that it is written in GO. I already have a start on a nodejs wrapper api for restic and I can now move the logic of my bash script into this wrapper

At this point if you/devs don’t/can’t merge this pr I can continue to pull recent restic commits and rebuild with this pr (the beauty of open source). I’ll just include my build as part of my app and make a donation to this project when my app is ready for prime time. I appreciate all the work and the fact that the project is active (which isn’t the case for several others I considered).

dkebler · September 1, 2021, 9:33pm

for reference of those interested.

my bashly based bash cli for restic which can use a yaml file for a “job”

run
BACKUP_PASSWORD=xxx backup -s /opt/backup/jobs/gate/opt.yml

generates restic commands
RESTIC_PASSWORD=xxx /opt/bin/restic -r rest:https://backup.mynetwork.net/gateway/opt -H router.mynetwork.net --set-path /opt backup /mnt/238/gate/opt --iexclude-file /mnt/238/gate/opt/exclude.bac

/opt/backup/jobs/gate/opt.yml

# actual network hostname
host: router.mynetwork.net
# alternate to host for use in snapshot
hostname: gateway
# target:
# source path by default
# path:
# <hostname>/  by default
# mount:
source:
  mount: /mnt/238/gate
  path: /opt
server:
  # user: sysadmin
  host: backup.mynetwork.net
  #port: 9500
  secure: true

bashly yaml

name: dbackup
help: differential backup using restic
version: 0.1.0

environment_variables:
  - name: BACKUP_EXCLUDE
    help: path to file of excludes
  - name: BACKUP_INCLUDE
    help: path to directory of includes
  - name: BACKUP_SETTINGS
    help: path to default settings file
  - name: BACKUP_PASSWORD
    help: repo password (or file path) for backup repository
  - name: BACKUP_SERVER
    help: URL of Restic rest server
  - name: BACKUP_DIR
    help: Backup Directory
  - name: BACKUP_MOUNT_POINT
    help: mount point of source if mounted external to host

args:
  - name: source
    help: source directory to be backed up, default is $PWD
  - name: target
    help: "target directory for backup"

flags:
  - long: --password
    short: -p
    arg: password
    help: repo password (or file path) for backup repository
  - long: --remote
    short: -r
    help: backup to a remote machine
  - long: --server
    arg: url
    help: url of restic rest server
  - long: --init
    help: initialize repo (default is backup)
  - long: --snap
    help: list repo snapshots
  - long: --view
    short: -v
    help: mount snapshot for viewing (default is BACKUP_MOUNT or /opt/backup/view)
  - long: --view-path
    help: set custom mount point path for viewing of snapshot, --view not required if set
    arg: path
  - long: --prune
    arg: prune
    help: prune repo (default is backup). true for default prune or path to prune settings
  - long: --settings
    short: -s
    arg: syaml
    help: path to settings file (yaml).  Keys are same as long
  - long: --host
    short: -h
    arg: thost
    help: host on remote to target to receive backup
  - long: --shost
    arg: shost
    help: remote to host of source
  - long: --user
    short: -u
    arg: tuser
    help: user on remote host
  - long: --suser
    arg: suser
    help: remote user on source host
  - long: --sshcfg
    arg: sshcfg
    help: path to sshcfg file
  - long: --options
    short: -o
    arg: options
    help: additional options  (restic or rsync)
  - long: --include_file
    short: -i
    arg: include
    help: include file
  - long: --exclude_file
    short: -e
    arg: exclude
    help: exclude file
  - long: --dir
    short: -d
    help: append source directory path to target directory

examples:
  - backup -p password . /target/dir
  - backup -s  /path/to/settings/yaml/file

bash for backup bashly generate

#!/bin/bash

echo "running root command"

inspect_args

# module_load ssh
module_load confirm
module_load path

local settings=${args[--settings]}

if [[ -f $settings ]]; then
    echo "loading settings file $settings"
    module_load yaml
    # quote the command substitution so eval sees the parser output unmodified
    eval "$(parse_yaml "$settings" "s_")"
    echo "$s_source"
    echo "$s_target"
    echo "$s_host"
fi

if [[ $s_server_host ]]; then
    s_server="http$([[ $s_server_secure ]] && echo "s")://${s_server_host}$([[ $s_server_port ]] &&  echo :${s_server_port} || echo "")"
fi

local password=${args[--password]:-$BACKUP_PASSWORD}
password=${password:-$s_password}
[[ ! $password ]] && echo restic requires a backup repository password, exiting && return 2
password="RESTIC_PASSWORD=${password}"

local server=${args[--server]:-$BACKUP_SERVER}
server=${server:-$s_server}

local hostname=${args[--hostname]:-$s_hostname}
hostname=${hostname:-$s_host}
hostname=${hostname:-$HOSTNAME}

local backup_dir=${args[--backup_dir]:-$BACKUP_DIR}
backup_dir=${backup_dir:-$s_backup_dir}
backup_dir=${backup_dir:-"/backup"}

local smount=${args[--source_mount]:-$BACKUP_SOURCE_MOUNT}
smount=${smount:-$s_source_mount}

echo smount: $smount $s_source_mount


local tmount=${args[--target_mount]:-$BACKUP_TARGET_MOUNT}
tmount=${tmount:-$s_target_mount}

echo tmount: $tmount $s_target_mount

local source="${args[source]:-$s_source}"
source="${source:-$s_source_path}"
source=$(echo "${source:-$PWD}" | tr -s /)

echo yaml source $s_source_path  $s_source_mount
echo source $source

echo target $s_target

echo target path $s_target_path

local target=${args[target]:-$s_target}
target=${target:-$s_target_path}
target=${target:-$(echo "${source}" | tr -s / | sed -e "s#^[.]##")}
target="$(echo "${target}" | tr -s /)"
echo "target> $target"
if [[ ${tmount} ]]; then
    target="${tmount}${target}"
else
    target="/${hostname}${target}"
fi

if [[ $server ]]; then
    target="rest:${server}${target}"
fi

if [[ $smount ]]; then
    setpath="--set-path ${source}"
    source=${smount}${source}
fi

local exclude=${args[--exclude_file]:-$BACKUP_EXCLUDE}
exclude=${exclude:-$s_exclude}
exclude=${exclude:-"$source/exclude.bac"}


local shost=$([[ ${args[--shost]} ]] && echo ${args[--shost]}::)
local suser=$([[ ${args[--suser]} ]] && echo ${args[--suser]}@)
local thost=$([[ ${args[--host]} ]] && echo ${args[--host]}::)
local tuser=$([[ ${args[--user]} ]] && echo ${args[--user]}@)

local options=$(echo ${args[--options]} | awk '{gsub(/\\/," ")}1')

local bin=$(command -v restic)


local cmd=${args[cmd]:-"backup"}


echo before exists exclude: $exclude

exclude=$([[ -f $exclude ]] && echo "--iexclude-file $exclude" || echo "")

echo source: $source
echo target $target
echo exclude: $exclude

# local ssh="--remote-schema \"ssh -C %s /home/sysadmin/.local/bin/rdiff-backup --server\""

#cmd="$sudo rdiff-backup  $options $exclude $ssh ${suser}${shost}$source ${tuser}${thost}$target"

local sudo=""
local pcmd="${sudo} ${password} ${bin} -r ${target}"

local cmd="${pcmd} -H ${hostname} ${setpath} backup ${source} ${exclude}"

if [[ ${args[--init]} ]]; then cmd="${pcmd} init"; fi
if [[ ${args[--snap]} ]]; then cmd="${pcmd} snapshots"; fi
if [[ ${args[--prune]} ]]; then cmd="${pcmd} prune"; fi
if [[ ${args[--view]} || ${args[--view-path]} ]]; then
    mount=${args[--view-path]:-$BACKUP_MOUNT}
    mount=${mount:-"/opt/backup/view/"}
    echo view mount point $mount
    if [[ -e ${mount} ]]; then
        cmd="${pcmd} mount $mount";
        echo browse files at $mount/snapshots/latest${source}
    else
        echo $mount:  directory for mounting snapshot for viewing does not exist.  Create and try again
        return 3
    fi
fi

echo "$cmd"
# quote the prompt so the shell does not treat "?" as a glob character
confirm "run this command?" || return 1
eval "$cmd"

# sudo chown -R $USER:$USER $target
# sudo chown -R $USER:$USER /home/$USER/.cache/restic
# sudo chmod -R g+rwX /home/$USER/.cache/restic

dkebler · September 2, 2021, 12:06am

Well, it looks like it didn’t work as expected. “set-path” set the path property to /opt per the snapshot report (and json) but the actual snapshot of /opt was written within /mnt/238/gate.

so looks like either set-path is not working as intended or it was never intended/able to rewrite the path only to set the path property key of the snapshot.
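One way to check for the mismatch described above is to compare the file listing of a snapshot against the path it reports. The sketch below assumes the listing comes from something like `restic ls <snapshot-id>` with one path per line; `all_under_prefix` is a hypothetical helper and only inspects text on stdin.

```bash
# Hypothetical check: succeed only if every absolute path on stdin lies
# at or below the given prefix. Lines that are not absolute paths (such as
# a header line in the listing) are ignored.
all_under_prefix() {
    local prefix="$1" line
    while IFS= read -r line; do
        case "$line" in
            "$prefix"|"$prefix"/*) ;;   # inside the expected tree
            /*) return 1 ;;             # absolute path outside of it
            *) ;;                       # not a path, skip
        esac
    done
    return 0
}

# Example with a listing like the one described above, where the snapshot
# reports /opt but the tree still starts at the mount point:
if printf '%s\n' /mnt /mnt/238 /mnt/238/gate /mnt/238/gate/opt | all_under_prefix /opt; then
    echo "tree matches the reported path"
else
    echo "tree still contains the mount prefix"
fi
```

With the sample listing this prints the second message, which matches the behaviour reported for the PR build.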

looks like either way

nyuszika7h · September 26, 2022, 10:17pm

I agree this would be nice. My use case is that I’m backing up an LVM snapshot, and I’d rather strip the /mnt/snapshot prefix from the backups because it’s much cleaner that way. And for obvious reasons I don’t want to restore to the snapshot directory. Sure, I can specify a different path to restore and mv it afterwards, and that would be technically instant in my case as it’s on the same filesystem. But consider the case where someone is backing up multiple filesystems in one backup, then the move would add an unnecessarily slow copy/delete process.

The same applies if you happen to be backing up the same machine from multiple different paths (in my case, I’m trying to import some old tar backups into restic but I don’t want that to clash with my automated daily backup cronjob, so I back up from a different path). If you restore that manually it may not be a big deal to move it into the right place, but if you want to write any sort of automation with it, it’s much nicer if the restore software doesn’t have to deal with figuring out where the real root of the backup starts.

I feel like @rawtaz’s tone in this thread was uncalled for, considering OP seemed to be respectful and didn’t act entitled. They weren’t demanding that it must be implemented. As they said, the beauty of open source is that it’s possible to implement it yourself in a fork, but it’s always nicer if something can be included in upstream and more people can benefit from it without an unnecessary fork having to be maintained, as long as it’s reasonable and doesn’t cause too much burden on the upstream maintainers.

rawtaz · September 26, 2022, 10:58pm

Unless I’m misunderstanding you, that’s a pretty good explanation of a use case that makes sense.

I’m curious if this is an actually common use case though, but fair enough!

I think there’s been a misunderstanding. But if I did come across in a bad way then my apologies for that. The OP suggested the PR was sitting there waiting to be merged, while in reality its next step would be a review (which is quite different), that’s why I wrote the comment about not assuming a merge or similar.

I also did re-read everything I wrote and honestly I don’t see anything hostile. It was mostly stating facts, some helpful pointer to a PR and of course a bunch of repeated questioning (which IMO is due to OP not actually answering the questions, that’s why I kept asking). For my educational purposes, which part(s) of what I wrote did you think had a “tone” to them that was not just neutral facts/questions? Thanks.

I wholeheartedly agree with this. The more we can keep software projects together, the better. Totally with you on that!

To comment on stripping path components/prefix, IMO this is something that has been asked enough times that it warrants serious consideration. Whatever the use case is, it’s apparently something that people would benefit from, regardless of whether it’s effectively just for cosmetics or if it’s to solve an actual real/practical problem.

FeltrinN · November 16, 2022, 5:26pm

Hey, I ended up in a similar situation when migrating from a different backup solution (I restored the backups I wanted to keep on an external drive and backed them up to restic from there, so now the migrated ones have paths like /run/media/user/whatever and the new ones /home/user).

I might be wrong here, but my impression is that restic forget groups snapshots by path, so that if I have a rule to keep only the last 2 snapshots it will keep the last two for /home/user and the last two for /run/media/user/whatever, which is not what I am trying to achieve… there might be some way to work around it by using different rules on different snapshots selected by tag, but IMHO it would be nicer to be able to correct the path for the imported snapshots and manage everything seamlessly

doscott · November 16, 2022, 11:25pm

Check the forget option
--group-by host
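To illustrate why the grouping matters, here is a small simulation of a "keep the last N per group" policy. It is not restic's implementation, the snapshot data is made up, and `keep_last` is a hypothetical helper; it only contrasts grouping by host and paths (restic's default for `forget`) with grouping by host alone.

```bash
# Made-up snapshot list, one per line: <date> <host> <path>
snapshots='2022-11-01 laptop /run/media/user/whatever
2022-11-02 laptop /run/media/user/whatever
2022-11-03 laptop /run/media/user/whatever
2022-11-10 laptop /home/user
2022-11-11 laptop /home/user
2022-11-12 laptop /home/user'

# Hypothetical helper: read snapshots on stdin, keep the newest N per group.
# group_by is either "host" or "host,paths".
keep_last() {
    local n="$1" group_by="$2"
    sort -r | awk -v n="$n" -v g="$group_by" '
        { key = (g == "host") ? $2 : ($2 SUBSEP $3) }
        ++seen[key] <= n   # print while the group has fewer than n+1 hits
    '
}

echo "grouped by host,paths:"
printf '%s\n' "$snapshots" | keep_last 2 host,paths   # 4 snapshots survive
echo "grouped by host:"
printf '%s\n' "$snapshots" | keep_last 2 host         # 2 snapshots survive
```

On a real repository the equivalent would be something like `restic forget --keep-last 2 --group-by host --dry-run`, where the dry run shows what would be removed before anything is deleted.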

FeltrinN · November 20, 2022, 9:39am

@doscott Thank you very much for your suggestion, it helped in my case

I was thinking that there might be a similar use case where the host is not the same (e.g. migrating from one machine to the other while changing username, so that both the host and the path would be different). I guess in that case you can use --group-by tag instead, after ensuring that all relevant snapshots have the same tags, or create the new snapshots with --host new-hostname.

I’d say renaming the paths is not really needed, but would still be a nice-to-have feature if it’s not too hard to implement

Anyway, thank you very much for the hint, my individual problem is solved!

HeikoSchlittermann · March 25, 2023, 8:09am

I would like to continue this thread. Currently I am working on a patch that allows the following:

restic backup --map-delimiter=: /tmp/mnt443221/rootfs:/ /tmp/mnt443221/home:/home

Before the backup I create snapshots of the volumes to be backed up with LVM, then mount them and want to back them up but under the original name. To me this seems a perfectly valid scenario. What is wrong with implementing it?
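For readers trying to picture the mapping syntax in that command, the following sketch splits each `source:target` argument at the delimiter. It is only a guess at the semantics: `--map-delimiter` belongs to the patch mentioned above and is not an existing restic option, and `parse_mappings` is a hypothetical helper.

```bash
# Hypothetical parser for "source<delim>target" arguments.
# Assumption: the split happens at the LAST delimiter, so that a source
# path may itself contain the delimiter character.
parse_mappings() {
    local delim="$1" arg src dst
    shift
    for arg in "$@"; do
        case "$arg" in
            *"$delim"*)
                src=${arg%"$delim"*}    # everything before the last delimiter
                dst=${arg##*"$delim"}   # everything after the last delimiter
                ;;
            *)
                src=$arg                # no mapping given: keep the path
                dst=$arg
                ;;
        esac
        printf '%s -> %s\n' "$src" "$dst"
    done
}

parse_mappings : /tmp/mnt443221/rootfs:/ /tmp/mnt443221/home:/home
# prints:
#   /tmp/mnt443221/rootfs -> /
#   /tmp/mnt443221/home -> /home
```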

MichaelEischer · March 25, 2023, 11:37am

The discussion for such a feature should happen on Github. In fact, there are already quite a few issues there like Backup option to remove a leading path prefix · Issue #2092 · restic/restic · GitHub or backup: Add options --set-path and --set-path-from by aawsome · Pull Request #3200 · restic/restic · GitHub .

Nothing, the short answer why it isn’t implemented is time. There have so far been several suggestions for what the corresponding CLI options could look like, but not enough time on the side of the maintainers to decide what they should look like.

v77 · March 27, 2023, 5:37am

I use Linux mount namespaces to work around this. My restic wrapper takes ZFS snapshots of all pools, does a unshare(CLONE_NEWNS), sets private mounts, sets up the sandbox with snapshot mounts, bind mounts what needs to be read-write like caches, tmpdir, log dir, does the pivot_root dance to create a mirror view of the real filesystem and runs restic as per usual. This way paths show as they really are in the sandbox and backups are atomic and don’t run off the live changing system. A namespace alone isn’t enough as a bind mounted path in child ns disappears if the source file/dir is unlinked in the parent. E.g., if you bind-mount a file in child ns, edit and save with vim in parent, it will disappear from the child ns as vim unlinks and renames on save. By running off snapshots which are read-only also in the parent ns the sandbox is shielded from such behaviour.

I bind mount /dev and /run into the sandbox rw to make journal logging work seamlessly and to be able to notify systemd service manager as the wrapper is of type notify. But as it handles all sandboxing itself, it runs just as well without systemd and I don’t need to rely on PrivateMounts=yes or other sd sandboxing.

I don’t unshare user as I don’t want to deal with uid/gid mappings and want to ensure backed up files have correct permissions. This means running as root which is fine by me as the system is shielded by the sandbox. Snapshots are destroyed on exit and all mounts torn down automatically by kernel leaving no trace of anything behind.
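A much reduced sketch of the mount-namespace idea, for readers who want to try it: instead of the full pivot_root sandbox described above, it only bind-mounts one snapshot mount over the real path inside a private mount namespace. The function names are hypothetical, it assumes a util-linux `unshare` (which makes mounts private by default when `--mount` is used), root privileges, and that the repository and password are supplied through restic's usual environment variables. By default it only prints the command.

```bash
# Print the command instead of running it unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Hypothetical wrapper: back up a snapshot mount under its real path.
# Inside the new mount namespace the snapshot is bind-mounted over the real
# path, so restic records e.g. /opt rather than /mnt/snapshot/opt. The bind
# mount is invisible outside the namespace and vanishes when sh exits.
backup_from_snapshot_mount() {
    local snapshot_mount="$1" real_path="$2"
    run unshare --mount sh -c \
        'mount --bind "$1" "$2" && restic backup "$2"' \
        sh "$snapshot_mount" "$real_path"
}

backup_from_snapshot_mount /mnt/snapshot/opt /opt
```

This simplification does not give the mirror view of the whole filesystem that the full setup provides, but it is enough to see the recorded path change for a single directory.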