Backup from new machine re-uploads all data

Heya everyone,

I have a script that uploads nightly snapshots to B2. All told there are several terabytes of data in play, spread across several ‘projects’.

Each nightly run transfers far less than that, since only a small subset of the data changes.

I’ve recently changed the machine on which this backup script runs, and noticed that it treated all files as new (it re-uploaded all the data, several TB), even though the cache directory is on a shared NFS mount and is available on the new machine. I was surprised by this.

Is this the expected behaviour? If so, is there a way to disable it?

Our datastore is a 100 TB NFS mount, and ideally the snapshot could be run identically from any machine.

Best,

G

Can you confirm that the data was actually re-uploaded?

More likely, restic could not find a suitable parent snapshot because the hostname and/or backup path set changed. The parent snapshot is used to compare file metadata so that unchanged files can be quickly skipped instead of needing to be read and hashed.

The most likely outcome of what you have described is that restic read every file (so the run took much longer), but deduplication would still have ensured that only new data was uploaded.
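One way to check this, as a rough sketch: compare the repository's stored size before and after a run, or diff the two snapshots in question. The summary printed at the end of a backup run also reports how much was added to the repository. The function names, repository path and snapshot IDs below are placeholders, not anything taken from this thread.

```shell
#!/bin/sh
# Sketch only: the repository path and snapshot IDs are placeholders.

# Print the size of the unique data stored in the repository.
# Run it before and after a backup: if the number barely moves,
# the new snapshot mostly reused data that was already there.
repo_raw_size() {
    restic --repo "$1" stats --mode raw-data
}

# Show what changed between two snapshots (files and sizes).
compare_snapshots() {
    restic --repo "$1" diff "$2" "$3"
}

# Example (placeholders):
#   repo_raw_size /srv/restic-repo
#   compare_snapshots /srv/restic-repo OLD_ID NEW_ID
```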

In the future, you can tell restic to use a specific snapshot as the parent using the --parent flag. Note that if the backup paths changed, this might not help at all as metadata is compared between files at the same path.

I’ve just run into this same issue. New Windows PC, new host name, the source data is all being read in again, but nothing is being written to the target. I stopped the backup because I wasn’t sure why it was taking so much longer than usual.

A few questions:

  • Is it best to just let it run, so it can cache whatever is required? It’s only 450GB of data, but this disk set is fairly slow at 30MB/sec so it will take a while.
  • Where is the cached data kept? In the repo, or in the restic cache in the user dir?
  • If I use the --parent flag that @cdhowie mentioned, would I use it once for the first backup from this machine, or would I have to use it every time I run this backup? If the latter, it’d probably be better just to let it run.

That is the expected behavior. With a new host name, restic probably doesn’t find a parent snapshot that it could use to skip files. Setting the --parent flag is only necessary for the first backup run; afterwards restic has a snapshot for the new host and can use that as a starting point.
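For the original question (running the same backup from any machine), one option is to pin the host name recorded in each snapshot with the --host flag of the backup command, so that the parent lookup does not depend on which machine runs the job. A minimal sketch; the host label `nightly-backup`, the paths and the function name are made up for illustration:

```shell
#!/bin/sh
# Sketch: record a fixed host name in every snapshot so that restic's
# parent-snapshot lookup matches no matter which machine runs the job.
# The backup paths must also be identical on every machine
# (for example, the same NFS mount point).
backup_with_fixed_host() {
    repo=$1
    host_label=$2
    shift 2
    restic --repo "$repo" backup --host "$host_label" "$@"
}

# Example (placeholders):
#   backup_with_fixed_host /srv/restic-repo nightly-backup /mnt/datastore/project-a
```

Snapshots that already exist keep their old host name, so the first run with a new label would still need --parent (or one full re-read of the data).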

The cached data is really only a cache of files that also exist in the backup repository. Its only use is to avoid downloading tons of small file parts from the backup repository over and over again.
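As for where the cache lives: it is stored locally on the machine running restic, not in the repository. The sketch below mirrors, as far as I understand it, how the default location is chosen on Linux and macOS; `restic_cache_dir` is a hypothetical helper, not a restic command. On Windows the default should be %LOCALAPPDATA%\restic, and the --cache-dir option overrides the default on any platform.

```shell
#!/bin/sh
# Sketch of the default cache location on Linux and macOS.
# Assumption: RESTIC_CACHE_DIR, if set, is used as-is; otherwise the
# per-user cache directory of the platform is used with "restic" appended.
restic_cache_dir() {
    if [ -n "${RESTIC_CACHE_DIR:-}" ]; then
        printf '%s\n' "$RESTIC_CACHE_DIR"
    elif [ "$(uname -s)" = "Darwin" ]; then
        printf '%s\n' "$HOME/Library/Caches/restic"
    elif [ -n "${XDG_CACHE_HOME:-}" ]; then
        printf '%s\n' "$XDG_CACHE_HOME/restic"
    else
        printf '%s\n' "$HOME/.cache/restic"
    fi
}

# Example:
#   ls "$(restic_cache_dir)"
```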

This is exactly the use case that --parent is designed for. Use it just for the first backup, then later backups will not require it to find the appropriate parent snapshot.

Perfect, thanks CDHowie and MichaelEischer. I’ve run it and it took about a minute, instead of hours. Thanks!

I’ve migrated most of my backups to Restic now. I still have another commercial product as my second line backup, but Restic is my main backup tool on Windows and Linux.

I needed to do this again and had forgotten to write down the details. For future reference, here are the commands to use to adopt an existing backup from a new PC:

* Get the ID of the latest snapshot using the snapshots command

restic.exe --repo D:\RepoPath snapshots

* Insert the snapshot ID where you see XYZ below

restic.exe --parent XYZ --repo D:\RepoPath backup C:\DataToBackup
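For Linux or macOS, the same two steps could be wrapped in small shell functions. This is only a sketch: the function names, the repository path, the old host name and the data path are placeholders.

```shell
#!/bin/sh
# Sketch: the two adoption steps as POSIX shell functions.

# Step 1: show only the most recent snapshot for the old host,
# then read the snapshot ID from the first column of the output.
latest_for_host() {
    restic --repo "$1" snapshots --host "$2" --latest 1
}

# Step 2: run the first backup from the new machine with that
# snapshot as the parent, so unchanged files are skipped by metadata.
adopt_backup() {
    repo=$1
    parent_id=$2
    shift 2
    restic --repo "$repo" backup --parent "$parent_id" "$@"
}

# Example (placeholders):
#   latest_for_host /srv/restic-repo old-pc
#   adopt_backup /srv/restic-repo XYZ /home/me/data
```

Later backups from the new machine should find their parent automatically, so plain `restic backup` is enough from then on.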