Is it possible to split the backup of large data into smaller parts?

My most important data (mainly photos and my own video productions) is stored on a NAS, but I would like to follow the 3-2-1 rule. So I bought Google Drive storage (for the curious: because of the rclone mount and the price) and would like to back up (maybe "archive" is more accurate) this data there. I would like to run a scheduled backup script from a Raspberry Pi to Google Drive. But the initial backup means uploading almost 1.5 TB, which is a really large amount.

This data is structured as follows (in square brackets is the example of dir size):

/data/foto             [900 GB]
  2000 and older       [500 MB]
  2001                 [100 MB]
  ~
  2021                 [100 GB]
/data/video/           [300 GB]
  dir1                 [10 GB]
  dir2                 [10 GB]
  dirN
/administration        [400 MB]

My first attempt was to back up these three directories together. I measured the first 5 hours, and each hour about 5 GB was backed up. At that rate the whole backup would take around 13 days of continuous uploading. That is not a good solution, given the high likelihood of problems such as network flapping.

Would the following approach work? (The example commands omit some of the required flags.)

#1 part
restic backup /data/foto/200*

#2 part
restic backup /data/foto/201*

#3 part
restic backup /data/foto/202*

#4 part
restic backup /data/video/dir1

#5-N parts
restic backup /data/video/dir{2,3..N}

After this initial backup finishes, I'll run the backup regularly every few days:

restic backup /data/foto /administration

Is it possible to split this data into smaller parts without having to restructure the data saved in the restic repository? Or is there a better procedure that I have overlooked?

If a backup fails midway then all data uploaded up to now is not lost, but still in the repository. A later backup run will have to read all files from disk again, but won’t have to upload them again.
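Since an interrupted run keeps whatever it already uploaded, one way to cope with a flaky connection is simply to rerun the same command until it succeeds. The following is a minimal sketch of that idea, not something built into restic: retry_backup and RETRY_DELAY are hypothetical names, any non-zero exit status is treated as a failure, and the repository and password are assumed to come from the environment (for example RESTIC_REPOSITORY and RESTIC_PASSWORD_FILE).

```shell
# Hypothetical helper: rerun "restic backup" until it succeeds or the
# attempt limit is reached. Usage: retry_backup MAX_ATTEMPTS PATH...
retry_backup() {
    max_attempts=$1
    shift
    attempt=1
    while [ "$attempt" -le "$max_attempts" ]; do
        # Data uploaded by earlier failed attempts stays in the repository,
        # so each retry only uploads what is still missing.
        if restic backup "$@"; then
            return 0
        fi
        echo "restic backup failed (attempt $attempt of $max_attempts)" >&2
        attempt=$((attempt + 1))
        if [ "$attempt" -le "$max_attempts" ]; then
            # RETRY_DELAY (seconds) is an assumed knob; defaults to 5 minutes.
            sleep "${RETRY_DELAY:-300}"
        fi
    done
    return 1
}
```

It could be called as retry_backup 10 /data/foto /data/video /administration. Each retry still has to read the files from disk again, so this trades local I/O for not having to watch the upload.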

Your idea of splitting up the backups should work. There’s one alternative you could try, which has the benefit that the final backup run doesn’t have to read everything again: run the backup command with additional paths each time and tell restic to still use a previous snapshot as its starting point with the --parent <snapshot-id> flag. If the directories contain millions of files, then your approach might be faster. The alternative could look like the following:

# creates snapshot 12345678
restic backup /data/foto/200*
restic backup --parent 12345678 /data/foto/200* /data/foto/201*
...

And there’s an experimental PR you could try, which should allow a failed backup run to resume from where it failed:

I tried the --parent trick and I’m not sure whether it worked. After creating two backups I still see two snapshots. Is that all right?

Overview of snapshots:
ID        Time                 Host         Tags          Paths
---------------------------------------------------------------------------------------------------
7fc1fe9c  2021-07-16 12:58:24  raspberrypi  rpi,etc,home  /data/foto/--=2002_a_min=--
                                                          /data/foto/--=2003=--
                                                          /data/foto/--=2004=--
                                                          /data/foto/--=2005=--
                                                          /data/foto/--=2006=--
                                                          /data/foto/--=2007=--
                                                          /data/foto/--=2008=--
                                                          /data/foto/--=2009=--

9ff825ff  2021-07-16 22:54:24  raspberrypi  rpi,etc,home  /data/foto/--=2010=--
                                                          /data/foto/--=2011=--

I thought the new directories would be added to the parent snapshot 7fc1fe9c.

@waldauf Please include the complete command and output of restic when you ask about results from your backup runs or similar. They usually contain the information needed to provide an answer. For example, the output is what indicates whether the parent snapshot you referenced was used or not.

Snapshots are points in time. When you make a backup, a new snapshot is created - snapshots are never modified. So no, the additional folders you backed up are not “added” to the previous snapshot.

To expand slightly on what @rawtaz said: a snapshot is never modified, and the --parent parameter never has any impact on how the backups and repository look. It only makes the backup runs faster.

But looking at the output (even without seeing the commands), it does look a bit like you ran the following:

restic backup /data/foto/200*
restic backup /data/foto/201* --parent=<parent>

When you should have been running:

restic backup /data/foto/200*
restic backup /data/foto/200* /data/foto/201* --parent=<parent1>
... <repeat as necessary>
restic backup /data/foto --parent=<parentn>

In other words, to get to a full backup in smaller pieces, you add more and more folders in each backup, while using --parent to speed up the process. You do not create multiple snapshots, each with different folders and then combine them.

(And remember that because of deduplication, you are not wasting any space doing this. Any file present in the first backup will not be duplicated in the second.)
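The growing-path sequence can also be scripted. Below is a rough sketch under several assumptions: cumulative_backup is a hypothetical helper, the repository and password come from the environment, and the snapshot ID is pulled with sed from the summary line printed by restic backup --json, which assumes that line carries a "snapshot_id" field in compact form. Check that against your restic version before relying on it.

```shell
# Hypothetical helper: back up PATH1, then PATH1 PATH2, then PATH1 PATH2 PATH3,
# and so on, passing the previous snapshot as --parent to each following run.
cumulative_backup() {
    parent=""
    first=1
    # The list for "for" is expanded once, so "$@" can be rebuilt in the loop.
    for path in "$@"; do
        if [ "$first" -eq 1 ]; then
            set --              # start the growing list from empty
            first=0
        fi
        set -- "$@" "$path"     # add one more path for this run
        if [ -n "$parent" ]; then
            out=$(restic backup --json --parent "$parent" "$@") || return 1
        else
            out=$(restic backup --json "$@") || return 1
        fi
        # Assumption: the summary message contains "snapshot_id":"<hex id>".
        parent=$(printf '%s\n' "$out" |
            sed -n 's/.*"snapshot_id":"\([0-9a-f]*\)".*/\1/p')
        if [ -z "$parent" ]; then
            echo "could not find a snapshot ID in restic's output" >&2
            return 1
        fi
        echo "snapshot $parent covers $# path(s)"
    done
}
```

For example, cumulative_backup /data/foto/* would add one year directory per run. The ID printed by the last run can then be passed as --parent to the final restic backup /data/foto run. Every intermediate run creates its own snapshot, but thanks to deduplication those extra snapshots take almost no additional space.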

My initial backup to Google Drive was ~13TB. Took me over a month of non-stop uploading with failures (reboots, lost connection, etc.) in between. Restic chugged along after re-connecting/re-starting the backup and I finally got there.

I wouldn’t bother breaking it up.