Initial B2 backup gone horribly wrong

UhtredTheBold · February 1, 2019, 10:24pm

I started backing up approximately 700 GB earlier this week but became suspicious when the bucket went above 700 GB. That is when I discovered that I had somehow managed to run two copies of the backup script simultaneously…doh! I didn’t even think this was possible…shouldn’t the repository have been locked?

I killed one of the processes but am unsure where this leaves me. Should I leave the other process running to completion? My worry is how to remove the unneeded data. Would the prune command do that, or should I delete the bucket and start again?

Thanks for any advice.

cdhowie · February 1, 2019, 11:05pm

Backup takes a shared lock, not an exclusive lock. This is intended and permits multiple parallel backup operations to the same repository. (This may be particularly desirable when more than one machine backs up to the same repository.)

Yes, leave the other process running to completion.

Precisely; run prune when the backup completes and restic will discard the duplicate data.

bdillahu · February 7, 2019, 12:00am

Hmmm… how did I miss that? I wish I had realized this earlier: I gave up on having a shared repository because I thought I was going to have locking issues with multiple machines accessing it.

I don’t suppose there is an easy way to merge two repositories to gain the deduplication?

Thanks for a great tool!

cdhowie · February 7, 2019, 4:12am

If the master key is the same (you ran restic init once, copied that directory structure, and used both repos) then yes – you can merge the two directory structures and prune.

But I’m guessing you ran restic init twice on two different directories (the sane thing to do, after all) so no, there isn’t a way to merge them today.

theBoatman · April 17, 2020, 8:52am

Just to be sure that I got that right: as long as the repositories are based on the same restic init structure, I can freely merge them by just copying all files into the same folder, and as long as all files from one repository are there, every snapshot from that repository will work?

I am new to restic and this is very interesting and important information for me, so it would be great if anyone could confirm that for me.

cdhowie · April 17, 2020, 1:02pm

Yep. I have some scripts that do this every day with some of our repositories, though I only copy the data and snapshots directories, then immediately run restic rebuild-index.
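For illustration only (this is not the script mentioned above), a merge of this kind might be sketched in Python as follows. It assumes both repositories are plain local directories in restic's default layout and were created from the same restic init, so they share a master key. The function name `merge_repo` and its arguments are hypothetical. Files in a restic repository are named after the hash of their contents, so a file that already exists at the destination is skipped.

```python
import shutil
from pathlib import Path


def merge_repo(src: Path, dst: Path) -> int:
    """Copy data/ and snapshots/ from src into dst, skipping files already there.

    Assumption: src and dst share the same master key (same `restic init`).
    Returns the number of files copied.
    """
    copied = 0
    for top in ("data", "snapshots"):
        root = src / top
        if not root.is_dir():
            continue
        for f in sorted(root.rglob("*")):
            if not f.is_file():
                continue
            target = dst / f.relative_to(src)
            if target.exists():
                continue  # same name means same content, nothing to copy
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(f, target)
            copied += 1
    return copied
```

After a merge like this, the index at the destination no longer matches its contents, which is why restic rebuild-index is run on the destination straight afterwards.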

MichaelEischer · April 17, 2020, 7:57pm

@cdhowie Any reason why you’re not also copying the index to avoid rebuilding it? As long as the source repositories were intact, that shouldn’t cause any issues.

theBoatman · April 20, 2020, 2:19pm

@cdhowie Thanks for the input. I would also be interested to know whether it is wise to copy the files in a certain order, as I plan to transfer them over a rather slow connection and the probability that the transfer gets interrupted is rather high. I would assume that it is best to first transfer the data directory and, only after all data files are at the destination, transfer the snapshot files. If I am right, that would ensure that the repository is intact the whole time. Is this correct?

cdhowie · April 20, 2020, 3:47pm

@MichaelEischer Both repositories share a significant number of packs, so copying the indexes as well would unnecessarily inflate their size, as many objects would be referenced twice.

@theBoatman Generally speaking, yes. However, if synchronizing is slow then you run the risk of new data being added while you copy. For example, if you are halfway through copying the data directory and a backup runs, approximately half of the new data will be stored in the “first half” of the repository (packs beginning with 00-7f), which you have already copied, and the rest in the “second half” (packs beginning with 80-ff). The new packs in the first half would then be missing from the copy. If your repository is constantly taking new backups, the better pattern would be to:

1. Sync data
2. Sync snapshots
3. Sync data again

From the moment step 2 begins until step 3 completes, the copy of the repository may be broken, though this shouldn’t matter much if you aren’t using it at the time, of course.

If the source repository is pruned during or between any of these steps, then the copy may also be broken.
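The three passes can be sketched roughly like this in Python, assuming both source and destination are local directories with restic's data and snapshots subdirectories. The names `sync_dir` and `three_pass_sync` are made up for the example. The sketch does not handle interrupted transfers; over a slow link a tool that resumes partial copies, such as rsync, would be a more realistic choice.

```python
import shutil
from pathlib import Path


def sync_dir(src: Path, dst: Path, name: str) -> list[str]:
    """Copy files under src/name that are missing from dst/name.

    Returns the relative paths of the files copied, in copy order.
    """
    copied = []
    for f in sorted((src / name).rglob("*")):
        rel = f.relative_to(src)
        if f.is_file() and not (dst / rel).exists():
            (dst / rel).parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(f, dst / rel)
            copied.append(rel.as_posix())
    return copied


def three_pass_sync(src: Path, dst: Path) -> list[str]:
    copied = sync_dir(src, dst, "data")        # pass 1: bulk of the packs
    copied += sync_dir(src, dst, "snapshots")  # pass 2: snapshot files
    copied += sync_dir(src, dst, "data")       # pass 3: packs added during passes 1-2
    return copied
```

The third pass is usually short because it only picks up packs written while the first two passes were running.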

Some tools can crawl the entire source directory tree and remember what was there. If you use such a tool and can control the order that it scans and copies, then you would have to tell it to:

1. Scan snapshots first, then data.
2. Copy data first, then snapshots, transferring only files that were present during the scan.
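A minimal Python sketch of that scan-then-copy ordering, assuming local directories and no prune on the source in the meantime; `scan` and `copy_scanned` are hypothetical names, not part of any real tool. The ordering works because a snapshot file is only written once its packs are in place, so every snapshot seen in the scan has its packs in the later data listing.

```python
import shutil
from pathlib import Path


def scan(src: Path) -> dict[str, list[Path]]:
    """Record the files present right now, scanning snapshots before data."""
    listing = {}
    for name in ("snapshots", "data"):  # scan order
        listing[name] = sorted(
            p.relative_to(src) for p in (src / name).rglob("*") if p.is_file()
        )
    return listing


def copy_scanned(src: Path, dst: Path, listing: dict[str, list[Path]]) -> None:
    """Copy only the files recorded by scan(), data before snapshots."""
    for name in ("data", "snapshots"):  # copy order
        for rel in listing[name]:
            target = dst / rel
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src / rel, target)
```

Snapshots created after the scan are simply left for the next run, so the destination never holds a snapshot whose packs were not part of the same scan.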