Migration from locally staged repository to cloud only

I have removed the locally staged copy of my repo, which I used to sync offsite with rclone. I don't have an easy way to quantify the traffic on this specific host, so I am wondering how much additional traffic to expect from a remote prune operation compared to pruning locally and then syncing.

It has certainly taken a lot longer, which I attribute to the latency of the individual calls.

The stats I see during pruning are:

restic prune --verbose
counting files in repo
building new index for repo
[52:35] 100.00%  115565 / 115565 packs

repository contains 115565 packs (613117 blobs) with 526.882 GiB
processed 613117 blobs: 0 duplicate blobs, 0 B duplicate
load all snapshots
find data that is still in use for 6 snapshots
[0:02] 100.00%  6 / 6 snapshots

found 324219 of 613117 data blobs still in use, removing 288898 blobs
will remove 0 invalid files
will delete 44553 packs and rewrite 14417 packs, this frees 236.695 GiB
[4:05:53] 100.00%  14417 / 14417 packs rewritten

If you are using remote storage and are concerned about traffic during prune, you should give the latest beta versions a try, where prune has been specifically optimized for this case.

Note, however, that this is a beta. Although it has been tested quite successfully by various users, there is a higher risk of unknown bugs in it.

In the optimized prune, basically only the repacking is responsible for traffic: it downloads the pack files to repack and uploads the newly generated pack files. There is now the option --max-unused, which allows you to repack less if you tolerate some unused space in the repository. This may even lead to only deleting pack files and not repacking any, which means almost no traffic.
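For example, if you are fine with keeping partly-used pack files around, something like the following should avoid most of the repacking traffic (this assumes the repository location and password are already set via the RESTIC_REPOSITORY and RESTIC_PASSWORD environment variables):

# tolerate any amount of unused data: prune then mostly just deletes
# pack files that contain no used blobs at all, so next to no traffic
restic prune --max-unused unlimited --verbose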

If, however, you want or need to repack and you do have a local copy of your repository, pruning locally and then syncing is always better, as it saves you the downloading step.
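As a rough sketch of that workflow (the repository path and the rclone remote name below are just placeholders, adjust them to your setup):

# prune against the local copy, then mirror the result offsite
restic -r /path/to/local/repo prune
rclone sync /path/to/local/repo offsite:restic-repo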

Hi,
I have fetched the latest beta from today; the compiled-in help suggests the following:

--max-repack-size size    maximum size to repack (allowed suffixes: k/K, m/M, g/G, t/T)
--max-unused limit        tolerate given limit of unused data (absolute value in bytes with suffixes k/K, m/M, g/G, t/T, a value in % or the word 'unlimited') (default "5%")
--repack-cacheable-only   only repack packs which are cacheable

I am not certain I understand the range: if the pack size is fixed, are the suffixes just generic? How would a gigabyte or larger apply here?

Do I understand correctly that 100% and unlimited would be equivalent and produce the result I want? This repo is composed of block-device backups, so it seems reasonable to simply allow an entire pack file to remain as long as it contains any data still in use.

Thanks!

The parameters do not apply to a single pack file, but to the total within the repository. If you specify --max-unused=1G, this allows 1GB of unused data distributed over as many pack files as is “optimal” to minimize repacking.

restic will still chunk your block devices into blobs with sizes usually around 1MB (this is not fixed, but depends on the content, as restic uses a content-defined chunking algorithm). Of these blobs, usually 3-5 are saved within a pack file.

I advise you to just play around with different values and see what prune would do (it tells you exactly how many bytes it will repack and how much space this repacking will free). There is also the option --dry-run, which can be used to just show this without pruning anything.
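For example, to compare a fixed budget against unlimited unused space without changing anything in the repository (again assuming the repository is configured via environment variables):

# show what would be repacked and freed with a 1G budget of unused data ...
restic prune --max-unused 1G --dry-run --verbose
# ... and when tolerating unlimited unused data
restic prune --max-unused unlimited --dry-run --verbose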