Experience with repacking and recompressing large restic repos in GDrive

I’ve gathered my thoughts and experience from compressing and repacking two very large restic repos on Google Drive.

I’ve been using restic since 2018 and have been eagerly awaiting the release of the compression feature (my previous backup tool of choice was borg, and I appreciated that its compression helped with my measly upload bandwidth). However, I have a lot of data backed up to Google Drive and was a little leery of upgrading my restic repos. But when Google Drive decided to roll out a 5 million file limit (Google Issue Tracker), I knew I needed to take steps to reduce the number of pack files in my Drive, and repacking would also let me compress everything. So here are some stats after several weeks of restic pruning.

My first restic repo has about 128 thousand files backed up, mostly media files and other poorly compressible data, taking up 12.3 TB of storage. Because I’ve been using restic since the default pack size was 4 MB, I had over 2 million pack files in the repo. Google Drive also restricts how many API calls you can make per second, so listing files is very slow: restic prune would spend about 55 minutes in the searching used packs... stage. To compress, I repeatedly ran restic prune --repack-small --repack-uncompressed --pack-size 128 --compression max --max-repack-size 750G. I restricted it to 750 GB per run because Google Drive has a 750 GB per day upload limit. When the prune was finally complete, the space savings were small (as expected): the new repo size was 12.1 TB (compressed by 1.7%). But I’m down to 98 thousand pack files, and iterating over the packs now takes less than 2 minutes.
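The repeated runs could be scripted. Below is a rough sketch in Python, not what I actually ran. It assumes restic is on the PATH and that the repository and password come from the usual environment variables (RESTIC_REPOSITORY, RESTIC_PASSWORD_FILE); the function names are made up for illustration. The flags are the ones from the command above. The run estimate is only a ballpark: it assumes the whole repo has to be repacked, uses decimal units, and ignores the small amount saved by compression.

```python
import math
import subprocess


def build_prune_command(pack_size_mib=128, max_repack="750G"):
    """Return the prune command from the post as an argument list."""
    return [
        "restic", "prune",
        "--repack-small",
        "--repack-uncompressed",
        "--pack-size", str(pack_size_mib),
        "--compression", "max",
        "--max-repack-size", max_repack,
    ]


def estimated_runs(repo_size_gb, daily_cap_gb=750):
    """Ballpark number of capped runs needed to repack the whole repo."""
    return math.ceil(repo_size_gb / daily_cap_gb)


def run_once():
    """Run a single capped prune; raises CalledProcessError if restic fails."""
    subprocess.run(build_prune_command(), check=True)


# 12.3 TB at 750 GB per run (nothing is executed here, this is arithmetic only)
print(estimated_runs(12_300))  # 17
```

Calling run_once() once a day (from cron, for example) until prune reports nothing left to repack would match the "several weeks" it took in practice.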

My second restic repo is a backup of 1.8 million files covering a wide range of file types, including very small home directory files. This repo took up 1.6 TB of space in 350 thousand pack files. I chose a smaller --pack-size 64 because the data in this repo is more volatile, and I assume its packs will be rewritten more often. After running restic prune for several days, I’m down to 22 thousand pack files, and the repo uses about 1.4 TB of storage (compressed by 12.5%).
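As a sanity check on the numbers for both repos, here is a small Python calculation of the average pack size and the space saved. It assumes decimal units (1 TB = 1,000,000 MB) and uses the rounded figures quoted above, so the results are approximate; the helper names are just for illustration.

```python
def avg_pack_mb(repo_tb, pack_count):
    """Average pack file size in MB, assuming 1 TB = 1,000,000 MB."""
    return repo_tb * 1_000_000 / pack_count


def savings_pct(before_tb, after_tb):
    """Space saved by the repack, as a percentage of the original size."""
    return (before_tb - after_tb) / before_tb * 100


repos = {
    "first repo": {"before": (12.3, 2_000_000), "after": (12.1, 98_000)},
    "second repo": {"before": (1.6, 350_000), "after": (1.4, 22_000)},
}

for name, r in repos.items():
    before_tb, before_packs = r["before"]
    after_tb, after_packs = r["after"]
    print(
        f"{name}: avg pack {avg_pack_mb(before_tb, before_packs):.0f} MB"
        f" -> {avg_pack_mb(after_tb, after_packs):.0f} MB,"
        f" saved about {savings_pct(before_tb, after_tb):.1f}%"
    )
```

With these rounded inputs the average pack goes from roughly 5 or 6 MB to a bit over 120 MB in the first repo and a bit over 60 MB in the second, which is close to the 128 and 64 pack-size targets. The savings come out near 1.6% and 12.5%; the small gap to the 1.7% quoted above is what you'd expect from rounding the repo sizes.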

Some things I wish I had known before I started:

  1. How slow it was going to be. Downloading, compressing, and repacking the packs was pretty slow. I knew that my 40 Mbps upload wasn’t going to cut it, so early on I switched to a VPS with symmetric 1 Gbps. Its measly 2 GB of RAM wasn’t enough, and the OOM killer kept wiping out my restic processes. Upping to 4 cores and 16 GB RAM helped avoid that, but I was still only uploading at between 8 and 19 MB/s. Deletes were also very slow (likely due to the Google Drive API limitations), and it would sometimes take 4 hours to delete unused pack files after a run.
  2. That you can run restic rebuild-index after a repack gets interrupted and salvage some of your uploaded pack files. My VPS went down a handful of times, and I probably had to reupload about 600 GB of pack files.
  3. That Google Drive would roll back their file limitation. As I said before, I probably needed to repack anyway, as the number of files was getting unwieldy for rclone. If Google hadn’t backtracked, I don’t think I would have been able to repack very easily: the process involves uploading all the new pack files and only then deleting the old ones (a very sane design choice), so I would certainly have run into the file count limit very quickly, and it would have been even slower going.
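To put the speeds from point 1 in perspective, here is a back-of-the-envelope Python estimate of how long one 750 GB batch takes to upload. It assumes decimal units and a sustained rate with no overhead from downloading, compressing, or API throttling, so real runs would be slower; the function names are made up for illustration.

```python
def mbps_to_mb_per_s(mbps):
    """Convert megabits per second to megabytes per second."""
    return mbps / 8


def upload_hours(size_gb, rate_mb_per_s):
    """Hours to upload size_gb at a sustained rate, assuming 1 GB = 1000 MB."""
    return size_gb * 1000 / rate_mb_per_s / 3600


scenarios = [
    ("40 Mbps home upload", mbps_to_mb_per_s(40)),
    ("VPS, low end (8 MB/s)", 8),
    ("VPS, high end (19 MB/s)", 19),
]

for label, rate in scenarios:
    print(f"{label}: {upload_hours(750, rate):.1f} h per 750 GB batch")
```

That works out to roughly 42 hours on the home connection, about 26 hours at 8 MB/s, and about 11 hours at 19 MB/s. So at the slow end of what I saw on the VPS, a single batch would not even fit inside the day that the 750 GB quota covers.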