Best way to back up Google Takeout ZIP files?

Hi, I’ve downloaded my Google data using https://takeout.google.com/ and I’m wondering what’s the optimal way to back this up to a Restic repository?

The download from Google is a multi-gigabyte ZIP file containing lots of files: emails, photos, etc.

When I download successive takeouts from Google over time, many of the files within the ZIPs will presumably be the same (for example, each successive ZIP will contain all the same photos, plus any new ones that I’ve taken since). But each successive ZIP file, taken as a whole, will be different from the previous ZIP.

I’m wondering whether, for the sake of Restic’s deduplication, it makes sense to extract the ZIPs locally first and back up the extracted contents to a Restic repo? Or is it fine to just back up the ZIP files directly?

Same question applies to downloads of your Apple data from privacy.apple.com.

Thanks!

“Optimal way” needs consideration: what do you optimise for?

  • Backup space: Decompress the ZIPs, then back them up with restic. Unchanged blocks consume no additional space (beyond a reference to the existing block content).

  • Backup time: It might be worth not using restic at all. The ZIP is already compressed, and a backup takes only the time to copy it, as opposed to a lengthy deduplication run.

  • Laptop battery time: Ditto. Skipping deduplication saves power.

  • Easy access to individual files: Decompress the ZIP, then use restic. That way you can retrieve each file without processing a whole ZIP, and can search across backups (see the sketch after this list).
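If you go the decompress-then-restic route, something along these lines could work. This is only a minimal sketch, assuming the Takeout ZIPs sit in ~/Downloads, that restic is installed and the repository is configured via the RESTIC_REPOSITORY / RESTIC_PASSWORD environment variables, and that you want to extract into the same fixed directory every time so successive takeouts overwrite unchanged files in place:

```python
import subprocess
import zipfile
from pathlib import Path

# Assumptions (adjust to your setup): the Takeout ZIPs live in ~/Downloads,
# the extracted tree goes to a fixed path so successive runs overwrite it,
# and the restic repository is configured via the RESTIC_REPOSITORY and
# RESTIC_PASSWORD environment variables.
TAKEOUT_ZIPS = sorted(Path.home().glob("Downloads/takeout-*.zip"))
EXTRACT_DIR = Path.home() / "takeout-extracted"

def extract_and_backup() -> None:
    EXTRACT_DIR.mkdir(parents=True, exist_ok=True)
    for zip_path in TAKEOUT_ZIPS:
        with zipfile.ZipFile(zip_path) as zf:
            zf.extractall(EXTRACT_DIR)  # identical files overwrite in place

    # Back up the extracted tree; restic chunks and deduplicates the contents,
    # so unchanged photos/emails cost (almost) no extra repository space.
    subprocess.run(["restic", "backup", str(EXTRACT_DIR)], check=True)

if __name__ == "__main__":
    extract_and_backup()
```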

HTH

Rather old topic, but I just ran into this and wanted to add my take. I believe it would in general be a useful feature flag to be able to back up any zip / tar / gz etc. by its content rather than as the compressed file itself. v2 repos have compression already anyway, and the deduplication will be handier than keeping the file collection compressed.
The part that makes this rather impractical is that even the takeout itself, after decompressing, is poorly structured for backups: for example, the e-mails are bundled into a single mbox instead of individual e-mails, the calendar events into a single ics rather than individual events, the contacts into a single vcf, etc.
However, I do believe that for Drive files, and for incrementally growing compressed files in general, it would be worth adding this as a feature flag; maybe call it atomic compressed backup (acb)?

Note: I do not have either a Google or an Apple account myself, so this isn’t a project I feel strongly about taking under my wing, but it might be worth creating a small takeout processor that does the splitting and extracts everything into a single location, which can then be added to any backup solution sensibly.
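As a rough sketch of what the mbox-splitting part of such a processor could look like: the snippet below uses Python’s stdlib mailbox module to write each message to its own file. The mbox path and the hash-based file naming are assumptions on my part, not anything Takeout or restic prescribes; naming files by content hash just keeps re-runs stable so unchanged e-mails deduplicate cleanly.

```python
import hashlib
import mailbox
from pathlib import Path

# Assumption: the Gmail export from Takeout sits at the path below; adjust
# to wherever your extracted takeout actually lives.
MBOX_PATH = Path("Takeout/Mail/All mail Including Spam and Trash.mbox")
OUT_DIR = Path("takeout-split/mail")

def split_mbox() -> None:
    OUT_DIR.mkdir(parents=True, exist_ok=True)
    for message in mailbox.mbox(str(MBOX_PATH)):
        raw = message.as_bytes()
        # Content-hash filename: the same e-mail gets the same name on every
        # run, so backup tools see an unchanged file.
        digest = hashlib.sha256(raw).hexdigest()
        (OUT_DIR / f"{digest}.eml").write_bytes(raw)

if __name__ == "__main__":
    split_mbox()
```

The same idea could be applied to the ics and vcf files with their respective parsers; the point is just to turn one big bundle into many small, stable files before handing the tree to restic.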