Working with compression

Some questions for migrating a big repo (500+ GB)

  1. What do I need to do to migrate the repo to the new format and compress the existing data?
    Just run migrate upgrade_repo_v2 and prune --repack-uncompressed?
  2. Where can I see whether compression is active for a repo?
  3. How much space is needed for the migration, or is it done in place?
  4. Is it possible to define the compression level for the migration?
  5. Does the migration need a lot of RAM, or just time/CPU?
  6. Can I continue the migration later if it is interrupted (e.g. the disk fills up)?
  7. Is it possible to use the repo for backups during the migration?
  8. Can you change the compression level later and repack with the new level?

Thanks!

  1. Yep. Migrate will run a check and do its thing.
  2. I think this is still being worked on. For now I run "restic stats --mode raw-data" and compare that with the actual size of the repository on disk. Also, any time you run a command on the repo it'll say v2 - and if it's v2, compression is always on UNLESS you specifically tell it not to use it.
  3. That’s a good question. The migration itself was pretty quick - I don’t think it really touches any of your packs, just the indexes and some metadata?? I wouldn’t expect a lot of disk usage for this part.
  4. It’s unclear to me if --compression max has any effect at this stage, but I did it, and it didn’t hinder the operation. The only thing it would affect would be the indexes and metadata, so not a huge savings anyway, most likely.
  5. I wouldn’t say a lot. I migrated a 6TB and 3TB repo, and both were done in 10-20 minutes. RAM usage hovered between 2-4GB.
  6. "Disk full" might be cause for concern. There is the --unsafe-recover-no-free-space option but, as the name says, I wouldn't risk it. I'd use --max-repack-size 250G, for example, and do it in batches (see the command sketch at the end of this post). It's my understanding that this process is NOT resumable, but if you break it up into chunks you're "saving state" each time a batch finishes, at least compared to doing the whole thing at once. THIS step took me a good month for the 6TB repo (sporadic, not sustained - I needed to use it for backups periodically) and about 6 days for the 3TB repo. I did the 6TB repo in 500GB stages (local) and the 3TB repo in 250GB stages (Backblaze). The latter I scheduled at 6pm every night and it ran 'til ~6am every morning. It just finished today actually, on the final 170GB that was left. I'm unsure about the exact original size, but my bill is projected to be $7/mo for storage now, instead of $16/mo :slight_smile:. Though I have about $23 in download fees haha. Anyway, if you want it somewhat "resumable", just break it into --max-repack-size chunks and schedule them one after another, or at night like I did (if you want to be able to back up in the daytime).
  7. Nope, but the initial migration is super fast. You'll need to update all the clients anyway. Afterward, you can break up the prune compression job into batches, as noted earlier, and you'll be able to back up in between jobs.
  8. It is my understanding that all you can do is --repack-uncompressed, NOT change the compression level. I myself have been wondering this.

Bold things are what I'm unsure about and could use clarification from someone more knowledgeable than I. :slight_smile:
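
For reference, here's a rough sketch of the workflow described above (the repository path and batch size are placeholders, adjust to taste):

    # step 1: upgrade the repository format to v2 (quick)
    restic -r /path/to/repo migrate upgrade_repo_v2

    # step 2: compress existing data in batches; rerun until nothing uncompressed is left
    restic -r /path/to/repo prune --repack-uncompressed --max-repack-size 250G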

prune by default just compresses the indexes and the metadata. prune --repack-uncompressed will also compress all other data, although you should probably combine this with --max-repack-size.

The migrate command will only change the repository config, which is not compressed. Only prune will rewrite actual data.
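
If you want to check which format version a repository has (and thus whether compression is available), one way is to print that config, for example (a sketch, assuming a local repository path):

    # prints the repository config; "version": 2 means the repo supports compression
    restic -r /path/to/repo cat config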

There's no support for repacking existing data with a different compression level. From what I've seen so far there's not too much of a difference between the levels. And as a fallback you can still use the copy command to copy the files to another repository using maximum compression.
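
A rough sketch of that copy fallback (flag names as in restic 0.14+, repository paths are placeholders, password handling omitted):

    # create the new repo with the same chunker parameters so deduplication carries over
    restic -r /path/to/new-repo init --from-repo /path/to/old-repo --copy-chunker-params

    # copy all snapshots into the new repo, compressed at the maximum level
    restic -r /path/to/new-repo copy --from-repo /path/to/old-repo --compression max

Keep in mind that copying rewrites everything, so you need space for a full second repository until you delete the old one.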

Thanks for the helpful answers!
I use restic together with Syncthing, which is a dream software combo :rocket:

  1. Does using compression make restic (backup, check, etc.) faster?
  2. Is it necessary to pass the --compression parameter every time, or only when a different compression level, e.g. "max", is needed?
  3. If you use --max-repack-size=20G, does this also limit cache usage to ~20G?
  4. It would be nice if restic printed the compression ratio with the end statistics.
  1. Compression increases the CPU usage, but on the other hand tends to reduce the amount of data to upload / download. Unless you have a high-speed network connection, the network bandwidth is usually the bottleneck, in which case compression will speed up these commands.
  2. The repository does not store any compression configuration. That is, restic will always default to --compression=auto for v2 repos unless specified differently when running a command.
  3. The two are unrelated. The cache only contains directory metadata and the repository index, while the --max-repack-size parameter limits the total amount of data rewritten. Since that also caps the amount of rewritten metadata at 20GB, it indirectly limits how much the cache can grow temporarily during prune.
  4. The "Improve stats" pull request by fd0 (restic/restic #3733 on GitHub) will add some compression statistics to the backup command, although you'd have to calculate the compression ratio yourself.
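
In the meantime, a rough ratio can be estimated the way described earlier in the thread (a sketch for a local repository):

    # size of the blob data restic is tracking
    restic -r /path/to/repo stats --mode raw-data

    # compare with what the repository actually occupies on disk
    du -sh /path/to/repo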