First time using the new build on my iMac to back up a Windows user profile after running ddrescue on a dying disk, mounting via Dislocker, and running Restic to grab the good stuff!
Total data backed up: 272.32 GiB
Total data after dedup: 179.08 GiB - 66% of original size
Total data actually written to disk (compressed): 118.56 GiB - 44% of original size!
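For anyone who wants to sanity-check those percentages, they fall straight out of the reported sizes (quick sketch in Python):

```python
# Verify the reported dedup/compression percentages from the summary above.
total = 272.32        # GiB read from the source
after_dedup = 179.08  # GiB after deduplication
on_disk = 118.56      # GiB actually written (compressed)

dedup_pct = 100 * after_dedup / total
disk_pct = 100 * on_disk / total

print(f"after dedup: {dedup_pct:.0f}% of original size")  # 66%
print(f"on disk:     {disk_pct:.0f}% of original size")   # 44%
```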
That was at auto. I intend to mostly back up using max compression, but in this case I was on a time crunch! I'm currently copying my old repository over at max compression, and it's taken all week and is a little over halfway done, haha.
With that in mind, it copied everything in 1:15:57, so it read at roughly 59.72 MB/s! This was from an M.2 SATA SSD to a Fusion Drive repo (1 TB SSD cache + 5 TB HDD). Normally I get 300 MB/s easily out of it when the cache isn't full, but I had been copying the old repo over, so it had probably slowed down to HDD speeds.
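As a side note, that ~59.72 MB/s figure comes out if you treat the 272.32 GiB as decimal gigabytes over the 1:15:57 wall time; a quick sanity check (Python, unit assumption spelled out):

```python
# Back out the throughput figure: 272.32, treated as decimal GB,
# transferred over a wall time of 1:15:57.
size_bytes = 272.32e9               # decimal interpretation (GB)
seconds = 1 * 3600 + 15 * 60 + 57   # 4557 s
mb_per_s = size_bytes / seconds / 1e6
print(f"{mb_per_s:.2f} MB/s")       # prints "59.76 MB/s"
# With the binary interpretation (GiB), the same run works out to ~64 MB/s.
```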
Awesome - this is fantastic!
Can someone share some information about the type of compression algorithm used and why that was chosen?
I tried to find it in the documentation, but didn’t have any luck.
The statistics already posted show that it seems to be efficient and performant, so that is great to hear.
Further to @yorkday’s question, could anyone also comment on the current state of (de-)compression parallelism?
On my system (rpi4 & NAS), auto compression gives only a minor speed hit (CPU and network load are quite balanced), while max is very CPU-limited. It looks like only one core is being used, so there should be scope for a speedup?
Either way, even in its current form the devs have done a great job
Restic uses Zstandard. It has a great compression rate, adds almost nothing if the data can’t be further compressed, and it is fast to both compress and decompress data.
Hmm, restic will read at most two files in parallel from disk, then compress with as many threads as there are CPU cores, and afterwards write the compressed parts to the repository. So if restic is using only a single CPU core, that sounds like it's somehow I/O-bound. Or maybe you're backing up lots of small files?
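To illustrate the shape of that pipeline, here's a toy sketch (plain Python threads, not restic's actual Go code; the queue size, chunk names, and file count are all made up):

```python
import os
import queue
import threading

# Toy sketch of the pipeline shape described above: at most two "reader"
# threads feed a queue, and one compression worker per CPU core drains it.
# If the readers can't keep the queue full, the workers sit idle and overall
# CPU usage stays low -- i.e. the backup is I/O-bound.
chunks = queue.Queue(maxsize=8)
read_slots = threading.Semaphore(2)  # at most two files read at once
compressed = []
lock = threading.Lock()

def reader(file_id):
    with read_slots:
        chunks.put(f"data-from-file-{file_id}")

def worker():
    while True:
        chunk = chunks.get()
        if chunk is None:
            return
        with lock:
            compressed.append(f"zstd({chunk})")  # stand-in for compression

workers = [threading.Thread(target=worker) for _ in range(os.cpu_count() or 1)]
for w in workers:
    w.start()

readers = [threading.Thread(target=reader, args=(i,)) for i in range(5)]
for r in readers:
    r.start()
for r in readers:
    r.join()

for _ in workers:
    chunks.put(None)  # poison pills to stop the workers
for w in workers:
    w.join()

print(len(compressed))  # 5 chunks compressed
```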
It’s best to follow the instructions at the end of the compression changelog entry restic/issue-21 at master · restic/restic · GitHub . As long as you copy the chunker parameters over, the repository should behave similarly to a new v2 repo.
I’ll be interested to hear any feedback on performance with that option @akrabu . After a week, my current migration is now complete and rcloned off-site, so I’m not inclined to delete it and start again just to see if it’s quicker. But if you’re still working on it…
So I had let it run from 4pm yesterday on a single snapshot, and it had only reached 25% by 10am this morning. I canceled it when I saw this information at 10am and restarted, and at 3pm it’s currently at 16%. Seems like an improvement to me!
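For what it's worth, the rough per-hour rates bear that out (quick back-of-the-envelope in Python, using the times from the post above):

```python
# Rough progress rates from the two runs described above.
old_rate = 25 / 18   # % per hour: 4pm -> 10am next day, reached 25%
new_rate = 16 / 5    # % per hour: restarted 10am, at 16% by 3pm
print(f"old: {old_rate:.2f} %/h, new: {new_rate:.2f} %/h")
print(f"speedup: ~{new_rate / old_rate:.1f}x")  # ~2.3x
```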
Most of the source data are pictures and other common media files. With compression = max, the CPU (i 9700, 8 cores) was fully loaded. In my view, a brilliant feature!
EDIT: Tested with “restic_v0.13.0-147-g88a8701f_windows_amd64.exe”
Upon testing, it seems that prune doesn’t save preliminary indexes (like backup does). With the new use-case of repacking uncompressed data (prune --repack-uncompressed) and potentially very long runtimes of days or weeks, any interruption means having to start all over. Would it make sense to add this functionality to prune? Happy to open an issue!
It was a conscious decision to disable the creation of preliminary indexes while reworking how prune works to simplify the code. With the current handling of duplicate blobs in prune, saving preliminary indexes can only increase the amount of work for prune but never decrease it. Maybe prune: Handle duplicate blobs more efficiently by aawsome · Pull Request #3290 · restic/restic · GitHub is enough to alleviate most of the problems in that regard.
So unfortunately, changing prune to work incrementally requires quite a bit more work than just saving preliminary indexes. But you can still open an issue.
@dhopfm you could use --max-repack-size to limit the packs which are repacked and therefore the runtime of one prune run. This way you are able to step-by-step only repack some of your uncompressed data.
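To illustrate the stepping idea, here's a hypothetical sketch (made-up pack names and sizes; the real selection logic lives in restic's prune, this just mimics a greedy size cap in the spirit of --max-repack-size):

```python
# Hypothetical illustration of stepping through a repack with a size cap:
# each run repacks packs until the cap is reached, so repeated runs
# eventually cover everything without one huge multi-day run.
packs = [("p1", 4), ("p2", 7), ("p3", 3), ("p4", 6)]  # (name, GiB), made up

def plan_run(pending, cap_gib):
    """Greedily pick packs to repack in this run without exceeding the cap."""
    chosen, used = [], 0
    for name, size in pending:
        if used + size <= cap_gib:
            chosen.append(name)
            used += size
    remaining = [p for p in pending if p[0] not in chosen]
    return chosen, remaining

pending = packs
run = 1
while pending:
    chosen, pending = plan_run(pending, cap_gib=10)
    print(f"run {run}: repack {chosen}")
    run += 1
# three runs: ['p1', 'p3'], then ['p2'], then ['p4']
```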
About the PR @MichaelEischer mentioned:
This PR alone doesn’t help with the problem that aborted and restarted prune runs start all the repacking from the beginning: if only the “old” index files are present, all pack files created by the aborted prune run are considered unreferenced and not needed, so they are simply deleted at the beginning of the restarted prune run.
However, you can run a rebuild-index after the aborted prune run, which will more or less simulate that prune had written preliminary indexes.
Without the mentioned PR, this worsens the situation, as now the original pack-to-repack and the pack(s) created by repacking it are both marked for repacking.
But with this PR, prune will treat the original pack-to-repack as completely unused and simply select it for deletion, whereas the pack(s) created by the repacking are usually selected for keeping. So the repacking work is “saved” even for aborted prune runs.
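To make that concrete, here's a toy model of the selection (hypothetical blob and pack names, not restic's real data structures or algorithm):

```python
# Toy model of the duplicate handling described above. After an aborted
# prune plus a rebuild-index, the used blobs "a" and "b" exist both in the
# original pack and in the pack the aborted run already wrote.
packs = {
    "original": {"a", "b", "stale"},  # pack that was being repacked
    "repacked": {"a", "b"},           # pack written by the aborted prune run
}
used_blobs = {"a", "b"}

def plan(packs, used_blobs):
    """Prefer keeping dense packs (fewest unused blobs); delete a pack only
    if all of its used blobs are already covered by packs kept so far."""
    order = sorted(packs, key=lambda name: len(packs[name] - used_blobs))
    covered, keep, delete = set(), [], []
    for name in order:
        needed = packs[name] & used_blobs
        if needed <= covered:
            delete.append(name)  # fully duplicated: free deletion, no repack
        else:
            keep.append(name)
            covered |= needed
    return keep, delete

print(plan(packs, used_blobs))  # (['repacked'], ['original'])
```

So the already-repacked pack is kept and the stale original is deleted outright, which is the sense in which the aborted run's repacking work is preserved.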
So one way would be to merge that PR and then add logic to save preliminary indexes. BTW, this is how I implemented the prune run in rustic: preliminary indexes are saved and the logic of the mentioned PR is also implemented.