Hi fd0, and hi everybody.
I want to thank you for the amazing job you are doing. I consider restic a very beautiful piece of software; it works like a charm with rclone (which I really love, too), and I’d really love to throw away my current setup (duplicity with a custom rclone bridge written by me, putting data on Google Drive) in favor of restic.
I have been playing with restic for some weeks, and I will run it in parallel with my current setup for some months before I feel confident enough to make the switch. I currently keep about 4.5 TB of data backed up; my data is almost static: files can be added or deleted but almost never change, so I don’t actually need deduplication, but I love the modernity of the approach and the way it moves beyond the “full/delta” concept.
Of course, with this huge amount of data, pruning is an issue. I did some studying on the forum, and I think I have a basic understanding of why pruning is such a delicate operation and why it is slow. But I did not expect it to be this slow. During my first backup, after 1.2 TB of data I got a network failure and aborted the operation, so when the network came back the first thing I did was a prune, to remove the files from the partial snapshot.
I am running restic 0.9.0 on FreeBSD 11.1-RELEASE-p10. Here are the logs.
[root@DarkSun:/store/maintenance]# RESTIC_PASSWORD=xxx RESTIC_REPOSITORY=rclone:drive:restic restic --cache-dir /store/maintenance/restic_cache --cleanup-cache prune -v
repository 1cf05352 opened successfully, password is correct
counting files in repo
building new index for repo
[27:45:20] 100.00% 336964 / 336964 packs
repository contains 336964 packs (1421154 blobs) with 1.565 TiB
processed 1421154 blobs: 0 duplicate blobs, 0B duplicate
load all snapshots
find data that is still in use for 12 snapshots
[0:05] 100.00% 12 / 12 snapshots
found 1019138 of 1421154 data blobs still in use, removing 402016 blobs
will remove 0 invalid files
will delete 70942 packs and rewrite 0 packs, this frees 312.236 GiB
counting files in repo
[21:32:34] 100.00% 266022 / 266022 packs
finding old index files
saved new indexes as [c780b3d3 9ad67223 23a9c79d a6ff3774 ff21f497 81996b90 c346afed 48aac91d ec6c72e4 efd14db2 cc926495 eb2e0f91 59bf14e5 61f5fb4a f58d0f82 7d0e0309 b84dd356 38cf46e2 c248140a ca8786aa 36addf68 1fe7553c 580f9b1a 0d752b86 aabc2d06 c9e506f3 d4e3a800 3c5da5ae 1239311f 0fa8dc05 ab69b783 b35ce4fd 3d93d4db 06d34bbb 70e049b0 db81bebd f435a0b3 6a7900bc 021470a9 81b881be 570e22e6 c46c0e0f aae66188 f1f60331 f466b4c3 94434465 920b577e bace6e8b 0c9384cf a5c4ecd3 603af255 07e72bc7 ab3db367 24cd5376 b6d815e0 87da90e1 26368d10 297c9fdb 59eaf46a 2f0a72f4 170ab129 14c3c58f e0302184 2e512240 5d6b8d30 6367e661 1975020f f108218b b5ee383d 52118fa4 a2e1f795 9373d54b 74b546e9 b456ba9f 8b9e0126 5cf8053b 47cdb8ff d222531a 494dc525 605f4e4e f22e91d6 214ea60d c2cbcb9c 719a1ab8 7ae2d02c f0fb3447 86b901c6 d5402d4b ea02f4f8]
remove 666 old index files
[14:49:05] 100.00% 70942 / 70942 packs deleted
done
Notice the 27-hour step to rebuild the index, another 21-hour step, and a final 14-hour step to actually delete the files.
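To put those durations in perspective, here is a small Python sketch that turns them into an average time per pack. The elapsed times and pack counts are copied from the log above; the step labels and the helper names `to_seconds` and `seconds_per_pack` are just mine, for illustration.

```python
# Elapsed times and pack counts copied from the prune log above.
# The step labels are informal descriptions, not restic terminology.
steps = {
    "rebuild index": ("27:45:20", 336964),
    "second count": ("21:32:34", 266022),
    "delete packs": ("14:49:05", 70942),
}


def to_seconds(hms: str) -> int:
    """Convert an 'H:MM:SS' progress timestamp into seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def seconds_per_pack(hms: str, packs: int) -> float:
    """Average wall-clock time spent on each pack during one step."""
    return to_seconds(hms) / packs


for name, (elapsed, packs) in steps.items():
    print(f"{name}: {seconds_per_pack(elapsed, packs):.2f} s per pack")
```

This works out to roughly 0.3 seconds per pack for each of the two scanning steps and about 0.75 seconds per pack for the deletion. That would be consistent with one small, latency-bound request per pack, but that is only my guess.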
In such a scenario, I thought that prune would actually be pretty fast. I assumed it would be simple to detect ‘orphan’ packs that do not belong to any snapshot, but it seems that’s not the case. And I really can’t understand what happened afterwards.
A few questions (some probably because I am biased by my knowledge of other software):
- Why does restic have to scan all the packs? I assumed that such an operation would rely on the indexes, which are only a few megabytes and can be kept in the local cache. Since prune is so delicate, you may not trust the local cache and download them fresh, but this is still a small fraction of the whole data.
- What is restic actually doing in this step? My monitoring tools did not record any significant network activity during those hours, nor CPU time. We know Google Drive is very slow with lots of operations on small files, but it seems restic is not really downloading anything.
- If restic rebuilt the whole index in the first step and correctly identified the 70942 packs / 402016 blobs to remove, what is the purpose of the 21-hour step? In this step, as in the first, I noticed almost no network activity, so what is it actually doing?
- The last 14-hour step is not really anybody’s fault; Google is just damn slow at deleting lots of files. I’d really love a configurable chunk size (and I’ve seen some feature requests for it), since in my use case I don’t mind losing some deduplication in exchange for having 1 GB files, or even bigger, in the remote repository. I really look forward to this feature.
- From some tests on local storage, it seems that ‘forget --prune’ is just an alias for ‘forget && prune’. Does this mean that every time a crontab job forgets some snapshots older than the retention time, restic will go through this 3-day process? This is a problem for me. But since restic knows what it is deleting, could the pruning process be more efficient if done while forgetting, rather than later on?
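To make the first question concrete, this is roughly what I imagined prune could do using only the index. It is a toy Python sketch of my assumption, not restic's actual code: the `index` mapping (pack ID to the blob IDs it contains), the `used_blobs` set, and the function `classify_packs` are all names I made up for illustration.

```python
from typing import Dict, List, Set, Tuple


def classify_packs(
    index: Dict[str, Set[str]], used_blobs: Set[str]
) -> Tuple[List[str], List[str], List[str]]:
    """Split packs into (keep, rewrite, delete) using index data only.

    index      -- hypothetical mapping: pack ID -> blob IDs stored in it
    used_blobs -- blob IDs still referenced by at least one snapshot
    """
    keep, rewrite, delete = [], [], []
    for pack_id, blobs in index.items():
        live = blobs & used_blobs
        if not live:
            delete.append(pack_id)   # 'orphan' pack: nothing in use
        elif live == blobs:
            keep.append(pack_id)     # every blob is still referenced
        else:
            rewrite.append(pack_id)  # mixed: live blobs must be repacked
    return keep, rewrite, delete


# Toy data: three packs, two blobs still referenced by snapshots.
index = {
    "pack-a": {"b1", "b2"},
    "pack-b": {"b3"},
    "pack-c": {"b4", "b5"},
}
used = {"b1", "b2", "b4"}

keep, rewrite, delete = classify_packs(index, used)
print(keep, rewrite, delete)
```

With this toy data, `pack-a` is kept, `pack-c` would need rewriting, and `pack-b` is an orphan that could simply be deleted. In my log the equivalent result was 70942 packs to delete and 0 to rewrite, which is why I expected the operation to be cheap.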
I wish to say that this post is not in any way a criticism. I just want to understand the internals better and maybe give some hints to make restic more suitable for everybody (I am a software engineer myself, but in a totally different sector).
You are doing a great job and I really appreciate it! Thank you, and thanks in advance if you answer my questions and help me.
Cheers