Finally I’m on vacation, so what’s better than wasting days backing up my NAS?
The title had “~50 hours”, but I’ve discovered an error in my configuration: I had set up the restic cache location wrong. I meant to use the same Disk A, but I actually pointed it at an internal SSD, so I wasn’t really doing “Disk A → Disk B” but “Disk A → (read cache on Disk C - SSD) → Disk B”.
That was KILLING backup performance. After I moved the restic cache into a 1G ramdisk (or used no-cache, as @Mic told me) I got TOTALLY different performance.
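(In case someone wants to replicate the ramdisk trick outside backrest, this is roughly how I’d do it from a plain shell; the mount point, repo path and source path are just examples, not my exact setup:)

```
# create a 1G tmpfs ramdisk and point restic's cache at it
sudo mkdir -p /mnt/restic-cache
sudo mount -t tmpfs -o size=1g tmpfs /mnt/restic-cache

# either use it explicitly as the cache...
restic -r /mnt/diskB/repo backup --cache-dir /mnt/restic-cache /mnt/diskA

# ...or skip the cache entirely, as @Mic suggested
restic -r /mnt/diskB/repo backup --no-cache /mnt/diskA
```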
I’ve taken my time to do some performance benchmarks, which I report here just out of curiosity, so maybe they can be useful to someone else.
The read concurrency defaults to 2, which appeared to be the sweet spot for HDDs during experiments several years ago. You could try to reduce the concurrency of writing data to the destination disk using -o local.connections=1.
I’m not sure how I could set -o in backrest, so I left this out of my tests, but I benchmarked read concurrency 2 against 1, as you can see below.
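(For reference, on a plain restic command line the extended option would look like the sketch below; the paths are made up, and I haven’t checked whether backrest exposes -o anywhere.)

```
# limit the local backend to a single connection when writing to the destination disk
restic -r /mnt/diskB/repo backup -o local.connections=1 /mnt/diskA
```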
As both disks are local, you can also disable the cache using --no-cache. The cache is most important for remote backends.
I would TRULY like to know more about this, because I saw in other posts that the cache can also grow a lot (several dozens of GBs), so I’m curious what happens if I don’t create it, especially when I’m backing up several TBs: shouldn’t subsequent backups take more/a lot more time? (I don’t want to find this out by myself weeks from now.)
Please check whether the environment variables were applied successfully. The easiest is probably to look for files with size of about 128MB in the data folder of the repository.
Also because of this, I moved from using ENV VARs to simply using job flags (it’s easier to benchmark this way). It also seems that the pack size is applied as I thought: if I don’t set anything, most folders inside “data” are 17MB/50MB (so apparently the pack size isn’t the upper bound of how heavy each folder can be). If I set 128MB (the max that restic allows me, at least in the version I’m using), each folder is at least 128MB and can reach up to ~500MB.
I guess it’s intended, but I would love to understand why each folder is not precisely the pack size, or whether I’m wasting space doing it this way. I intuitively understand that fewer files are better, so I will leave it at 128MB, I guess.
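(If I understand restic’s repository layout correctly, each subfolder of data/ just groups many pack files by the first two hex characters of their names, so a folder’s size would be a multiple of the pack size rather than the pack size itself; please correct me if that’s wrong. To look at the individual pack files, something like this should do; the repo path is an example:)

```
# list the 20 largest pack files; with --pack-size 128 most should sit around 128MB
# (the pack size is a target, not a hard upper limit)
find /mnt/diskB/repo/data -type f -exec du -h {} + | sort -h | tail -n 20
```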
Or maybe use dd to test how long writing a 1GB file takes: dd if=/dev/zero bs=1M count=1000 of=testfile conv=fsync. The important part is the fsync at the end, as otherwise data just ends up in the OS cache but not yet on disk.
I didn’t know this command! I will save it for sure.
The results are quite good: 60MB/s for the source disk, 80MB/s for the second one (which is expected, I guess, because the source disk is 70% full and the other is basically empty).
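(For completeness, this is the matching read test I’d use as a sanity check; it’s my addition, not from the suggestion above. Dropping the page cache first matters, otherwise you end up measuring RAM instead of the disk:)

```
# flush dirty pages and drop the page cache, then read the 1GB testfile back
sync && echo 3 | sudo tee /proc/sys/vm/drop_caches
dd if=/mnt/diskA/testfile of=/dev/null bs=1M
```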
So…benchmark time: let’s recap:
- ~2.5TB to back up on the source disk. The first 2 GBs of the backup are appdata files, so lots of text, logs and .sqlite files. After that there are only bigger files (2-10GB each) until the end.
- Because of the file composition described above, I stopped each backup after 1h, since I obviously can’t afford to let it run for an entire day just to get the final performance figure. However, I noticed that the estimated total time after 30 minutes is different from the one after at least 1h, maybe because the backup is slower on the smaller files (the first 2 GBs, as I said) and then faster on the bigger, compact files. I’d expect that after 1h the backup speed stays roughly constant until the end.
- If I’m using the cache, I will use a ramdisk for it (no-cache doesn’t use it, from what I can observe; see the docker sketch after this list). I’m not totally sure what the “cache” is for backrest, nor HOW important it is for restic backup/restore performance. This is what I see in my docker-compose: XDG_CACHE_HOME=/cache # path for the restic cache which greatly improves performance.
- I’ve also measured CPU consumption, at least roughly: I was curious to check how the backup job pegs the CPU in various scenarios. The CPU is an Intel N100, by the way (I’m very happy with it). I can’t embed too many images or links in this post, so I uploaded them to Imgur, but I had to alter the URLs, otherwise the forum wouldn’t let me post them. Please add “imgur.com” before every URL.
- Even when using the 1G ramdisk as cache, restic actually didn’t use much of it (the directory was correctly configured, and restic did create some files in it, even when running with no-cache). I’m not totally sure why. So I didn’t track RAM consumption.
- After EACH try, I deleted everything: removed the repo, deleted the backup files. Just because I’m not sure how restic behaves when I back up files again after a previously stopped backup. So I basically started from scratch each time.
- I will leave my considerations after each step. **I would love to hear what you think about my considerations after each try.**
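Since backrest runs in docker for me, here is a rough sketch of how the ramdisk cache could be wired into the container. Only XDG_CACHE_HOME=/cache comes from my actual compose file; the image name, volume paths and tmpfs size are illustrative:

```
# docker run equivalent of the compose setup (sketch): mount a 1G tmpfs at /cache
# and tell backrest/restic to use it as the cache directory
docker run -d \
  --tmpfs /cache:size=1g \
  -e XDG_CACHE_HOME=/cache \
  -v /mnt/diskA:/userdata:ro \
  -v /mnt/diskB/repo:/repo \
  garethgeorge/backrest:latest
```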
Let’s start
Try 0 - No flags - Baseline
Cancelled after: 1h0m5s
Data processed: 109.32 GB / 2.41 TB
Files: 41623 / 74485
Estimated total time: ~22 hours and 4 minutes
CPU graph: /a/dZctx31
Results: sure, ~22 hours is better than the ~50 I got initially! However, I’m using the ramdisk as cache, so I’m not totally sure whether that’s a smart idea or not for subsequent backups/restores/checks.
Try 1 - Read concurrency to 1 (default is 2)
Flags:
--read-concurrency 1
By default, it uses a pack size of 16M, and it shows in the backrest “data/” folder:
0 ./da
0 ./db
0 ./dc
17M ./dd
0 ./de
17M ./df
0 ./e0
17M ./e1
17M ./e2
0 ./e3
0 ./e4
0 ./e5
0 ./e6
0 ./e7
0 ./e8
0 ./e9
17M ./ea
0 ./eb
17M ./ec
0 ./ed
0 ./ee
17M ./ef
0 ./f0
33M ./f1
17M ./f2
0 ./f3
0 ./f4
0 ./f5
17M ./f6
17M ./f7
Cancelled after: 1h0m7s
Data processed: 126.5 GB / 2.41 TB
Files: 41639 / 74487
Estimated total time: ~19 hours and 7 minutes
CPU graph: /a/c1CjtXN
Results: I didn’t measure IO performance, but intuitively I’d say that with SMR 2.5-inch disks reading one file at a time could be faster. Seeing that, I guess I will use --read-concurrency 1.
Try 2 - As above plus no-cache
Flags:
--read-concurrency 1
--no-cache
Cancelled after: 1h0m39s
Data processed: 129.86 GB / 2.41 TB
Files: 41647 / 74491
Estimated total time: ~18 hours and 46 minutes
CPU graph: /a/vn8ve3B
Results: using no-cache is slightly better; however, I’m not sure what I’m losing by not using the cache in my situation: would the next backup/check/prune be much slower?
Try 3 - As above plus max pack size (128MB)
Flags:
--read-concurrency 1
--no-cache
--pack-size 128
Here’s what I got in my data folder:
386M ./00
257M ./01
129M ./02
390M ./03
387M ./04
386M ./05
261M ./06
515M ./07
0 ./08
386M ./09
258M ./0a
647M ./0b
517M ./0c
258M ./0d
257M ./0e
259M ./0f
515M ./10
390M ./11
258M ./12
129M ./13
258M ./14
516M ./15
515M ./16
257M ./17
645M ./18
259M ./19
515M ./1a
259M ./1b
0 ./1c
129M ./1d
643M ./1e
387M ./1f
257M ./20
390M ./21
258M ./22
260M ./23
131M ./24
0 ./25
129M ./26
385M ./27
389M ./28
258M ./29
129M ./2a
262M ./2b
Cancelled after: 1h0m2s
Data processed: 199.28 GB / 2.41 TB
Files: 41687 / 74487
Estimated total time: ~12 hours and 6 minutes
Results: big improvement this time! I would love to know whether no-cache makes the next backup slower; however, it would be better to take twice the time but only once than to take less time every single time I start a backup. Unfortunately, since I still haven’t completed a full backup, I’m not sure how it behaves.
No CPU measure: I forgot to take it lol
Try 4 - Every suggested option, no compression
Flags:
--read-concurrency 1
--pack-size 128
--compression off
--no-cache
Cancelled after: 1h1m16s
Data processed: 141.38 GB / 2.41 TB
Files: 41652 / 74488
Estimated total time: ~17 hours and 23 minutes
CPU graph: /a/H0le4QK
Results: I was curious to check the CPU behaviour with compression off: since most (90%) of the files can’t actually be compressed further (films, music), compressing them is useless from a size perspective, and it could also mean further wasted CPU work or maybe longer transfer times. But, as said, it could also have improved hard drive transfer times.
The 17 hours seems… too much, honestly. I suspect a faulty test here (in retrospect, maybe someone was using the server during the test?) but I don’t know WHY. It seems too… off.
What’s important is how the CPU actually behaved: I can see a lot of “SOFT-IRQ”. I don’t know precisely what it is, or whether it’s good or not.
What I’m sure of is that 17 hours is much worse than the previous attempt; I should retry this, but I’m too tired to do it.
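(Side note on the SOFT-IRQ thing: as far as I know, softirqs are deferred interrupt work in the kernel, and during a backup a lot of it should just be block-I/O completion, so it’s probably expected rather than bad. If someone wants to watch where they go while a job runs, something like this works; mpstat comes from the sysstat package:)

```
# per-CPU softirq counters (the BLOCK row is block-I/O completion)
watch -d cat /proc/softirqs

# or sample the %soft column per CPU once a second
mpstat -P ALL 1
```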
Try 5 - Like above but leave compression on “auto” (not passing the flag)
Flags:
--read-concurrency 1
--pack-size 128
--no-cache
Cancelled after: 1h0m4s
Data processed: 197.34 GB / 2.4 TB
Files: 41721 / 74515
Size on destination disk (when I interrupted the backup job): 195 GB
Estimated total time: ~12 hours and 10 minutes
CPU graph: /a/etljULA
Results: Aside from the weird CPU usage compared to the other tests (it seems… lower?), I took note of the actual disk consumption: 197 GB of source data backed up, 195 GB of files on the destination disk.
So basically no space saved, as I thought (you can’t shave anything further off .mp3s and H264/HEVC films). I think the only ~2 GB saved come from the first 2 GBs of “compression-friendly” files that I described in the bullet points at the start.
So what did I learn? Just leave compression on auto: let restic do its magic. I don’t know if check/restore times would be different with compression, but I don’t think it would change a lot. Let’s leave it on “auto”.
Try 6 - Like above but read concurrency 2
Flags:
--read-concurrency 2
--pack-size 128
--no-cache
Cancelled after: 1h0m15s
Data processed: 108.91 GB / 2.4 TB
Files: 41621 / 74479
Size on destination disk: 107 GB
Estimated total time: ~22 hours and 6 minutes
Results: after everything, I was curious to go back to the default read concurrency. Well, the times speak for themselves. At least in my situation, I think read concurrency 1 is the best.
Sooo, after a whole day of testing (well, I wasn’t watching the screen the whole time, so…) I still haven’t performed a full backup. However, I’ve gone from the initial 50 hours down to… 12 (Try 5).
Also, I found that setting compression to off or leaving it on auto doesn’t seem to change much even in the worst case (although I’m sure my files can’t truly be compressed further), so I will let restic decide how it should behave (like every good piece of software, it knows better than me).
Unfortunately, I’m still worried: what happens when I don’t have a “cache”? Because sure, 12 hours for a first backup isn’t too much (I can afford it, in order to have all these files snapshotted, restorable, with checksums so I can detect corruption… everything I would want from a proper backup), but I’m not sure what happens AFTER the initial backup: with no cache… how much time would it take to perform all of these actions: backup, full restore, prune, full repo check?
In the next days I will try to complete a full backup so I can discover these times by myself, but it would be HORRIBLE to take 12 hours each day/week/month to back up these 2.5 (and growing) TBs.
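(For anyone curious about the same question, my plan is simply to time the follow-up operations once the first full backup exists, along these lines; the repo path and retention policy are placeholders, and the flags mirror the ones from Try 5:)

```
# subsequent backup, full repo check and prune, all without cache, timed
time restic -r /mnt/diskB/repo backup --no-cache --read-concurrency 1 --pack-size 128 /mnt/diskA
time restic -r /mnt/diskB/repo check --no-cache
time restic -r /mnt/diskB/repo forget --keep-last 7 --prune --no-cache
```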
In the end I’m having fun getting to know restic and backrest: it seems a fantastic tool for backups, and I think I’m going to use it also for my Windows work PC.
It’s just that… I have to understand better how it works under the hood, especially how the cache works.