Finally I’m on vacation, so what’s better than wasting days backing up my NAS?
The title had “~50 hours”, but I’ve discovered an error in my configuration: I had set up the restic cache location wrong. I meant to use the same Disk A, but I actually pointed it at an internal SSD, so I wasn’t really doing “Disk A → Disk B” but “Disk A → (read cache on Disk C - SSD) → Disk B”.
That was KILLING backup performance. After I moved the restic cache into a 1G ramdisk (or used no-cache, as @Mic told me) I got TOTALLY different performance.
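(In case someone wants to replicate the ramdisk trick outside backrest, this is roughly how I’d do it from a plain shell; the mount point, repo path and source path are just examples, not my exact setup:)

```
# create a 1G tmpfs ramdisk and point restic's cache at it
sudo mkdir -p /mnt/restic-cache
sudo mount -t tmpfs -o size=1g tmpfs /mnt/restic-cache

# either use it explicitly as the cache...
restic -r /mnt/diskB/repo backup --cache-dir /mnt/restic-cache /mnt/diskA

# ...or skip the cache entirely, as @Mic suggested
restic -r /mnt/diskB/repo backup --no-cache /mnt/diskA
```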
I’ve taken my time to do some performance benchmarks, which I report here just out of curiosity, so maybe they can be useful to someone else.
The read concurrency defaults to 2, which appeared to be the sweet spot for HDDs during experiments several years ago. You could try to reduce the concurrency of writing data to the destination disk using -o local.connections=1.
I’m not sure how I could set -o in backrest, so I left this out of my tests, but I benchmarked read concurrency 2 against 1, as you can see below.
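(For reference, on a plain restic command line the extended option would look like the sketch below; the paths are made up, and I haven’t checked whether backrest exposes -o anywhere.)

```
# limit the local backend to a single connection when writing to the destination disk
restic -r /mnt/diskB/repo backup -o local.connections=1 /mnt/diskA
```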
As both disks are local, you can also disable the cache using --no-cache. The cache is most important for remote backends.
I would TRULY like to know more about this, because I saw in other posts that the cache can also grow a lot (several dozens of GBs), so I’m curious what happens if I don’t create it, especially when I’m backing up several TBs: shouldn’t subsequent backups take more/a lot more time? (I don’t want to find this out by myself weeks from now.)
Please check whether the environment variables were applied successfully. The easiest is probably to look for files with size of about 128MB in the data folder of the repository.
Also because of this, I moved from using ENV VARs to simply using job flags (it’s easier to benchmark this way). It also seems that the pack size is applied as I thought: if I don’t set anything, most folders inside “data” are 17MB/50MB (so apparently the pack size isn’t the upper bound of how heavy each folder can be). If I set 128MB (the max that restic allows me, at least in the version I’m using), each folder is at least 128MB and can reach up to ~500MB.
I guess it’s intended, but I would love to understand why each folder is not precisely the pack size, or whether I’m wasting space doing it this way. I intuitively understand that fewer files are better, so I will leave it at 128MB, I guess.
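(If I understand restic’s repository layout correctly, each subfolder of data/ just groups many pack files by the first two hex characters of their names, so a folder’s size would be a multiple of the pack size rather than the pack size itself; please correct me if that’s wrong. To look at the individual pack files, something like this should do; the repo path is an example:)

```
# list the 20 largest pack files; with --pack-size 128 most should sit around 128MB
# (the pack size is a target, not a hard upper limit)
find /mnt/diskB/repo/data -type f -exec du -h {} + | sort -h | tail -n 20
```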
Or maybe use dd to test how long writing a 1GB file takes: dd if=/dev/zero bs=1M count=1000 of=testfile conv=fsync. The important part is the fsync at the end, as otherwise data just ends up in the OS cache but not yet on disk.
I didn’t know this command! I will save it for sure.
The results are quite good: 60MB/s for the source disk, 80MB/s for the second one (which is expected, I guess, because the source disk is 70% full and the other is basically empty).
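(For completeness, this is the matching read test I’d use as a sanity check; it’s my addition, not from the suggestion above. Dropping the page cache first matters, otherwise you end up measuring RAM instead of the disk:)

```
# flush dirty pages and drop the page cache, then read the 1GB testfile back
sync && echo 3 | sudo tee /proc/sys/vm/drop_caches
dd if=/mnt/diskA/testfile of=/dev/null bs=1M
```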
So…benchmark time: let’s recap:
- ~2.5TB to back up on the source disk. The first 2 GBs of the backup are appdata files, so lots of text, logs and .sqlite files. After that there are only bigger files (2-10GB each) until the end.
- Because of the file composition described above, I stopped each backup after 1h, since I obviously can’t afford to let it run for an entire day just to get the final performance figure. However, I noticed that the estimated total time after 30 minutes is different from the one after at least 1h, maybe because the backup is slower on the smaller files (the first 2 GBs, as I said) and then faster on the bigger, compact files. I’d expect that after 1h the backup speed stays roughly constant until the end.
- If I’m using the cache, I will use a ramdisk for it (no-cache doesn’t use it, from what I can observe; see the docker sketch after this list). I’m not totally sure what the “cache” is for backrest, nor HOW important it is for restic backup/restore performance. This is what I see in my docker-compose: XDG_CACHE_HOME=/cache # path for the restic cache which greatly improves performance.
- I’ve also measured CPU consumption, at least roughly: I was curious to check how the backup job pegs the CPU in various scenarios. The CPU is an Intel N100, by the way (I’m very happy with it). I can’t embed too many images or links in this post, so I uploaded them to Imgur, but I had to alter the URLs, otherwise the forum wouldn’t let me post them. Please add “imgur.com” before every URL.
- Even when using the 1G ramdisk as cache, restic actually didn’t use much of it (the directory was correctly configured, and restic did create some files in it, even when running with no-cache). I’m not totally sure why. So I didn’t track RAM consumption.
- After EACH try, I deleted everything: removed the repo, deleted the backup files. Just because I’m not sure how restic behaves when I back up files again after a previously stopped backup. So I basically started from scratch each time.
- I will leave my considerations after each step. **I would love to hear what you think about my considerations after each try.**
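Since backrest runs in docker for me, here is a rough sketch of how the ramdisk cache could be wired into the container. Only XDG_CACHE_HOME=/cache comes from my actual compose file; the image name, volume paths and tmpfs size are illustrative:

```
# docker run equivalent of the compose setup (sketch): mount a 1G tmpfs at /cache
# and tell backrest/restic to use it as the cache directory
docker run -d \
  --tmpfs /cache:size=1g \
  -e XDG_CACHE_HOME=/cache \
  -v /mnt/diskA:/userdata:ro \
  -v /mnt/diskB/repo:/repo \
  garethgeorge/backrest:latest
```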
Let’s start
Try 0 - No flags - Baseline
Cancelled after: 1h0m5s
Data processed: 109.32 GB / 2.41 TB
Files: 41623 / 74485
Estimated total time: ~22 hours and 4 minutes
CPU graph: /a/dZctx31
Results: sure, ~22 hours is better than the ~50 I got initially! However, I’m using the ramdisk as cache, so I’m not totally sure whether that’s a smart idea or not for subsequent backups/restores/checks.
Try 1 - Read concurrency to 1 (default is 2)
Flags:
--read-concurrency 1
By default, it uses a pack size of 16M, and it shows in the backrest “data/” folder:
0 ./da
0 ./db
0 ./dc
17M ./dd
0 ./de
17M ./df
0 ./e0
17M ./e1
17M ./e2
0 ./e3
0 ./e4
0 ./e5
0 ./e6
0 ./e7
0 ./e8
0 ./e9
17M ./ea
0 ./eb
17M ./ec
0 ./ed
0 ./ee
17M ./ef
0 ./f0
33M ./f1
17M ./f2
0 ./f3
0 ./f4
0 ./f5
17M ./f6
17M ./f7
Cancelled after: 1h0m7s
Data processed: 126.5 GB / 2.41 TB
Files: 41639 / 74487
Estimated total time: ~19 hours and 7 minutes
CPU graph: /a/c1CjtXN
Results: I didn’t measure IO performance, but intuitively I’d say that with SMR 2.5-inch disks reading one file at a time could be faster. Seeing that, I guess I will use --read-concurrency 1.
Try 2 - As above plus no-cache
Flags:
--read-concurrency 1
--no-cache
Cancelled after: 1h0m39s
Data processed: 129.86 GB / 2.41 TB
Files: 41647 / 74491
Estimated total time: ~18 hours and 46 minutes
CPU graph: /a/vn8ve3B
Results: using no-cache is slightly better; however, I’m not sure what I’m losing by not using the cache in my situation: would the next backup/check/prune be much slower?
Try 3 - As above plus max pack size (128MB)
Flags:
--read-concurrency 1
--no-cache
--pack-size 128
Here’s what I got in my data folder:
386M ./00
257M ./01
129M ./02
390M ./03
387M ./04
386M ./05
261M ./06
515M ./07
0 ./08
386M ./09
258M ./0a
647M ./0b
517M ./0c
258M ./0d
257M ./0e
259M ./0f
515M ./10
390M ./11
258M ./12
129M ./13
258M ./14
516M ./15
515M ./16
257M ./17
645M ./18
259M ./19
515M ./1a
259M ./1b
0 ./1c
129M ./1d
643M ./1e
387M ./1f
257M ./20
390M ./21
258M ./22
260M ./23
131M ./24
0 ./25
129M ./26
385M ./27
389M ./28
258M ./29
129M ./2a
262M ./2b
Cancelled after: 1h0m2s
Data processed: 199.28 GB / 2.41 TB
Files: 41687 / 74487
Estimated total time: ~12 hours and 6 minutes
Results: big improvement this time! I would love to know whether no-cache makes the next backup slower; however, it would be better to take twice the time but only once than to take less time every single time I start a backup. Unfortunately, since I still haven’t completed a full backup, I’m not sure how it behaves.
No CPU measure: I forgot to take it lol
Try 4 - Every suggested option, no compression
Flags:
--read-concurrency 1
--pack-size 128
--compression off
--no-cache
Cancelled after: 1h1m16s
Data processed: 141.38 GB / 2.41 TB
Files: 41652 / 74488
Estimated total time: ~17 hours and 23 minutes
CPU graph: /a/H0le4QK
Results: I was curious to check the CPU behaviour with compression off: since most (90%) of the files can’t actually be compressed further (films, music), compressing them is useless from a size perspective, and it could also mean further wasted CPU work or maybe longer transfer times. But, as said, it could also have improved hard drive transfer times.
The 17 hours seems… too much, honestly. I suspect a faulty test here (in retrospect, maybe someone was using the server during the test?) but I don’t know WHY. It seems too… off.
What’s important is how the CPU actually behaved: I can see a lot of “SOFT-IRQ”. I don’t know precisely what it is, or whether it’s good or not.
What I’m sure of is that 17 hours is much worse than the previous attempt; I should retry this, but I’m too tired to do it.
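(Side note on the SOFT-IRQ thing: as far as I know, softirqs are deferred interrupt work in the kernel, and during a backup a lot of it should just be block-I/O completion, so it’s probably expected rather than bad. If someone wants to watch where they go while a job runs, something like this works; mpstat comes from the sysstat package:)

```
# per-CPU softirq counters (the BLOCK row is block-I/O completion)
watch -d cat /proc/softirqs

# or sample the %soft column per CPU once a second
mpstat -P ALL 1
```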
Try 5 - Like above but leave compression on “auto” (not passing the flag)
Flags:
--read-concurrency 1
--pack-size 128
--no-cache
Cancelled after: 1h0m4s
Data processed: 197.34 GB / 2.4 TB
Files: 41721 / 74515
Size on destination disk (when I interrupted the backup job): 195 GB
Estimated total time: ~12 hours and 10 minutes
CPU graph: /a/etljULA
Results: Aside from the weird CPU usage compared to the other tests (it seems… lower?), I took note of the actual disk consumption: 197 GB of source data backed up, 195 GB of files on the destination disk.
So basically no space saved, as I thought (you can’t shave anything further off .mp3s and H264/HEVC films). I think the only ~2 GB saved come from the first 2 GBs of “compression-friendly” files that I described in the bullet points at the start.
So what did I learn? Just leave compression on auto: let restic do its magic. I don’t know if check/restore times would be different with compression, but I don’t think it would change a lot. Let’s leave it on “auto”.
Try 6 - Like above but read concurrency 2
Flags:
--read-concurrency 2
--pack-size 128
--no-cache
Cancelled after: 1h0m15s
Data processed: 108.91 GB / 2.4 TB
Files: 41621 / 74479
Size on destination disk: 107 GB
Estimated total time: ~22 hours and 6 minutes
Results: after everything, I was curious to go back to the default read concurrency. Well, the times speak for themselves. At least in my situation, I think read concurrency 1 is the best.
Sooo, after a whole day of testing (well, I wasn’t watching the screen the whole time, so…) I still haven’t performed a full backup. However, I’ve gone from the initial 50 hours down to… 12 (Try 5).
Also, I found that setting compression to off or leaving it on auto doesn’t seem to change much even in the worst case (although I’m sure my files can’t truly be compressed further), so I will let restic decide how it should behave (like every good piece of software, it knows better than me).
Unfortunately, I’m still worried: what happens when I don’t have a “cache”? Because sure, 12 hours for a first backup isn’t too much (I can afford it, in order to have all these files snapshotted, restorable, with checksums so I can detect corruption… everything I would want from a proper backup), but I’m not sure what happens AFTER the initial backup: with no cache… how much time would it take to perform all of these actions: backup, full restore, prune, full repo check?
In the next days I will try to complete a full backup so I can discover these times by myself, but it would be HORRIBLE to take 12 hours each day/week/month to back up these 2.5 (and growing) TBs.
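(For anyone curious about the same question, my plan is simply to time the follow-up operations once the first full backup exists, along these lines; the repo path and retention policy are placeholders, and the flags mirror the ones from Try 5:)

```
# subsequent backup, full repo check and prune, all without cache, timed
time restic -r /mnt/diskB/repo backup --no-cache --read-concurrency 1 --pack-size 128 /mnt/diskA
time restic -r /mnt/diskB/repo check --no-cache
time restic -r /mnt/diskB/repo forget --keep-last 7 --prune --no-cache
```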
In the end I’m having fun getting to know restic and backrest: it seems a fantastic tool for backups, and I think I’m going to use it also for my Windows work PC.
It’s just that… I have to understand better how it works under the hood, especially how the cache works.