Backup/check/restore performance on USB3 HDD

Sina · September 21, 2019, 12:20pm

Situation here:

restic_0.9.5_windows_amd64.exe
repo in TrueCrypt-Container on external USB3-HDD
backing up from local HDD-partitions / SSD-partition
created initial snapshots of documents-areas (around 160 GiB, most files quite small except Thunderbird mailbox files) at about 30 MiB/s (added to the repo / time taken) - with source storage being factor 1.5 to 2 bigger than amount written
creating initial snapshots of further areas (around 1,600 GiB, most files rather big - VMs and media-files) ran at about 10 MiB/s - source amount / added ratio = 1.3
further initial snapshot with cache on local SSD showed slightly better speed (15 MB/s) - source amount / added ratio = 2
“check --read-data” shows less than 1 MB/s read speed according to Windows Resource Monitor, with very little CPU consumption (execution cancelled)
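
For a sense of scale, the figures above can be turned into rough run times. This is only a back-of-the-envelope sketch: it assumes the quoted sizes are source sizes and that the rate is "added to the repo / time taken", and the helper name is made up for illustration.

```python
def estimated_hours(source_gib: float, source_to_added_ratio: float,
                    added_mib_per_s: float) -> float:
    """Estimate backup duration in hours.

    source_gib            -- size of the source data in GiB
    source_to_added_ratio -- source size divided by the amount added to the repo
    added_mib_per_s       -- observed rate of data added to the repo, in MiB/s
    """
    added_mib = source_gib * 1024 / source_to_added_ratio
    return added_mib / added_mib_per_s / 3600


# Figures quoted in the post; reading the sizes as source sizes is an assumption.
print(f"documents: {estimated_hours(160, 1.5, 30):.1f} h")
print(f"VMs/media: {estimated_hours(1600, 1.3, 10):.1f} h")
```

Under these assumptions the first set takes about 1 hour and the second about 35 hours.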

My questions

What might be the cause of the decreasing backup speed? (repo growth?)
What might be the cause of the low check speed? Would restore speed be as slow?
Anyone experiencing similar or better performance with an external USB HDD?
What speed can be expected?

Speed of what kind of access is relevant for restic performance?
E. g. CrystalDiskMark 6.0.2 offers Sequential Read, Sequential Write, Random Read 4KiB and Random Write 4KiB …all with Q(ueues) and T(hreads) modifiable.
Which (if any) of those test profiles allow(s) concluding what approx. Restic performance to expect? What should I choose for Q and T to simulate Restic’s load best?
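
One way to get a first impression without settling on Q and T values is to time plain sequential and random 4 KiB reads on a large file, once inside the container and once on the bare drive. The sketch below is only a rough single-threaded probe, not a model of restic's actual I/O pattern. The function name and parameters are made up for illustration, and the OS file cache will inflate the numbers unless the file is much larger than RAM or the volume was freshly mounted.

```python
import os
import random
import time


def read_throughput(path: str, block_size: int, random_access: bool,
                    max_blocks: int = 2048) -> float:
    """Return MiB/s for reading up to max_blocks blocks of block_size from path."""
    size = os.path.getsize(path)
    blocks_in_file = size // block_size
    n_blocks = min(max_blocks, blocks_in_file)
    if n_blocks == 0:
        raise ValueError("file is smaller than one block")

    if random_access:
        # Block-aligned offsets spread over the whole file.
        offsets = [random.randrange(blocks_in_file) * block_size
                   for _ in range(n_blocks)]
    else:
        # Consecutive blocks from the start of the file.
        offsets = [i * block_size for i in range(n_blocks)]

    total = 0
    start = time.perf_counter()
    with open(path, "rb", buffering=0) as f:  # no Python-level buffering
        for offset in offsets:
            f.seek(offset)
            total += len(f.read(block_size))
    elapsed = max(time.perf_counter() - start, 1e-9)
    return total / (1024 * 1024) / elapsed


# Example call (the path is hypothetical):
# print(read_throughput(r"X:\some\large\file", 4096, random_access=True))
```

Comparing the random-read figure for the container with the one for the underlying drive would show whether the extra layer costs anything for small reads.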

Thanks for reading until here.

Looking forward to your comments.

moritzdietz · September 21, 2019, 1:21pm

I bet you that it’s something with TrueCrypt and how data passes in and out of containers.
Have you tried copying the repository inside of the TrueCrypt container to a “normal” external hdd and re-run all your tasks that are taking long?

Also don’t forget, external hard drive speeds can vary greatly! Just because it says it is USB3 doesn’t guarantee that you get USB3 spec speeds. Have you verified that the speed bottleneck is not just a combination of a slow drive + slow performance due to TrueCrypt drivers being wonky and slowing things down? That’s where I would start.

Out of curiosity, would you care to give us some insight what made you put your restic repository inside of a TrueCrypt container?
As restic uses encryption for data in the repository itself. That double layer seems “overkill” IMO.

Sina · September 21, 2019, 2:06pm

Thanks for your reply.

I will do a test outside TrueCrypt as soon as I find time to do so. On the other hand I hoped to precheck this issue with less effort by benchmarking the container performance versus direct HDD performance. That’s why I asked for the load profile of restic.

The performance of the drive might be the bottleneck in case some sort of random 4k read and/or write speed plays an important role. Sequential read and write is close to 100 MB/s for both the container and the underlying HDD.

Backup into a TrueCrypt container gives me easy control over drive letters as well as protection for parallel (mirror) backups I additionally make of some rather static areas. In case I won’t manage to improve especially the check speed I tend to rethink which data might be better backed up with other means.

Sina · September 28, 2019, 7:17pm

To keep you posted: While copying the repo from the container (to prepare an off-container test), I also realized decreasing speed. I guess this is due to SMR on the target HDD. So this might also be a factor when taking restic snapshots with much data to backup…

Further comments still welcome.

moritzdietz · September 28, 2019, 9:18pm

Awesome! Thanks for keeping us posted.
Good you did some other tests to see where the issue is coming from.
It’s always a good step to decrease complexity in a setup and remove extra layers to dissect elements and their influence on other parts.

akrabu · September 29, 2019, 9:39am

What file system is it? FAT/exFAT can be pretty slow - especially with directories with a LOT of files in them, like Restic makes. Might try NTFS if it isn’t already. The underlying file system of the container may matter too.

Could mess with larger cluster sizes? Perhaps even mismatched cluster sizes between the container and the underlying file system are slowing things down?

Might try resizing the drive’s main partition and creating an encrypted partition with TrueCrypt - that tends to be faster. Also AES is much faster on newer machines than most of the other algorithms available in TC.

Any reason you’re using TrueCrypt when Restic has (forced) built in encryption? Lots of overhead there.

Ps. VeraCrypt replaced TrueCrypt since it’s no longer being maintained, just on the off chance you hadn’t heard yet.

Sina · September 29, 2019, 4:46pm

Hi @akrabu

thanks for your hints.

NTFS both in container as well as underlying. Cluster size “default” - both filesystem > 2 TiB.

I don’t know. What do you call “larger”? Is there any cluster size for best practice?

Backup into a TrueCrypt container gives me easy control over drive letters as well as protection for parallel (mirror) backups I might additionally make (by simply copying) of some rather static areas. Furthermore it’s easy to mount a TrueCrypt container readonly when appropriate.

akrabu · September 29, 2019, 5:53pm

With both the container and underlying file system in NTFS at likely 4K cluster sizes, I’d leave that alone honestly.

And fair enough. Should Restic ever have an “unencrypted mode” you’d want to enable that I’d say - but it doesn’t, so I’m kind of out of ideas, minus a disk check.

Oh wait! Is this an SMR drive?? Shingled drive performance tends to drop very swiftly once the cache fills up. If you don’t know, you might want to google your drive’s model and “SMR”. A lot of manufacturers aren’t even clearly reporting whether or not drives are SMR anywhere on the packaging. Also you never quite get the same performance once you fill an SMR drive up to about 80%. SMR stores overlapping “shingles” of data, because the read head can read narrower tracks than the write head is capable of writing, and it uses something akin to an SSD controller to translate between what the OS sees and what is actually on the platter. From what I can see, only a few select brands like WD have TRIM support on some of the newest models. Older models just slow down over time because of this.

Sooooo figure out if it’s SMR - if it is, that’s your explanation. Then see if TRIM is supported, and on the rare chance that it is (external support for even external SSD TRIM is dodgy, let alone for SMR), then enable it. You will STILL most definitely experience great reductions in speed (sometimes down to KB/s momentarily even) once that “normal write area” cache fills up and the controller has to start shingling the cache during the write (instead of during idle time).

If it is NOT SMR, then I am out of ideas haha

Ps. NEVER defrag an SMR drive! It’s worth finding out if yours is SMR just to avoid that. The controller handles that on its own, and hides the actual arrangement from the operating system just like flash. So defrag is a fool’s errand on SMR and will quickly wear out your drive. Same with using them in RAID arrays - although Synology is now saying it’s okay if they are ALL SMR, within a Synology box (which likely has firmware saying the extra delays aren’t bad spots, just SMR being SMR.)

Sina · September 29, 2019, 6:49pm

Yes, it is SMR (see reply #3) - apparently like most 2.5" (at least high-capacity) hard disk drives nowadays.

SMR shouldn’t affect check or restore performance, should it? (given no other operation is performed on that drive during check/restore)

With only little change in overall data to backup it might even be acceptable for taking subsequent snapshots, I hope…

akrabu · September 29, 2019, 7:12pm

Whoops, missed that. That’s what happens when I reply to posts at 3am lol. Yeah, I think PMR 2.5" drives top out at 2TB right now. I’ve got a 5TB Seagate 2.5" SMR drive and while I love it, restic is indeed very slow on it. It was way worse when I had it as exFAT, and is much more manageable now that it’s in APFS. APFS was designed for flash, and in my opinion complements SMR very nicely. Others might warn you to be wary of using such new technologies together at such an early stage, but eh, it’s just a backup drive for me, and it really does increase performance. But that’s on a Mac…

And it depends on the check / restore performance! How long has the drive had to flush the cache? Are you restoring to the drive itself? Also, unfortunately, fragmentation is just going to happen on these drives. Lots of little files are going to be scattered all over the platter according to the controller’s whims. Defrags will NOT fix it, because the layout the software sees is not the TRUE layout on the drive, just what the controller wants the OS to see (controlling, isn’t it? lol). So the OS will get everything nice and “aligned” - and the controller has actually spread everything out even MORE and made things worse (hence why you don’t want to defrag).

Restic prunes on SMR could, over time, be kind of costly in performance, I’d assume. No hard data on that, but it makes sense.

An explanation from Backblaze:

“This type of drive overlaps recording tracks to store data at a lower cost than PMR technology. The downside occurs when data is deleted and that space is reused. If existing data overlaps the space you want to reuse, this can mean delays in writing the new data. These drives are great for archive storage (write once, read many) use cases, but if your files turn over with some regularity, stick with PMR drives.” Source

For whatever it’s worth, you are certainly not alone. If you stick to mostly backups, and very little pruning, it’ll serve you well. I’d wait 'til it’s near 80% capacity and then do a very drastic prune, instead of pruning more frequently. That should cut down on some of the mess. But SMR drives are just slow. Upside? Lots of space. Think of it more like a WORM (write once, read many) drive, and you’ll be happier with your purchase. Or something like Amazon’s Glacier Deep Archive - SMR is best for “cold storage”. It’s the deletions and modifications that will be most costly (likely causing severe fragmentation, and impacting read performance as well).
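
The "wait until it's near 80% full" rule of thumb is easy to check from a script. A minimal sketch, assuming the repository drive is mounted at some path you pass in. The function names and the idea of automating the decision are illustrative only, and the 0.80 threshold is simply the figure mentioned above.

```python
import shutil


def fill_fraction(path: str) -> float:
    """Return the used fraction (0.0 to 1.0) of the volume containing path."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total


def prune_due(fraction_used: float, threshold: float = 0.80) -> bool:
    """True once the volume is at or above the chosen fill threshold."""
    return fraction_used >= threshold


# Example (uses the current directory's volume as a stand-in for the backup drive):
used = fill_fraction(".")
print(f"{used:.0%} used, prune due: {prune_due(used)}")
```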

TL;DR: what seem to be rather simple operations are in fact very complex operations on SMR drives:

Regular hard drive:

Wait for platter to rotate and seek head to first target sector in track
Write three sectors in direct succession

SMR hard drive:

Wait for platter to rotate and seek head to target track + 1
Read three sectors in direct succession, store in cache
Wait for platter to rotate and seek head to target track + 2
Read three sectors in direct succession, store in cache
Wait for platter to rotate and seek head to target track + n
Read three sectors in direct succession, store in cache
(Repeat until we hit end of medium* or band)
Seek head to target track
Write original three sectors
Wait for platter to rotate and seek head to target track + 1
Rewrite three previously stored sectors, recalled from cache
Wait for platter to rotate and seek head to target track + 2
Rewrite three previously stored sectors, recalled from cache
Wait for platter to rotate and seek head to target track + n
Rewrite three previously stored sectors, recalled from cache
(Repeat until we hit end of medium* or band)

Source
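
The two step lists above can be turned into a simple operation count. This is a toy model of the listed steps only: it assumes a band of `band_tracks` overlapping tracks numbered from 0 and ignores the drive's on-disk cache and whatever the firmware really does. All names are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class OpCount:
    seeks: int
    reads: int
    writes: int


def cmr_write() -> OpCount:
    """Regular drive: seek to the target track, write once."""
    return OpCount(seeks=1, reads=0, writes=1)


def smr_write(target_track: int, band_tracks: int) -> OpCount:
    """Shingled drive, per the steps above.

    Every track after the target in the same band is first read into cache
    (one seek + one read each), then the target is written (one seek + one
    write), then every following track is rewritten (one seek + one write each).
    """
    if not 0 <= target_track < band_tracks:
        raise ValueError("target_track must lie inside the band")
    following = band_tracks - 1 - target_track
    return OpCount(seeks=2 * following + 1,
                   reads=following,
                   writes=following + 1)


print(cmr_write())
print(smr_write(target_track=0, band_tracks=4))
```

In this model, overwriting the first track of a four-track band costs 7 seeks, 3 reads and 4 writes instead of 1 seek and 1 write, while writing to the last track of a band costs the same as on a regular drive.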

Sina · September 29, 2019, 7:39pm

Thanks for all the explanations. Until some research on the web yesterday, I wasn’t aware of the existence of SMR. I wonder why discussions about SMR are rather rare and hard to find. So I’m very happy about your contributions to this thread.

Do you have any information about the impact of sudden power loss during write operation (or cache/“rapid disk area” to shingled storage transfer when “idle”)? Without compensating techniques I suspect impact on data integrity is likely to be significantly higher than with CMR/PMR…

akrabu · September 29, 2019, 7:54pm

Oh, if you want discussions on SMR, go check out r/DataHoarder on Reddit lol. That’s where I learned the most about it.

From what I hear, there is a journal of sorts on the controller. It knows what it has moved from the cache and what it hasn’t. I have tested by writing a large file (100GB, plenty enough to fill the cache) with PAR2 recovery files, then when the copy operation completed and I could still hear the heads moving about frantically even after unmounting it, I yanked the power out (technically a safe removal on any other drive, but I could HEAR it doing stuff). When I plugged it back in, I could hear it essentially resume the activity. I let it finish, checked the integrity with the PAR2 files, and everything was just fine. I have NOT tested an unsafe removal - but I‘d wager it’s no more destructive than doing the same to a PMR drive, after my testing.

I mostly use mine for temporary storage of stuff I get off Usenet. When it fills up, I dump the whole thing on my Drobo and start fresh. I don’t download TO the SMR drive, I download on my local SSD and have a rule set to move it in one contiguous chunk when complete. It excels at this purpose and I’m quite happy with it. The writes aren’t often sustained long enough to ever hit the cache ceiling, and I enjoy maximum space and maximum performance.

I used to use it for Restic backups at work, but I no longer work there and didn’t need it for that purpose anymore. It was definitely painstakingly slow at times on Windows with exFAT for that purpose. For personal backups, I use Restic + Backblaze B2.

Sina · September 29, 2019, 10:29pm

You mean with most SMR drives there is NO way to regain initial performance? (except with TRIM if available) Not by reformatting? Not by repartitioning or using some SMART or vendor-specific function? I wonder if even initial formatting (with full check, not using “fast” format) does harm to the performance right from the start…

What do you do to “start fresh”?

How did you implement that “rule set”?

akrabu · September 29, 2019, 11:18pm

Without TRIM, no, not really. The controller doesn’t know what is free and what isn’t (it’s not file system aware). And yes, a full format on a drive not supporting TRIM could certainly cause performance issues, I bet.

Luckily newer WD drives support TRIM on Windows (and likely over Thunderbolt on Macs): TRIM Support for HDD on Windows and macOS

Me, personally… I don’t fret about it too much. macOS doesn’t support TRIM over USB, though at some point I want to try to run TRIM on some Seagate Archive 3.5" drives I have over Thunderbolt, cause I hear that works. But those are truly cold storage disks for me, and I only back up to them about once a month or two.

What do I do? I just do a move (with verify) option using RapidCopy (FastCopy on Windows) to my Drobo, then continue to use it. It might take awhile, but I usually do it at night when I’m asleep anyway. I don’t care if it’s slower than when I first got it - it’s the sheer amount of storage in such a compact space that I was after. I was getting about 120-130MB/s sustained writes initially. Now? About 90MB/s. Eh, whatever - at least it’s not USB2 speeds. I have my Drobo with a bunch of PMR WD Reds that suit my needs just fine. And a 2TB SanDisk USB-C flash drive I mostly use for backing clients up with (I’m a sysadmin, and do general tech on the side for clients).

Definitely not what it used to be. I wish I had used this app for a speed test earlier for comparison, but anyway you get the idea. It’s slower. But most of the time I’m not even here when NZBVortex is moving the finished downloads over (that’s how I implemented that “rule set” - it’s just an option to move finished downloads elsewhere).

If I ever repurpose it for something with more of a need for speed, I’ll likely shuck it, put it in my Thunderbolt drive dock, run TRIM, and have at it.

Also considering one of these: Brand New Synology DS619slim 6-Bay NAS for 2.5″ SSD Media for 2018 – NAS Compares

I could buy a bunch of those $99 5TB SMR drives, shuck them, and put them in that for a total of 25TB in a relatively tiny package. Plus Synology says SMR is supported so long as they’re ALL SMR drives. Fun stuff. Wonder if the NAS box will run TRIM on them? I’ll have to look into that…

Ps. Notice my sequential read speeds are actually slower than the write speeds. And my random read is a whopping 10MB/s. It was probably moving cache around. That’s just SMR drives for you.

Sina · September 29, 2019, 11:21pm

Some further test results here:

Above:

Now:
After copying the repo to an exFAT-formatted (as delivered from manufacturer) SMR drive:
“restic check --read-data” shows around 40 MB/s read speed according to Windows Resource Monitor

further tests in TC-container on SMR-drive (Repo generated here by creating snapshots with restic):

“restic check --read-data” still less than 1 MB/s (interrupted)
calculating sha256-sums of packs with HashCheck Shell Extension shows >50 MB/s
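
The hash test can also be scripted so that it yields a throughput figure and a pass/fail per pack. restic names each pack file after the SHA-256 of its contents, so a plain file hash can be compared against the file name. This is a sketch of that outer check only, not of what `check --read-data` does in full, and the function name and return shape are made up for illustration.

```python
import hashlib
import os
import time


def verify_packs(data_dir: str, chunk_size: int = 1024 * 1024):
    """Hash every file under data_dir and compare the digest to the file name.

    Returns (files_checked, mismatched_paths, MiB_per_second).
    """
    checked, mismatches, total_bytes = 0, [], 0
    start = time.perf_counter()
    for root, _dirs, files in os.walk(data_dir):
        for name in files:
            path = os.path.join(root, name)
            digest = hashlib.sha256()
            with open(path, "rb") as f:
                while chunk := f.read(chunk_size):
                    digest.update(chunk)
                    total_bytes += len(chunk)
            checked += 1
            if digest.hexdigest() != name:
                mismatches.append(path)
    elapsed = max(time.perf_counter() - start, 1e-9)
    return checked, mismatches, total_bytes / (1024 * 1024) / elapsed


# Example call (the path is hypothetical; point it at the repo's data directory):
# print(verify_packs(r"X:\repo\data"))
```

If this runs at tens of MB/s on the same container where the restic check crawls, the raw read path is probably not the bottleneck.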

Any ideas?

akrabu · September 29, 2019, 11:24pm

Did that involve a TrueCrypt container?

Ps. I don’t know about yours, but mine came as exFAT with an impossibly small (for exFAT) 256k cluster size (you can’t select that with the normal Windows formatter). You probably have like 512k clusters now. Technically, on sustained writes, that may be faster. You’ll just waste a fair amount of space on smaller files.
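
To put a number on the wasted space: each file occupies a whole number of clusters, so the last cluster is partly empty. A minimal sketch of that arithmetic follows. The file sizes in the example are illustrative, not measurements from a real repository.

```python
def slack_bytes(file_size: int, cluster_size: int) -> int:
    """Bytes allocated but unused when file_size is stored in whole clusters."""
    if file_size == 0:
        return 0
    clusters = -(-file_size // cluster_size)  # ceiling division on integers
    return clusters * cluster_size - file_size


KIB = 1024
for size in (1_000, 100 * KIB, 4 * KIB * KIB + 1):
    for cluster in (4 * KIB, 256 * KIB):
        print(f"{size:>9} B file, {cluster // KIB:>3} KiB clusters: "
              f"{slack_bytes(size, cluster):>7} B wasted")
```

The waste per file is always less than one cluster, so it matters for many tiny files and hardly at all for files of several MiB.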

If you’re still using those TC containers, I’d say the reason it’s faster is the larger cluster size plus not having to journal anymore.

akrabu · September 29, 2019, 11:48pm

I forgot to mention - those 8TB SMR monster drives I use for cold storage? Yeah, I don’t use Restic on those. I literally make a giant tarball so it’s one long sustained write the entire time. It’s my next-to-worst-case backup (worst case is B2 and even worse would be S3 Glacier Deep Archive - I only keep pictures in the latter). It takes forever but I view it as my “poor man’s tape backup”. I had a DAT USB tape drive at one point but it finally bit the dust and no one else has offered me their hand-me-downs as of late.

Sina · October 24, 2019, 12:54pm

Here are the results of some more thorough testing:

command:
restic.lnk check --read-data-subset 1/3 --cache-dir -r

a) With Repo in NTFS-formatted TrueCrypt-Container on NTFS-formatted SMR-Drive
group #1 of 108125 data packs (out of total 322446 packs in 3 groups)
[4:33:41] 100.00% 108125 / 108125 items
duration: 4:33:41

b) With Repo on same model SMR-Drive, exFAT-formatted (BS 128k) as delivered - no container
read group #1 of 108125 data packs (out of total 322446 packs in 3 groups)
[3:46:49] 100.00% 108125 / 108125 items
duration: 3:46:49
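
For comparison, the two runs can be reduced to packs per second. The numbers are the ones reported above, and the helper is just an illustration of the arithmetic.

```python
def to_seconds(hms: str) -> int:
    """Convert an 'H:MM:SS' duration string to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


PACKS = 108125
runs = {
    "a) NTFS in TrueCrypt container": "4:33:41",
    "b) exFAT, no container": "3:46:49",
}

for label, duration in runs.items():
    print(f"{label}: {PACKS / to_seconds(duration):.2f} packs/s")

# How much longer run a) took than run b).
slowdown = to_seconds(runs["a) NTFS in TrueCrypt container"]) / to_seconds(runs["b) exFAT, no container"])
print(f"a) took {slowdown:.2f}x as long as b)")
```

That works out to roughly 6.6 versus 7.9 packs per second, so the container run took about 21% longer.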

Do you expect any drawbacks when reformatting HDD b) to NTFS (in “quick” mode)?
After some research, I consider NTFS less prone to FS corruption due to journaling.