Terrible check --read-data performance on vServer

Hi!

I recently got a vServer from Strato (cf. "V-Server mieten: mit Linux oder Windows | STRATO").
It is advertised with NVMe and the specific machine has 6 vcores, 8 GB RAM and 700 GB SSD storage.

I’m running Ubuntu 22.04 on the machine and use “restic 0.16.0 compiled with go1.20.6 on linux/amd64”.

My repository is 518 GB (roughly 32,000 files) and uses version 2, compression level auto.

When I do a “check --read-data” on my Synology DS220+, iotop reports about 160MB/s - which is ok for a RAID1 of WD RED 4 TB disks, I guess.
Both CPUs show about 70-90% usage during the operation.

Now on the vserver I only see 4-6 MB/s in iotop and restic’s progress (as shown by the numbers of checked files) is extremely slow.

htop shows there’s only minimal CPU usage.

A “hdparm -t” shows about 80 MB/s buffered read (my Synology has 200 MB/s) which is not too great.

I don’t quite know how to debug this, so any ideas are very welcome.
It might just be that the virtual CPUs or the SSD totally suck.

Greetings

Nico

Where is the repository stored? Locally on the vServer or somewhere else?

Minimal CPU usage usually indicates that restic is waiting for IO. By default, check on the vServer uses up to 6 threads to verify pack files and, depending on the backend, retrieves 2 to 5 pack files in parallel.

Hi!

I copied the complete repository from my local NAS to the vServer, so both tests were performed locally on the respective machine.

On the NAS I see 2-5 threads doing stuff, on the vServer there are only 2 threads the whole time.

Top on the NAS shows:

NAS
top - 14:28:37 up 3 days, 3:01, 1 user, load average: 1.47, 0.66, 0.50 [IO: 0.38, 0.25, 0.19 CPU: 1.08, 0.40, 0.26]
Tasks: 286 total, 2 running, 284 sleeping, 0 stopped, 0 zombie
%Cpu(s): 67.0 us, 11.0 sy, 0.0 ni, 5.3 id, 16.0 wa, 0.0 hi, 0.7 si, 0.0 st
GiB Mem : 17.418 total, 0.173 free, 0.706 used, 16.539 buff/cache
GiB Swap: 12.451 total, 12.451 free, 0.001 used. 16.210 avail Mem

While on the vServer:

vServer
top - 14:43:29 up 22:44, 4 users, load average: 0.51, 0.45, 1.19
Tasks: 28 total, 1 running, 27 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.6 us, 0.2 sy, 0.0 ni, 99.2 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
MiB Mem : 8192.0 total, 0.0 free, 190.3 used, 8001.7 buff/cache
MiB Swap: 0.0 total, 0.0 free, 0.0 used. 8001.6 avail Mem

So the NAS shows some I/O wait, whereas the vServer just says 0…

The vserver seems to be quite bored :zzz: , which suggests that checking a repository should work quickly. What happens when you run shasum -a256 path/to/repo/data/00/*? How large is that folder in the repository and how long does shasum take?

Are there any warnings reported by dmesg?

The data in the 00 folder is about 1.8 GB:

root@NAS:~# du -h /volume1/backup/nscheer/data/00
1.8G /volume1/backup/nscheer/data/00

There are 109 files in there.

On the NAS:

time sha256sum /volume1/backup/nscheer/data/00/*
[…]
real 0m12.964s
user 0m2.888s
sys 0m0.627s

On the vServer:

root@h2942264:~# time sha256sum /backup/nscheer/data/00/*
[…]
real 1m43,628s
user 0m9,178s
sys 0m1,350s
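
To put the two timings side by side, here is a rough sketch that turns them into average throughput. It assumes the 1.8G reported by du is roughly 1800 MB (du -h rounds and uses binary units, so the results are only approximate). The helper name mb_per_s is made up for this example, and the vServer's 1m43,628s is written as 103.628 seconds.

```shell
# mb_per_s SIZE_MB SECONDS -> prints average throughput in MB/s (one decimal).
# Hypothetical helper; LC_ALL=C keeps awk on a decimal point even on a
# system whose locale prints decimal commas.
mb_per_s() {
    LC_ALL=C awk -v mb="$1" -v s="$2" 'BEGIN { printf "%.1f\n", mb / s }'
}

mb_per_s 1800 12.964    # NAS:     prints 138.8
mb_per_s 1800 103.628   # vServer: prints 17.4
```

So averaged over the whole folder the vServer is roughly eight times slower than the NAS, and the average hides how far the rate drops towards the end of the run.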

Interestingly, it starts quite fast on the vServer. iotop reports about 80 MB/s read. But subsequently it slows down until it gets to a crawl of 2-4 MB/s.

If this was a write operation, I’d suspect a cheap SSD that cannot keep up once the write cache is full.
But since these are reads only, I can only deduce that there is some kind of throttling going on.
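
One way to make the slowdown visible file by file, rather than watching iotop, might be a small loop like the sketch below. It assumes GNU stat and GNU date (as shipped with Ubuntu), and read_rate is a hypothetical helper name. Files that are already in the page cache will show inflated rates, so this is only meaningful on a first read (or after dropping caches as root with echo 3 > /proc/sys/vm/drop_caches).

```shell
# read_rate FILE -> reads FILE once sequentially and prints "<rate> MB/s  <file>".
# Hypothetical helper; relies on GNU stat (-c %s) and GNU date (%N).
read_rate() {
    file=$1
    bytes=$(stat -c %s "$file")       # file size in bytes
    start=$(date +%s.%N)              # wall clock before the read
    cat "$file" > /dev/null           # plain sequential read, data discarded
    end=$(date +%s.%N)                # wall clock after the read
    rate=$(LC_ALL=C awk -v b="$bytes" -v s="$start" -v e="$end" 'BEGIN {
        d = e - s
        if (d <= 0) d = 0.000001      # guard against a zero-length interval
        printf "%.1f", b / 1000000 / d
    }')
    printf '%s MB/s  %s\n' "$rate" "$file"
}

# Example (path taken from the vServer test above):
# for f in /backup/nscheer/data/00/*; do read_rate "$f"; done
```

If the per-file rate starts near 80 MB/s and then falls steadily, that would be consistent with a throttling budget being used up rather than with a problem in restic.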

dmesg shows nothing. It is a “virtual” server after all, and a quite cheap one (9€ per month) - so I can imagine that they employ some form of throttling to not hurt other customers…

I uploaded the repo to another vServer provider in the meantime (2 vCores, 2 GB RAM, 1 TB HDD storage, cf. "Alwyzon — Storage Servers in Vienna, Austria"). On that server, even with worse specs, I get a sustained read rate of approx. 100 MB/s.

I guess I’ll just dump the other vServer altogether. I just want to be able to run a check once a week or so - that was the reasoning behind using a vServer and not just some cloud storage.
But it does not matter if it’s 80 MB/s or 200 MB/s - of course, 4 MB/s is unacceptable.
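
Not something tried in this thread, but if a full weekly read ever turns out to be too heavy for a host, restic's check has a --read-data-subset=n/t option that reads only one of t slices of the pack files per run. A minimal sketch for rotating through the slices by ISO week number follows. The helper name subset_for_week and the choice of 5 slices are assumptions for illustration only.

```shell
# subset_for_week WEEK TOTAL -> prints "n/TOTAL" with n cycling through 1..TOTAL.
# Hypothetical helper; WEEK is an ISO week number such as "08" from `date +%V`.
subset_for_week() {
    week=${1#0}                               # drop a leading zero so "08" is not parsed as octal
    echo "$(( (week - 1) % $2 + 1 ))/$2"
}

subset_for_week "$(date +%V)" 5               # slice for the current week
```

A weekly cron job could then run something like restic -r /backup/nscheer check --read-data-subset="$(subset_for_week "$(date +%V)" 5)", so that every pack file is read once per 5 weeks instead of every week. On a host throttled to a few MB/s even one slice would still take hours.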

Thanks for your help!
If anybody has another hint, that would be much appreciated :slight_smile:

Is running shasum a second time faster? I’d expect that, as the directory should be in the page cache afterwards.

Even then the 2-4 MB/s don’t make sense. How are you supposed to ever read the 0.5 TB of data again? It might be worthwhile to ask support what’s going on there.
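
For a sense of scale, here is a back-of-the-envelope sketch of how long one full read of the 518 GB repository would take at the rates mentioned in this thread. It assumes decimal units (1 GB = 1000 MB) and a constant rate, and read_hours is a helper name made up for this example.

```shell
# read_hours SIZE_GB RATE_MB_S -> hours needed to read SIZE_GB at a constant RATE_MB_S.
# Hypothetical helper; assumes 1 GB = 1000 MB.
read_hours() {
    LC_ALL=C awk -v gb="$1" -v rate="$2" 'BEGIN { printf "%.1f\n", gb * 1000 / rate / 3600 }'
}

read_hours 518 4      # throttled vServer:   prints 36.0
read_hours 518 100    # other vServer (HDD): prints 1.4
read_hours 518 160    # NAS:                 prints 0.9
```

At 4 MB/s a single check --read-data pass would need about a day and a half of uninterrupted reading, compared with under two hours on the other two machines.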

Yes, it takes about 6-7 seconds then. That’s about the only thing that runs as expected on that machine :smiley:

True. I already filed a report. Might take some time to get to the last level support though.
Will report back :slight_smile:
