Speed issues over USB3?

Hey guys,

I've had excellent performance backing up from an internal SSD to 5400RPM spinning rust on a Mini-ITX Atom box over Wi-Fi. However, I get much worse performance going from one USB3 HDD to another USB3 HDD, even though both drives test at over 130MB/s read and write, and it's the same machine that backs up fine from SSD to 5400RPM over rest-server, so I presume it's not an issue with the CPU etc.

Basically, yesterday I backed up about 95% of a 750GB archive from one USB3 drive to another USB3 drive. I went to restart it today, and 1 hour 20 minutes later it's only made it through 226GB of the same data it already backed up. Watching in dstat, it's been reading the whole time (at around 120MB/s) but writing nothing. This suggests to me that it's re-reading all the files in their entirety, but still deciding that they've already been backed up.

Is this the expected behaviour of the archiver, or is something potentially wrong here? I would have thought part of the cache’s behaviour was to prevent it re-reading all files in their entirety, but, am I mistaken? Thanks!

EDIT

Further info:

  • Repository created using 0.9.0, and only ever used this version
  • Repo mount points don’t change (i.e. always the same paths)
  • Repo cache folder top level has modified date of when I started latest backup, but no content within has been modified today (presumably because nothing’s been written yet)

Did you just upgrade to version 0.9.0?

I haven’t just upgraded to it, in the sense that this repo has always used version 0.9.0. But yes, I recently started using 0.9.0 on the whole (as in, on all my systems).

To clarify, this repo has only ever been 0.9.0.

Hmm… I’m not sure off-hand then. I’ll let someone more experienced take this one. :slight_smile: Sorry!

No probs - thanks for chiming in nonetheless! :slight_smile:

I can explain what’s happening here. The backup process is roughly as follows:

  • Read the so-called “index” files from the repo to find out which data is already stored
  • Read file, split it into smaller blobs
  • For each blob, check if it has already been saved
  • If not, save it to the repo
  • Regularly write new “index” files to the repo, which contain the list of all newly added blobs

That’s what’s happening here: restic reads files, sees that all blobs have already been saved, and moves on to the next file.
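The first-backup loop above can be sketched in Python. This is purely illustrative, not restic's actual code: restic uses content-defined chunking (~1 MiB average blobs) and stores packs, while this sketch uses tiny fixed-size chunks and plain dicts. The key point it shows is that every file is read in full, but only blobs missing from the index are written.

```python
import hashlib

def backup_no_parent(files, index, repo, chunk_size=4):
    """files: {path: bytes}; index: set of blob hashes already known;
    repo: dict mapping hash -> blob. Returns (bytes_read, bytes_written)."""
    bytes_read = bytes_written = 0
    for path, data in files.items():
        bytes_read += len(data)              # the whole file is read...
        for i in range(0, len(data), chunk_size):
            blob = data[i:i + chunk_size]
            h = hashlib.sha256(blob).hexdigest()
            if h not in index:               # ...but only new blobs are saved
                repo[h] = blob
                index.add(h)
                bytes_written += len(blob)
    return bytes_read, bytes_written

repo, index = {}, set()
files = {"a.txt": b"hello world!"}

# First run: everything is new, so writes equal reads.
r1, w1 = backup_no_parent(files, index, repo)
# Rerun over the same data with no snapshot info: full read, zero writes,
# which matches the dstat observation (reads at disk speed, no writes).
r2, w2 = backup_no_parent(files, index, repo)
```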

Once the backup completes and you have a snapshot in the repo, the process is a bit different:

  • Read the so-called “index” files from the repo to find out which data is already stored
  • Load the information from the previous snapshot
  • For each file:
    • If the file hasn’t been modified, take its list of blobs from the previous snapshot and move on (without reading the file again)
    • If it has been modified or is new, read it and split it into smaller blobs
    • For each blob, check if it has already been saved
    • If not, save it to the repo
  • Regularly write new “index” files to the repo, which contain the list of all newly added blobs

As you can see, without a previous snapshot restic doesn’t know which files are there, only which blobs have been saved, so it needs to read all data again. That’s a limitation we have right now, and it matches the behavior you observe.
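The parent-snapshot shortcut can be sketched like this (again illustrative, not restic's actual code: the file records, `chunk_and_store` helper, and fixed-size chunking are simplifications). If a path's metadata matches the previous snapshot's record, the stored blob list is reused and the file is never read:

```python
import hashlib

def chunk_and_store(data, index, repo, chunk_size=4):
    """Split data into chunks and store only blobs the index doesn't know."""
    blobs = []
    for i in range(0, len(data), chunk_size):
        blob = data[i:i + chunk_size]
        h = hashlib.sha256(blob).hexdigest()
        if h not in index:
            repo[h] = blob
            index.add(h)
        blobs.append(h)
    return blobs

def backup_with_parent(files, parent, index, repo):
    """files: {path: (size, mtime, read_fn)};
    parent: {path: (size, mtime, blobs)} from the previous snapshot.
    Returns (new_snapshot, number_of_files_actually_read)."""
    snapshot, files_read = {}, 0
    for path, (size, mtime, read_fn) in files.items():
        prev = parent.get(path)
        if prev and prev[0] == size and prev[1] == mtime:
            snapshot[path] = (size, mtime, prev[2])  # reuse blob list, skip read
            continue
        files_read += 1
        data = read_fn()                             # modified or new: read it
        snapshot[path] = (size, mtime, chunk_and_store(data, index, repo))
    return snapshot, files_read

repo, index = {}, set()
parent = {"a.txt": (5, 100, ["abc"])}                # previous snapshot record
files = {
    "a.txt": (5, 100, lambda: b"hello"),             # unchanged: never read
    "b.txt": (5, 200, lambda: b"world"),             # new: must be read
}
snap, reads = backup_with_parent(files, parent, index, repo)
```

With a completed parent snapshot, only new or changed files are read, which is why the follow-up run later in this thread finished in minutes instead of hours.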


Ah right - that definitely explains it :slight_smile: Thanks so much for clarifying - I was dreading that I might not be able to use restic for this repo size (if it was going to take 3-5 hours for each snapshot). Guess I shouldn’t be so hasty to cancel when I have to leave work next time :stuck_out_tongue:

Thanks for taking the time to explain in such detail!

I guess this means that restic only checks size and mod-time to detect file changes then?
Is there a way to force check the content / checksum?

I don’t have a use case for this right now, but I’m thinking of some encrypted containers that don’t change in size or mtime.

Sure, just use the --force option, then restic re-reads everything.


Just confirming @fd0 that everything is good now :smiley: It took 4.5 hours on the 95% complete dataset, but now it’s a little over 2 minutes on the 100% complete dataset:

repository 210fc370 opened successfully, password is correct
found 5 old cache directories in /home/jenga/.cache/restic, pass --cleanup-cache to remove them

Files:           0 new,     0 changed, 273079 unmodified
Dirs:            0 new,     1 changed,     0 unmodified
Added:      367 B

processed 273079 files, 781.378 GiB in 2:36
snapshot 8ac5cdb9 saved 

Thanks :smiley:
