`restic restore` speed asymptotically dropping towards zero!

Hello everyone,

Sorry for the exclamation points, but this is pretty much an emergency.

Since yesterday, I’ve been running a restic restore to read back a relatively small (220K files / 72GiB) subtree of my 33M files / 23TiB backup. Things started off pretty well, but then performance went literally downhill; please see picture:

Right now, it’s progressing at the amazingly slow rate of 43KB/s – yes, K as in Kilo: less than 0.09% of my available bandwidth and, more importantly, around 1/1000 of the speed at which it started. :frowning:

What’s worse, a dtruss on the restic process shows it doing a WHOLE LOT of nothing at the system-call level: just lots and lots of semwait*() and psynch*() syscalls (see here), which I understand are thread-synchronization fluff. Even worse, if I filter those calls out, all I can see is a lot of madvise() syscalls (see here), which I understand are Golang’s GC telling the kernel which areas can be considered free.

But it has not completely stopped, at least not yet: if I filter specifically for read() calls, I see a precious few of them: only 3 calls returning exactly 4K each (ie, 12K bytes) in ~10s, which jibes with what I’m seeing (with df -k) being added to the directory I’m restoring to.

For reference, this is standard restic 0.9.4 compiled with go1.11.4 on darwin/amd64, running on macOS High Sierra on a Late-2013 Mac Pro (quad-core Xeon, 64GB RAM) entirely dedicated to running restic (I’m not even logged in on its console; everything is being done over SSH). I checked all system logs and see nothing strange. The exact command used for the restore is:

restic --verbose --target=. --include="/LARGEBACKUPROOT/SMALLSUBDIRTREE" restore 706e249a

The current speed is completely untenable: there’s still 32% (17.5GiB of 72GiB) of the backup to restore, and at this rate it would take ~113 hours to finish. :frowning: And even if I could/should wait that long, this restic restore is using up 38.6GiB (of RSS!) of my 64GB of RAM, and tomorrow night I will need this memory for the restic backup that is scheduled to run.

And this is with a very small subset, measuring ~0.5% of my entire backup: I shudder to think what would happen if I tried to restore the entire set. If this is restic’s ‘normal’ (and I sure hope it is not), it would make restic entirely unfeasible here, as its main use would be disaster recovery.

So, my questions are:

  1. Is that restic restore job beyond hope of ever picking up speed? Or is it best to kill it and start it anew? In that case, any chance of it skipping the 2/3 it has already restored?

  2. Did I do something wrong in the restic restore command-line above? Could I somehow do things better / more efficiently?

  3. Apart from using restic mount, which I would rather avoid, is there any way out for restoring parts or all of that backup?

Thanks in advance for any help.
– Durval.

Unfortunately this is a known issue. You could try out PR #2195 by @ifedorenko and see if it helps.

Hm, interesting. Can you tell us whether the data restic has already restored is complete? It could also be a bug in the implementation of --include… it’s hard to say without a debug log.

There’s something else you could try (using restic 0.9.5): restic dump can export snapshots (or parts of snapshots) as a tar, that’s a completely different code path. It’d look like this:

restic --verbose dump 706e249a /LARGEBACKUPROOT/SMALLSUBDIRTREE | tar x

Thanks for both responses, @fd0. Much appreciated!

More below:

Yes, I can: I’m following disk space usage on the brand-new FS (actually a ZFS dataset) I created just for restoring this backup, and it both (a) keeps growing, and (b) is still ~30% below the total space it should be using when fully restored:

# date; df -ki /mnt/REDACTED1/SUBTREE_BEING_RESTORED
Mon May 20 04:58:12 -03 2019
Filesystem                1024-blocks     Used       Avail Capacity iused       ifree %iused  Mounted on
REDACTED1/SUBTREE_BEING_RESTORED 21004273381 55985108 20948288272     0%  164508 41896576545    0%   /mnt/REDACTED1/SUBTREE_BEING_RESTORED

# date; df -ki /mnt/REDACTED1/SUBTREE_BEING_RESTORED
Mon May 20 04:59:50 -03 2019
Filesystem                1024-blocks     Used       Avail Capacity iused       ifree %iused  Mounted on
REDACTED1/SUBTREE_BEING_RESTORED 21004275146 55995294 20948279852     0%  164508 41896559705    0%   /mnt/REDACTED1/SUBTREE_BEING_RESTORED
# 

(notice how the Used field keeps crawling up)
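As a quick sanity check, here’s the write rate those two df samples imply (the constants below are just the Used values and timestamps copied from the df output above; this is plain arithmetic, not a claim about restic internals):

```go
package main

import "fmt"

func main() {
	// Used (1024-blocks) from the two df -ki samples above.
	used1 := int64(55985108) // at 04:58:12
	used2 := int64(55995294) // at 04:59:50
	elapsed := int64(98)     // seconds between the two samples

	rate := float64(used2-used1) / float64(elapsed) // KiB per second
	fmt.Printf("%.0f KiB/s\n", rate) // prints "104 KiB/s"
}
```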

In the IRC channel you mentioned there’s a way to turn on debug on a running process – if you tell me how, I can enable it.

Will do. Is there a way to somehow have restic ‘skip’ the files that have already been restored? I guess not with this dump method, but perhaps with plain restic restore?

Thanks again,
– Durval.

Thanks @764287, I was aware of the PR but not of issue #1400. Will have a look at it.

Take a look here: https://restic.readthedocs.io/en/latest/090_participating.html?highlight=debug#participating
/edit: Ah, I see you meant enabling debug on a running process. That I don’t know… I’ll leave the pointer for the “normal” debug in the thread though, as it might still be useful ¯\_(ツ)_/¯

Hello everyone,

I could not wait anymore (even after pausing it with kill -STOP so it would be moved to swap, the restic restore process somehow ended up greatly slowing my daily backup job), so I had to abort it with `kill -9`; @fd0, sorry for not being able to wait for your further instructions re: debug.

My next attempt (as soon as I complete one more backup) will be with @ifedorenko’s out-of-order-restore-no-progress tree from PR #2195, as per @764287’s recommendation.

Will keep this thread posted.
– Durval.

I did, and restic_out-of-order-restore-no-progress worked admirably: I was able to finish the restore in exactly 4h08m, almost topping my available 50Mbps upload bandwidth the whole while:

Great job, @ifedorenko!

Unfortunately, when I verified the restored files (using shasum) I detected that two of them had their contents corrupted; comparing them to the originals held in a ZFS snapshot, I was able to determine that all their metadata (including size and mtime) was exactly the same, but one file’s data had 13 bytes changed to apparently random values, while the other had 8 bytes zeroed out.

I don’t think it’s hardware, as I’m using good server-class machines – Xeon processors with ECC RAM on everything involved. But perhaps it wasn’t restic’s fault either: I missed making a snapshot of the restored files right after restic restore finished, and the restore target was a network-shared directory where conceivably something else could have messed with it. So I will try to repeat this test in the near future, in a more controlled setting, so as to be able to pinpoint exactly where the corruption was introduced.

When I reach a conclusion on that, I will post an update here.

Cheers,
– Durval.

Can you try restoring the corrupted files again, both using restic 0.9.5 and the out-of-order branch? If corruption is reproducible, can you try restore with --verify parameter, which tells restic to read the files after restore and verify checksum? Thank you in advance.

Will do, @ifedorenko. Not right away, as the machine is updating the cloud backups right now, but tomorrow morning (about 12-15h from now) it should be finished, and then I will do as you ask, with the out-of-order branch, because standard 0.9.5 takes too damn long and blows up my backup window (when I tried, it took over 72h and could not get past ~2/3 of the total to be restored). If the same files get corrupted again, I will redo the restore with --verify as per your guidance.

If you think it would be useful to restore just those 2 files (ie, specifically --include them, which would exclude all the rest), then I can give 0.9.5 a try; please let me know.

Cheers,
– Durval.

Yes, this is what I meant. If we are lucky, the problem is specific to these two files and will be much easier to pinpoint. Also, --verify is currently singlethreaded and will likely be prohibitively slow for large datasets.

Thanks for your assistance in hunting this down, @ifedorenko. Much appreciated!

Thanks for the clarification, and Amen to that.

Thanks for the information. I see via lsof that indeed, without --verify, the restic-invoked rclone keeps around 8-10 simultaneous open TCP connections to Google… which helps explain why it is so crazy fast.

I’m right now doing a first restore using the exact same command-line I used yesterday, just restoring to a different place (ie, to a local directory on the backup server, instead of to a network share on the NAS server); I can use the exact same command-line because I always specify --target=., so just by cd'ing to the local directory, I can change the destination. When it is finished, I will check the SHA sums of all files again, not only of those two (to be able to detect the problem even if it has somehow ‘moved around’ to some other file(s)).

Presuming the problem repeats itself exactly (ie, in exactly those two files), I will then repeat the command but add --verify and two --includes (one for each file), and then let’s see if it happens again.

I will keep this thread posted.

Cheers,
– Durval.

The restore I mentioned above just finished (within a couple of minutes of the time of the one I ran yesterday – I love that predictability! :slightly_smiling_face:), and I immediately checked the entire backup: I got exactly the same result – everything checked out except those two files, and they were corrupted in exactly the same way (same changes when compared to the copies I have saved in the ZFS snapshot).

So, we have the next best thing after no errors: repeatable errors :wink:

I’m now restoring again, this time with --verify and an --include for the entire directory those two files are in: it’s a relatively small one with just 34 other files – 36 files total, and less than 5.2MB of total disk space in use.

Will post back here as soon as it’s over.
– Durval.

OK, that was fast: the restore --verify finished in less than 10 minutes! yay! :slight_smile:
(I was under the impression it could take as long as the original restore, because restic would somehow have to “go through” all the indexes etc. again – glad that wasn’t the case, or if it was, restic’s cache or whatever made short work of it.)

Anyway, the restore reported no errors, as follows:

date; time restic_out-of-order-restore-no-progress --verbose --target=. --include="/REDACTED1/REDACTED2/REDACTED3/REDACTED4/REDACTED5/REDACTED6" restore --verify 706e249a; date
    Thu May 23 13:45:42 -03 2019
    repository abb62ab4 opened successfully, password is correct
    verifying files in .
    finished verifying 34 files in .
    real    9m5.334s
    user    11m29.547s
    sys     1m1.071s
    Thu May 23 13:54:47 -03 2019

I then checked the two files and, as expected (since the --verify didn’t point out anything wrong), they are again corrupted in exactly the same way (same changes in each file when compared to their originals in the ZFS snapshot).

What do you think?

Cheers,
– Durval.

During backup, restic splits each file into one or more blobs, then stores the blobs along with their sha256 checksum into the repository. Restore takes the blobs from the repo and writes them to files on filesystem. --verify then reads the blobs from files and verifies their checksums match checksums stored in the repository. The fact that --verify didn’t complain makes me think that there was a hiccup during backup.
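To make that model concrete, here is a minimal sketch of content-addressed storage as described above (illustrative only – real restic uses content-defined chunking, encryption, and pack files, none of which is shown, and `repo`, `store`, and `verify` are made-up names, not restic identifiers):

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// repo maps a blob's ID (the SHA-256 of its content) to the blob itself.
type repo map[[32]byte][]byte

// store saves a blob under its SHA-256, the way restic keys blobs.
func (r repo) store(blob []byte) [32]byte {
	id := sha256.Sum256(blob)
	r[id] = blob
	return id
}

// verify re-hashes restored data and compares against the stored blob ID,
// which is essentially what --verify does after writing files to disk.
func verify(id [32]byte, restored []byte) bool {
	return sha256.Sum256(restored) == id
}

func main() {
	r := repo{}
	id := r.store([]byte("file contents"))
	fmt.Println(verify(id, r[id]))             // intact blob: true
	fmt.Println(verify(id, []byte("mangled"))) // corrupted blob: false
}
```

The point being: a blob that reads back under the right ID is byte-for-byte what was stored, so corruption that passes --verify must have been present before the checksum was first computed.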

Can you restore the same two files with restic 0.9.5, and maybe with 0.8.x (which was the last version before the restore rework was merged, iirc)? If 0.9.5 and 0.8.x both produce the same corrupted files, I’d say something happened during backup.

Hi @ifedorenko,

Thanks for the prompt reply.

So, we can be pretty sure that the blobs didn’t get messed up (eg, by Google Drive) between the time restic backup saved them there, and the time restic restore read them back, correct?

It could have been: it was a pretty messy affair, with me dealing with internet link outages (which more than once caused restic backup to lock up) and other ‘fun’ stuff happening. Moreover, these files haven’t been changed (according to their mtime) since sometime last year, so they ought to have been backed up by the first restic backup that happened to go over them (and it probably wasn’t that 2019/05/14 backup). So, this could have happened anytime during the 30 days (and many restic backup executions) it took for me to finish the first successful copy.

BUT, if the corruption occurred sometime during that process (ie, before the first snapshot got written), restic’s own deduplication logic would have made it avoid the ‘corrupted’ blobs, right? I mean, without a snapshot, restic backup was forced to read the entire local filesystem again and recalculate all the checksums, and so would automatically ignore those hypothetical “corrupted” blobs, because they would just have a different checksum, no? Therefore, the corruption would have to have happened after the first snapshot was written. But after the first snapshot was saved, neither the size nor the mtime of these files changed, so restic would not even have tried to upload them again, correct? I’m probably missing something, as I can see no way for restic to have uploaded that messed-up information :frowning:

Right away, captain! :wink: But please, when you can please answer my question above; I have complete --verbose stdout+stderr logs of all my restic executions, and I can check them for hiccups if you help me isolate better the moment they could have happened.

Cheers,
– Durval.

Blobs are addressed by their sha256, and restic validates that blob content matches the blob id. So, yeah, the blobs most likely were not corrupted by gdrive or anything else between backup and restore. To the extent we can trust sha256, of course :wink:

I can think of only two possibilities for corrupted files to be written to the backup. Either restic was given bad bytes by the operating system or there is a bug in backup that mangled the blobs before their checksum was calculated. RAM corruption is also a possibility, but ECC should make that virtually impossible I think. @fd0 can you think of anything else?

But once corrupted blobs were written to the repository, restic does not re-read the files from the filesystem as long as file size+timestamp remain the same.


to the extent we can trust sha256, of course

LOL, I think SHA256 is pretty much 100% reliable :wink: a random collision would mean a chance of undetected corruption of 1 in 2^256 (which is like taking the Universe’s age in femtoseconds and then multiplying it by 1 followed by 32 zeros), and it happened TWICE in a single restore, so I think we can discard that hypothesis. :wink:

And even if someone were cryptographically trying to ruin my day by writing a bad blob with an intentionally colliding checksum, the SHA256 algorithm is believed to make that a brute-force effort requiring about the same order of magnitude of calculations, so again I think we can disregard the chance :wink:

Hi @ifedorenko,

Replying to the last part of your message first:

You really mean, “once corrupted blobs were written to the repository and a restic snapshot was written”, correct? The distinction is important in my case because I had to go through like a dozen interrupted restic backup executions before one was able to finish and write its first restic snapshot containing those files…

OK, I think I see it now: either of these situations would necessarily have happened in the exact restic backup run that managed to complete, which is when the first restic snapshot got written, correct? I ask because if it had happened before that run, the run would have had no size+mtime to check against, and so would have recalculated the checksums and simply ignored the corrupted blobs… or am I still missing something?

I fully agree: ECC is mathematically guaranteed to detect and correct any 1-bit error per block, and to detect and alarm on (usually causing a system panic) any 2-bit error. Here are the differences I see in the messed-up parts of those files (all offsets in decimal and all byte values in octal, as generated by cmp -l):

First File:

                       553  10   5
                       556 152 164
                       557 144 145
                       558 141 163
                       559 166 164
                       560 141 145
                    352877 140  40
                    352878  60 110
                    352879  41 176
                    352880  53 173
                    352881 252  57
                    352882   6 127
                    352883 325 324

Second File:

                    119405  20   0
                    119406 344   0
                    119407  71   0
                    119408  47   0
                    119409 252   0
                    119410   6   0
                    119411 325   0
                    119412   1   0

As we can see, the above would decompose into a lot of 1- and 2-bit errors, all of which ECC would have caught; so even if I had bad memory failing all the time, errors like these are about guaranteed not to happen.

Of course, since I did not install that server’s memory myself (it came fully built from an Apple reseller), it is possible that someone at the reseller swapped the ECC RAM Apple ships for non-ECC RAM. But as far as I was able to Google, it seems a Mac Pro 2013 like ours won’t even boot with non-ECC RAM.

Anyway, let’s be paranoid: the next time I can schedule a shutdown on that server, I will open it up and visually check whether it really has ECC RAM.

And I think we can also discard the possibility that the disk(s) where those files ended up being stored silently returned corrupted blocks when restic backup read them: everything here is on ZFS, which saves and verifies a strong checksum for each disk block. Such an event would have shown up as a CKSUM error in zpool status for the device in question, and (as I have raidz2 redundancy) the corrupted block would have been re-read/regenerated from the pool redundancy. So, no chance of that happening here either (apart from a checksum collision, which I think is really, really unlikely, by the same rationale as for SHA256 in my previous message).

Would love to hear @fd0’s opinion on all this too.

But it seems beyond doubt that the blobs these files belong to are corrupted, and on all snapshots since the first :frowning: What would be the best way to ‘repair’ them in the repo, as fast as possible and losing as little as possible of those snapshots? Perhaps I can somehow identify the bad blobs and remove them from the repo, or even correct them, saving fixed ones in their place?

Cheers,
– Durval.