Why can't I resume a partial upload?

whatisaphone · December 23, 2017, 6:20pm

I’m a CrashPlan refugee, looking to replace them with restic + B2. On paper, restic looks like a perfect fit, but I’m running into a hiccup. It sounds like it’s supposed to support resuming partial uploads, but I’m not seeing that behavior in my tests.

My environment is restic_0.8.0_linux_amd64 on a QNAP TS-453B NAS with QTS 4.3.3, uploading to Backblaze B2.

I’m testing with around 700MB of data. I repeatedly run restic for a minimum of 6 minutes (since progress is saved every 5 minutes), kill -TERM restic, and re-run the backup command. I expected that it would pick up where it left off each time, and finish after a few runs, but instead it seems to be starting from scratch every run, destined never to finish.

Terminal output

[/share/CE_CACHEDEV1_DATA/restic] # ./restic backup /share/Storage/Books --limit-upload 500
password is correct
scan [/share/Storage/Books]
scanned 27 directories, 276 files in 0:00
Terminated22%  626.893 KiB/s  282.224 MiB / 719.673 MiB  183 / 303 items  0 errors  ETA 11:54
[/share/CE_CACHEDEV1_DATA/restic] # ./restic backup /share/Storage/Books --limit-upload 500
password is correct
scan [/share/Storage/Books]
scanned 27 directories, 276 files in 0:00
Terminated10%  700.285 KiB/s  288.594 MiB / 719.673 MiB  202 / 303 items  0 errors  ETA 10:30
[/share/CE_CACHEDEV1_DATA/restic] # ./restic backup /share/Storage/Books --limit-upload 500
password is correct
scan [/share/Storage/Books]
scanned 27 directories, 276 files in 0:00
Terminated54%  668.797 KiB/s  277.369 MiB / 719.673 MiB  181 / 303 items  0 errors  ETA 11:19
[/share/CE_CACHEDEV1_DATA/restic] # ./restic backup /share/Storage/Books --limit-upload 500
password is correct
scan [/share/Storage/Books]
scanned 27 directories, 276 files in 0:00
Terminated86%  636.941 KiB/s  272.442 MiB / 719.673 MiB  163 / 303 items  0 errors  ETA 11:59
[/share/CE_CACHEDEV1_DATA/restic] # ./restic backup /share/Storage/Books --limit-upload 500
password is correct
scan [/share/Storage/Books]
scanned 27 directories, 276 files in 0:00
 signal interrupt received, cleaning up
Terminated06%  724.982 KiB/s  281.073 MiB / 719.673 MiB  202 / 303 items  0 errors  ETA 10:19

Meanwhile, B2 shows my used storage as 1.8 GB, far bigger than my dataset of 700MB. Is there anything that explains why these results don’t match up with what I expected?

rawtaz · December 23, 2017, 8:40pm

As far as I know, restic saves a temporary index every five minutes or so. On the next run, that index lets it identify which blobs have already been uploaded, so it will not upload them again. In other words, it should not re-upload data that has already been uploaded and for which an index has been saved.

It does, however, still scan all your files and directories, and it processes them in the same order. So if you cancel it at approximately the same point in that process every time, I doubt it will ever reach the point where it starts processing data that has not yet been uploaded.
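To make the mechanism described above concrete, here is a toy model in Python. It is not restic's code: `run_backup`, `kill_after`, and `index_every` are hypothetical names, blobs are plain strings, and the saved index is just a set. It only illustrates that progress survives a kill if an index covering the uploaded blobs was saved first.

```python
def run_backup(blobs, saved_index, kill_after=None, index_every=3):
    """Toy model of one backup run (hypothetical, not restic's implementation).

    blobs:        blob IDs in the fixed order the scanner visits them
    saved_index:  set of blob IDs recorded by index files from earlier runs;
                  it is updated in place whenever an index is "saved"
    kill_after:   number of uploads after which the process is killed
    index_every:  an index is saved after this many uploads (a stand-in for
                  the time-based interval)
    Returns the list of blobs uploaded during this run.
    """
    uploaded = []
    unsaved = []
    for blob in blobs:
        if blob in saved_index:
            continue  # already known from a saved index: skip the upload
        if kill_after is not None and len(uploaded) >= kill_after:
            return uploaded  # killed: blobs in `unsaved` are in no index
        uploaded.append(blob)
        unsaved.append(blob)
        if len(unsaved) == index_every:
            saved_index.update(unsaved)  # intermediate index saved
            unsaved.clear()
    saved_index.update(unsaved)  # final index at the end of a complete run
    return uploaded


blobs = [f"blob{i}" for i in range(10)]

# Killed before the first index is saved: nothing is remembered.
index = set()
first = run_backup(blobs, index, kill_after=2)
second = run_backup(blobs, index, kill_after=2)
print(first == second)  # True: the second run repeats the same uploads

# Killed after an index was saved: the next run skips the indexed blobs.
index = set()
run_backup(blobs, index, kill_after=4)
resumed = run_backup(blobs, index)
print(resumed[0])  # blob3
```

Note that in the second scenario `blob3` is uploaded twice: it was sent before the kill, but no saved index mentioned it.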

I cannot explain the 1.8 GB though.

whatisaphone · December 23, 2017, 9:38pm

Thanks for the response!

I repeated the process a few more times, this time with 10 minutes between kills, to be completely sure. Same result: B2’s usage continues to increase, now up to 3.2 GB for my 700 MB dataset.

Based on the docs, I understand restic uses content-addressable storage. I know the files aren’t changing underneath me. B2 bucket versioning is disabled (“Keep only the last version”). Shouldn’t this mean that my situation is impossible? Any re-uploaded data would overwrite a previous upload of the same data with the same hash as its filename.
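One possible reason why re-uploads would not simply overwrite each other, sketched as a toy model: as far as I understand restic's design, blobs are identified by the hash of their plaintext, but they are bundled into encrypted pack files, and a pack file is named after the hash of its stored (encrypted) contents. If encryption uses a fresh random nonce each time, the same blobs end up in differently named files on every run. The code below is a stand-in under those assumptions, not restic's format: `blob_id` and `make_pack` are hypothetical helpers, and the "encryption" is only a random nonce prepended to the data.

```python
import hashlib
import os


def blob_id(plaintext):
    """Content address of a blob: SHA-256 of its plaintext."""
    return hashlib.sha256(plaintext).hexdigest()


def make_pack(blobs):
    """Bundle blobs into one "pack" and name it after its stored bytes.

    Stand-in for encryption: each blob gets a fresh random 16-byte nonce
    prepended, so the stored bytes differ on every call.
    """
    body = b"".join(os.urandom(16) + blob for blob in blobs)
    return hashlib.sha256(body).hexdigest(), body


data = [b"chapter one", b"chapter two"]

# The blob IDs are stable across runs...
ids_run1 = [blob_id(b) for b in data]
ids_run2 = [blob_id(b) for b in data]
print(ids_run1 == ids_run2)  # True

# ...but two runs produce two differently named files in the bucket.
name1, _ = make_pack(data)
name2, _ = make_pack(data)
print(name1 == name2)  # False, barring a nonce collision
```

Under this model, deduplication depends on knowing which blob IDs are already stored, not on file names colliding in the bucket.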

I am very confident that restic is spending its time uploading, not just scanning, based on system network usage and the increasing bucket size reported from B2. The initial scan completes in about a second since I’m only backing up 276 files.

Spitballing here – restic’s docs make note of 2 directories: $TMPDIR and ~/.cache/restic.

$TMPDIR seems to remain empty throughout the process, judging from running ls a few times (or the files are unlinked immediately). However, I know it’s used because I was getting errors before I pointed it at a file system with enough space. I verified I can write to this dir from the shell.
~/.cache/restic has some directories, but only two files: CACHEDIR_TAG and $hash/version. Could this indicate a problem? I also verified I can write to this dir.

^^ may or may not be useful.

Clearly what I’m seeing should not be happening. How can I debug this further?

rawtaz · December 23, 2017, 9:46pm

I forgot to say welcome to the community!

Do you possibly have atime enabled in your filesystem?

whatisaphone · December 23, 2017, 11:12pm

Thanks!

Here’s the relevant line from /proc/mounts:

/dev/mapper/ce_cachedev1 /share/CE_CACHEDEV1_DATA ext4 rw,relatime,noacl,stripe=256,data=ordered,jqfmt=vfsv0,usrjquota=aquota.user 0 0

As I understand it, due to relatime, atime shouldn’t have changed between runs since these runs all happened on the same day. Or I could be wrong. Either way, I don’t think it’s possible to change the mount options because QNAP’s OS resets most of the underlying Linux system on every reboot.

Will waiting for the next release with this PR solve my problem?

rawtaz · December 23, 2017, 11:17pm

Not sure the atime track is relevant, it was a long shot. Might very well not have anything to do with it.

The author of restic, fd0, will reply to this thread. It sounded like he had a hunch about what might be going on, so stay tuned till tomorrow or so.

Personally, I run the latest master of restic. If you want to try it out, you can get a precompiled binary from https://beta.restic.net, or just clone the repository and, in the corresponding folder, run go run build.go to compile it yourself.

Let’s see what @fd0 has to say.

fd0 · December 24, 2017, 10:50am

Nice. Since restic is still a rather young project, there are a few rough edges.

I do have a hunch about what’s going on here: restic uploads the intermediate index files while it is uploading other data. If you have little upstream bandwidth (it looks that way), larger files may still be in the process of being uploaded, so the upload of the index file is delayed. Then you kill the process (with kill?) and it never gets around to uploading the index file.

Can you check if there are any files in the index/ folder in the repo?

You can try different things:

Limit the number of concurrent uploads to B2 with -o b2.connections=2 (default is 5), so that the index has a better chance of being uploaded in between pack files.
Cancel restic with SIGINT (ctrl+c) instead of killing it. That should finish all pending uploads before exiting.
Manually rebuild an index from all the data that’s in the repo with restic rebuild-index. That will download the headers of all pack files in the repo, so instead of starting the upload from scratch you download a small part of each file.
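The difference between the two ways of stopping the process can be sketched in Python. This shows the general pattern only (a SIGINT handler sets a flag, and work already started is finished before exit); it is not restic's actual code. `Uploader` and its methods are made-up names, and the `raise_signal` call stands in for pressing ctrl+c.

```python
import signal


class Uploader:
    """Toy uploader that finishes queued work after SIGINT (hypothetical)."""

    def __init__(self):
        self.stop_requested = False
        self.queue = []      # uploads that were started but not finished
        self.finished = []   # uploads that are safely stored

    def handle_sigint(self, signum, frame):
        # Only record the request; the main loop decides when to exit.
        self.stop_requested = True

    def run(self, items, interrupt_at=None):
        for n, item in enumerate(items):
            if self.stop_requested:
                break  # stop accepting new work
            self.queue.append(item)
            if n == interrupt_at:
                signal.raise_signal(signal.SIGINT)  # simulate ctrl+c
            if len(self.queue) == 3:
                self.flush()
        self.flush()  # drain pending uploads before exiting

    def flush(self):
        self.finished.extend(self.queue)
        self.queue.clear()


uploader = Uploader()
previous = signal.signal(signal.SIGINT, uploader.handle_sigint)
try:
    uploader.run(["pack1", "pack2", "index1", "pack3", "pack4"], interrupt_at=1)
finally:
    # Restore whatever handler was installed before (default if unknown).
    signal.signal(signal.SIGINT, previous or signal.SIG_DFL)

print(uploader.finished)  # ['pack1', 'pack2']
```

A signal with no handler installed and a default action of terminating the process gives the loop no chance to run the final `flush()`, which is the toy equivalent of the index file never being uploaded.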

When you run restic prune on the large repo, you’ll see that it detects a huge amount of duplicate data, and cleans that up afterwards.

whatisaphone · December 24, 2017, 8:09pm

With your help I got it working! I ran my test again with -o b2.connections=2, and gave it more time before killing, and it was able to resume. Success! Then I added more data and ran another backup without limiting the speed. It seems like index files are consistently being written about every 15 minutes (instead of 5 minutes) for whatever reason. I see timestamps of 13:37, 13:53, 14:08, 14:24, 14:41. I should have been more patient from the beginning! In real-world usage, I’ll only be killing it after 10 or more hours, so I can ignore the extra few minutes.
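For reference, the gaps between those index timestamps can be checked with a few lines of Python. The timestamps are copied from the paragraph above; the variable names are only illustrative.

```python
from datetime import datetime

stamps = ["13:37", "13:53", "14:08", "14:24", "14:41"]
times = [datetime.strptime(s, "%H:%M") for s in stamps]

# Minutes between consecutive index files.
gaps = [int((b - a).total_seconds() // 60) for a, b in zip(times, times[1:])]
print(gaps)                   # [16, 15, 16, 17]
print(sum(gaps) / len(gaps))  # 16.0
```

So the observed interval averages about 16 minutes, close to the 15-minute figure rather than 5.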

I should have mentioned that I tried using ctrl+c, but thought it had ignored me since it continued to upload for several minutes. That is why I resorted to SIGTERM. Turns out if you ctrl+c and then let it run for 10+ more minutes it does eventually quit on its own. I see there is an issue and a PR that looks like it might fix this next release?

I’d like to say that I really like restic’s design. It reminds me of git (in a good way). I also appreciate your approach to development, from what little I’ve read on GitHub. I now feel confident in relying on restic. Thanks for your help getting off the ground!

fd0 · December 24, 2017, 8:25pm

Oh, you’re right! For index files that aren’t “full enough”, restic only uploads an index every 15 minutes.

Oh yes, that’s already merged (and included in the beta builds, in case you want to give it a try). I’m mostly using the master branch, so I already have that feature.

Thank you very much! I hope the program helps you. May you never need it.