Three days ago, Nick Craig-Wood, creator of rclone, posted a bug report to the GitHub repo for Microsoft’s OneDrive. “Sometimes (maybe one time in 20) multipart uploads of a 128MiB file get corrupted,” his post explains.
But “Microsoft got back to us simply to say the OneDrive issue has apparently been fixed.”
This topic is just to warn you that some of the backups to OneDrive may not be correct and that the error may not be reproducible.
I’m new to restic (love it by the way!!) and I use rclone OneDrive for the backend so this has me curious and concerned.
What are people's general strategies for verifying the data integrity of backups? I know about restic check, which is fine for a sanity check, but using the same tool to test its own dataset has me a little concerned.
This is not a knock on Restic, I really love the tool. Maybe I’m just paranoid.
My own style is to back up to a local USB drive which is plugged in only when the backup is run. After the backup, a
restic -r repo check --read-data-subset=1%
reads a randomly chosen 1% of the repository's pack data and verifies it against the stored hashes, which takes only about a minute. Note that this checks the repository's internal consistency; it does not compare the backup against the existing files. I have not ventured into using OneDrive for offsite backups, which is why I was interested when The Register article appeared.
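A quick subset check like this can be stretched into full coverage over time by rotating through fixed slices with the `n/t` form of `--read-data-subset`. Here is a minimal sketch, assuming a repository path passed in by the caller; the `subset_index` and `weekly_check` helpers and the idea of keying the slice to the ISO week number are my own hypothetical choices, not something restic prescribes:

```shell
#!/bin/sh
# Map an ISO week number (01-53) to a subset index in 1..total, so that
# consecutive weeks read consecutive slices of the repository.
subset_index() {
    week=${1#0}          # strip a leading zero so "08" is not read as octal
    total=$2
    echo $(( (week - 1) % total + 1 ))
}

# Hypothetical wrapper: check this week's slice of the repository.
# Usage: weekly_check /path/to/repo 5
weekly_check() {
    repo=$1
    total=$2
    n=$(subset_index "$(date +%V)" "$total")
    restic -r "$repo" check --read-data-subset="$n/$total"
}
```

With a split of 5, for example, running `weekly_check` once a week should read every slice about once every five weeks, while each individual run stays short.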
This is pretty simple to answer: the only way to know for sure that your backups are correct is to actually restore them and compare the restored files with the originals, or with some other known-to-be-accurate data about the files (checksums, for example) that lets you verify that the restored files are exact copies of the originals. Naturally, this is not something a lot of people do.
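For anyone who does want to try the restore-and-compare route, it can be sketched roughly as follows. This assumes the snapshot was taken of an absolute path and that `latest` is the snapshot you care about; the `compare_trees` and `verify_restore` names are hypothetical, and `diff -r -q` only compares file contents, not permissions or timestamps:

```shell
#!/bin/sh
# Compare an original directory with its restored copy, byte for byte.
# Prints files that differ or exist on only one side; returns 0 if identical.
compare_trees() {
    original=$1
    restored=$2
    diff -r -q "$original" "$restored"
}

# Hypothetical end-to-end run: restore the latest snapshot into a scratch
# directory, then compare. Assumption: restic recreates the full original
# path under the target, so /home/me/docs ends up in "$scratch/home/me/docs".
# Usage: verify_restore /path/to/repo /home/me/docs
verify_restore() {
    repo=$1
    source_dir=$2
    scratch=$(mktemp -d) || return 1
    restic -r "$repo" restore latest --target "$scratch" || return 1
    compare_trees "$source_dir" "$scratch$source_dir"
    status=$?
    rm -rf "$scratch"
    return $status
}
```

Keep in mind that any file changed since the backup was taken will show up as a difference, so this works best right after a backup or against data that does not change.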
That leaves you with either using restic's check --read-data or using some other tool that does the equivalent. Whatever other tool you use, it has to implement the restic repository format in order to be able to check this. And in that case, why would you trust that tool more than restic itself?
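There is one partial check that needs no knowledge of the format beyond file names: as far as I understand restic's design document, each pack file under `data/` is named after the SHA-256 hash of its own contents, so plain `sha256sum` can spot storage-level corruption. A rough sketch, assuming a local copy (or mount) of the repository and GNU coreutils; the `check_pack_names` helper is hypothetical, and this says nothing about whether restic wrote the right data in the first place:

```shell
#!/bin/sh
# Check that every file under <repo>/data is named after the SHA-256 of
# its contents. Prints each mismatch; returns 1 if any were found,
# 2 if the data directory is missing, 0 otherwise.
check_pack_names() {
    data_dir=$1/data
    if [ ! -d "$data_dir" ]; then
        echo "no data directory: $data_dir" >&2
        return 2
    fi
    mismatches=$(find "$data_dir" -type f | while IFS= read -r f; do
        # Read from stdin so the output is always "<hash>  -"
        actual=$(sha256sum < "$f" | cut -d' ' -f1)
        [ "$actual" = "$(basename "$f")" ] || echo "MISMATCH: $f"
    done)
    if [ -n "$mismatches" ]; then
        echo "$mismatches"
        return 1
    fi
    return 0
}
```

Like check --read-data, this has to read every byte, so against a remote such as OneDrive it means downloading the whole repository.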
A more relevant consideration on the topic of not trusting only one tool is to do at least two backups, one with tool X and the other with a totally different tool Y. Then we’re talking!
I have been using OneDrive, Microsoft's cloud storage service, for a while now and have never had any major problems. But recently there have been a lot of issues with it.
It’s been a while since the issue occurred, but for anyone stumbling across this thread wondering whether your OneDrive repos might be corrupt: if you’ve been using rclone to connect to OneDrive, then the repo is likely intact. Upon storing a blob, OneDrive reports the SHA-1 hash of the blob, which rclone compares to its own calculated hash. If the two hashes don’t match, rclone will throw an error and fail; this is also how this issue was detected. It can’t hurt to verify the integrity yourself, though, as mentioned above.