Restore slower than check

I am doing some benchmarking, comparing restic and Duplicacy. I am testing backups on a FUSE file system (so far I have tested Hubic and Mega). In all tests I have noticed that check --read-data is twice as slow as restore (e.g. 164 s check, 315 s restore). I don’t understand why: check --read-data only needs to load all the chunks, and of course restore needs some computation time to rebuild the files, but that should be very short compared with the internet traffic.
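For reference, the kind of commands I am timing look like this (the repository path and password are placeholders for my actual setup):

```shell
# Hypothetical repository on a FUSE mount; adjust path/password to your setup.
export RESTIC_REPOSITORY=/mnt/fusefs/restic-repo
export RESTIC_PASSWORD=placeholder

# Verify the repository structure AND download/verify every pack file
time restic check --read-data

# Restore the latest snapshot into a scratch directory
time restic restore latest --target /tmp/restore-test
```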

When using Duplicacy, these two operations take a similar time, quite close to restic’s check --read-data.

I would also be interested to know whether anyone has experience with a FUSE file system as a backend. I find it very useful, since restic supports only a few backends, while almost every cloud can be mounted as a FUSE file system. Of course a native API is to be preferred, when available.

Marc

First, which version of restic did you use to do your tests?

In all tests I have noticed that check --read-data is twice as slow as restore (e.g. 164 s check, 315 s restore)

I assume you meant the other way around: check is twice as fast as restore, right?

I don’t understand why: check --read-data only needs to load all the chunks, and of course restore needs some computation time to rebuild the files, but that should be very short compared with the internet traffic.

That highly depends on the structure of the data. Many small files will take a bit longer to restore than a few large files.

The reason restore is not as fast as it could be is that it isn’t fully optimized yet: it processes all the files sequentially. A pull request is in the works to improve restore.

The check command, on the other hand, already reads several files in parallel, since that is a process that is easy to run concurrently.

I would be also interested to know if there are some experience of a fuse fs as backend.

I don’t understand what you mean. Do you mean using the local backend in restic to write a repository to a directory that is itself a file system mounted via FUSE, like sshfs or similar?
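If that is what’s meant, a typical setup would look something like this (host, paths, and repository location are placeholders):

```shell
# Mount a remote directory locally via sshfs (a FUSE file system)
sshfs user@example.com:/backups /mnt/sshfs

# Then point restic's local backend at a directory inside that mount
restic -r /mnt/sshfs/restic-repo init
restic -r /mnt/sshfs/restic-repo backup /home/user/data
```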

First, which version of restic did you use to do your tests?

restic 0.8.0

In all tests I have noticed that check --read-data is twice as slow as restore (e.g. 164 s check, 315 s restore)

I assume you meant the other way around: check is twice as fast as restore, right?

Yes, check is twice as fast; the numbers were right.

I don’t understand why: check --read-data only needs to load all the chunks, and of course restore needs some computation time to rebuild the files, but that should be very short compared with the internet traffic.

That highly depends on the structure of the data. Many small files will take a bit longer to restore than a few large files.

The reason restore is not as fast as it could be is that it isn’t fully optimized yet: it processes all the files sequentially. A pull request is in the works to improve restore.

The check command, on the other hand, already reads several files in parallel, since that is a process that is easy to run concurrently.

OK, but while multi-threading is an important factor when reading from the network, it does not make much difference when accessing local files. And when using a FUSE file system as the backend, restic only accesses local files; all the data transfer is done by FUSE, which can itself be multi-threaded.

As a side point, the restic manual does not warn that without --read-data, check verifies only the repository structure, not the chunks, so you don’t know whether your data is actually healthy.
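In other words, the two invocations verify very different things (the repository path below is a placeholder):

```shell
# Fast: verifies snapshot/index/pack structure only; pack contents are NOT read
restic -r /mnt/fusefs/restic-repo check

# Slow but thorough: additionally downloads every pack file and verifies its data
restic -r /mnt/fusefs/restic-repo check --read-data
```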

I would also be interested to know whether anyone has experience with a FUSE file system as a backend.

I don’t understand what you mean. Do you mean using the local backend in restic to write a repository to a directory that is itself a file system mounted via FUSE, like sshfs or similar?

Of course I use the local backend, but there are some points where one should be careful. I have mainly noticed some drawbacks related to caching. FUSE always caches in the kernel, unless a specific option disables it, but many FUSE file systems, such as MegaFuse, add their own caching in a local folder. When you upload data, it immediately fills that cache, and as long as the cache is not saturated the operation returns very quickly, while FUSE continues uploading its cache for a long time afterwards. Here it is useful to limit the upload rate.

Worse, if you do a backup followed by a check and do not take care to empty the cache, your check succeeds very quickly, having only checked the cached data left behind by the previous backup. So always unmount and remount the FUSE file system between the two operations.
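A sketch of the safe sequence, using sshfs as a stand-in (MegaFuse and others have their own mount commands; paths are placeholders):

```shell
# Back up; data may still sit in the FUSE layer's local cache afterwards
restic -r /mnt/fusefs/restic-repo backup /home/user/data

# Unmount to force a flush and invalidate the cache, then remount
fusermount -u /mnt/fusefs
sshfs user@example.com:/backups /mnt/fusefs

# Now the check really reads from the remote, not from the stale cache
restic -r /mnt/fusefs/restic-repo check --read-data
```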

Maybe there are other pitfalls; that is why I asked this question.

Marc

PS: Some time after writing this I repeated my tests on true local storage, and what I said above no longer holds there. On a tiny old ARMv6 NAS, with 3.5 GB of data, backup takes 42 min, check --read-data 31 min, and restore 22 min.
FUSE file systems are local, but not truly local!

You can choose not to use the restic cache by passing --no-cache.
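Note that --no-cache disables restic’s own local metadata cache, which is separate from any caching done by the FUSE layer (repository path is a placeholder):

```shell
# Run check without restic's local cache (does not affect the FUSE fs cache)
restic -r /mnt/fusefs/restic-repo check --no-cache
```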

That’s specific to the FUSE file system implementation you’re using. restic tries hard to make sure the files get written. Unfortunately, these are implementation details that restic cannot detect, so you’re on your own making sure that the unmount/flush happens.