I was doing some benchmarks yesterday to compare Borg v1 and v2, as well as test some network optimizations that our users sometimes ask about. Also included Restic, since we already have a good number of users choosing it on BorgBase.com. (We support both, so I’m not biased either way.)
This isn’t meant to bash either tool, but to show where one may learn from the other. There could also be an issue with my test setup, so feedback on that is welcome too.
So here is the summarized data. Measurements were taken with GNU time; the full results, as well as the test data and script to reproduce my tests, are all on GitHub.
My guess would be that these are mostly preallocated buffers, although that is hard to tell from the numbers.
That looks very odd. We definitely don’t read files twice. The part of restic which reads a file has essentially no idea whether a repository gets compressed or not.
What is also rather strange is that borg reads more data in the create-2 test case when backing up to the high-latency backend compared to the normal-latency one. I don’t think that should happen?
According to your test scripts repository, the first data set is 9.3 GB, so why is every backup tool reading twice that amount? And for the text data set (10.2 GB?) we’re now at 2-4x as much data. Neither scenario makes sense.
I also don’t see any flushing of the filesystem cache between the test runs (at least between different backup tools, that would be a good idea).
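A minimal sketch of such a flush, assuming Linux (the helper name is mine; actually dropping the page cache requires root, so the sketch only syncs when run unprivileged):

```python
# Sketch: flush the page cache between benchmark runs so every tool
# starts cold. Assumes Linux; writing to drop_caches requires root.
import os
import subprocess

def flush_fs_cache():
    subprocess.run(["sync"], check=True)   # write dirty pages to disk
    if os.geteuid() == 0:                  # only root may drop caches
        with open("/proc/sys/vm/drop_caches", "w") as f:
            f.write("3\n")                 # 3 = page cache + dentries/inodes
```

Running this between each tool’s runs would make the “File system inputs” numbers comparable across tools.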
Yes, currently new pack files are created as temporary files to avoid having to keep them in memory.
The create-3 test run seems to “cheat” when using borg. Not reading any data doesn’t make sense, unless borg is just getting all data it needs from the server.
This would explain the extra data read and written. So it’s not reading the source files twice, but the temporary files.
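The pattern in question can be sketched roughly like this (names are mine, not restic’s actual code):

```python
# Rough sketch of the temp-file pattern: pack data is spooled to a
# temporary file instead of being held in memory, then read back for
# upload. Both the write and the read-back can show up in GNU time's
# "File system outputs" / "File system inputs" counters.
import tempfile

def build_pack(chunks):
    with tempfile.TemporaryFile() as tmp:
        for chunk in chunks:
            tmp.write(chunk)       # extra filesystem output
        tmp.seek(0)
        return tmp.read()          # extra filesystem input on read-back

pack = build_pack([b"a" * 1024, b"b" * 1024])
print(len(pack))  # 2048
```

So every byte that goes through a pack file is potentially written and read once more than the source data itself, which would inflate both counters.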
Excellent point. Just tested, and without flushing, some files may come from the cache. This makes a small difference in the “File system inputs” number, especially for the prune command, which reads very little data, as application libraries may come from the cache.
Overall GNU time does seem accurate on this, as tested with dd:
# /bin/time -v dd if=/dev/zero of=./test.tmp bs=512 count=1000
1000+0 records in
1000+0 records out
512000 bytes (512 kB, 500 KiB) copied, 0.00840524 s, 60.9 MB/s
Command being timed: "dd if=/dev/zero of=./test.tmp bs=512 count=1000"
User time (seconds): 0.00
System time (seconds): 0.00
Percent of CPU this job got: 80%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.01
...
Maximum resident set size (kbytes): 1972
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 2
Minor (reclaiming a frame) page faults: 93
Voluntary context switches: 18
Involuntary context switches: 20
Swaps: 0
File system inputs: 1288
File system outputs: 1000
...
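As a sanity check on these counters: GNU time takes “File system inputs/outputs” from getrusage(2), which on Linux counts in 512-byte blocks, so the dd run’s numbers convert like this:

```python
# GNU time's "File system inputs/outputs" come from getrusage(2),
# which reports them in 512-byte blocks on Linux.
BLOCK = 512
fs_outputs = 1000                  # "File system outputs" from the dd run
print(fs_outputs * BLOCK)          # 512000 bytes, exactly what dd wrote
```

The extra “File system inputs” (1288 blocks) would then cover reading the dd binary and its libraries, not the copied data.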