Sorry for the delay. I got sidetracked and wanted to be thorough.
$ restic version
restic 0.13.1 compiled with go1.18.1 on linux/amd64
$ restic-13 version
restic 0.13.1 (v0.13.0-359-gf0bb4f87-dirty) compiled with go1.19 on linux/amd64
restic is from the Arch Linux repo, and restic-13 was compiled by me with go run build.go. The (dirty) changes just point it to the chunker repo on my system, which I modified to produce smaller chunks. The patch for that is below.
$ git diff -U1 932cc2^
diff --git a/chunker.go b/chunker.go
index 6676eba..46f8827 100644
--- a/chunker.go
+++ b/chunker.go
@@ -16,5 +16,5 @@ const (
// MinSize is the default minimal size of a chunk.
- MinSize = 512 * kiB
+ MinSize = 512 // * kiB
// MaxSize is the default maximal size of a chunk.
- MaxSize = 8 * miB
+ MaxSize = 16 * kiB // 8 * miB
@@ -107,3 +107,3 @@ func NewWithBoundaries(rd io.Reader, pol Pol, min, max uint) *Chunker {
MaxSize: max,
- splitmask: (1 << 20) - 1, // aim to create chunks of 20 bits or about 1MiB on average.
+ splitmask: (1 << 13) - 1, // aim to create chunks of 20 bits or about 1MiB on average.
},
@@ -133,3 +133,3 @@ func (c *Chunker) ResetWithBoundaries(rd io.Reader, pol Pol, min, max uint) {
MaxSize: max,
- splitmask: (1 << 20) - 1,
+ splitmask: (1 << 13) - 1,
},
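To put numbers on the patch above: as far as I understand the chunker, a split point is taken where the masked rolling hash is zero, so with an n-bit splitmask a candidate boundary turns up about every 2^n bytes once MinSize has been passed. Here is a minimal sketch of that arithmetic, assuming uniformly distributed hash values and ignoring the MinSize/MaxSize clamping. The function name expectedBoundaryGap is mine, not something from the chunker package.

```go
package main

import "fmt"

// expectedBoundaryGap returns the expected number of bytes between candidate
// split points for a splitmask with the given number of bits, assuming the
// rolling hash is uniformly distributed (hypothetical helper, not chunker API).
func expectedBoundaryGap(bits uint) uint64 {
	return 1 << bits
}

func main() {
	const kiB = 1024
	// 20 bits is the stock value, 13 bits is the patched value.
	for _, bits := range []uint{20, 13} {
		fmt.Printf("splitmask bits=%d -> about %d KiB between split points\n",
			bits, expectedBoundaryGap(bits)/kiB)
	}
}
```

So the patched build aims for roughly 8 KiB chunks instead of roughly 1 MiB. The comment on the patched line still says "20 bits or about 1MiB", which is just left over from the original.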
The filesystem used is Btrfs. I do have LZO compression on, but that should be transparent to Restic, yeah?
$ du -sh ~/.test-dedup/*
263M /home/nstgc5/.test-dedup/1bup
581M /home/nstgc5/.test-dedup/2borg
937M /home/nstgc5/.test-dedup/3restic
$ du -sh ~/.test-dedup/*/*
0 /home/nstgc5/.test-dedup/1bup/branches
1.9M /home/nstgc5/.test-dedup/1bup/bupindex
16M /home/nstgc5/.test-dedup/1bup/bupindex.meta
4.0K /home/nstgc5/.test-dedup/1bup/config
4.0K /home/nstgc5/.test-dedup/1bup/description
4.0K /home/nstgc5/.test-dedup/1bup/HEAD
60K /home/nstgc5/.test-dedup/1bup/hooks
4.0K /home/nstgc5/.test-dedup/1bup/info
1.6M /home/nstgc5/.test-dedup/1bup/logs
243M /home/nstgc5/.test-dedup/1bup/objects
1.6M /home/nstgc5/.test-dedup/1bup/refs
4.0K /home/nstgc5/.test-dedup/2borg/config
580M /home/nstgc5/.test-dedup/2borg/data
12K /home/nstgc5/.test-dedup/2borg/hints.1613
1.3M /home/nstgc5/.test-dedup/2borg/index.1613
4.0K /home/nstgc5/.test-dedup/2borg/integrity.1613
4.0K /home/nstgc5/.test-dedup/2borg/README
4.0K /home/nstgc5/.test-dedup/3restic/config
885M /home/nstgc5/.test-dedup/3restic/data
51M /home/nstgc5/.test-dedup/3restic/index
4.0K /home/nstgc5/.test-dedup/3restic/keys
0 /home/nstgc5/.test-dedup/3restic/locks
1.6M /home/nstgc5/.test-dedup/3restic/snapshots
That’s with my current work directories as well as the Btrfs snapshots I’ve been keeping of them. Below is from backing up just my working directories.
$ du -sh ~/.usrtest/*/*
0 /home/nstgc5/.usrtest/1bup/branches
624K /home/nstgc5/.usrtest/1bup/bupindex
40K /home/nstgc5/.usrtest/1bup/bupindex.meta
4.0K /home/nstgc5/.usrtest/1bup/config
4.0K /home/nstgc5/.usrtest/1bup/description
4.0K /home/nstgc5/.usrtest/1bup/HEAD
60K /home/nstgc5/.usrtest/1bup/hooks
4.0K /home/nstgc5/.usrtest/1bup/info
19M /home/nstgc5/.usrtest/1bup/objects
0 /home/nstgc5/.usrtest/1bup/refs
4.0K /home/nstgc5/.usrtest/2borg/config
27M /home/nstgc5/.usrtest/2borg/data
4.0K /home/nstgc5/.usrtest/2borg/hints.5
164K /home/nstgc5/.usrtest/2borg/index.5
4.0K /home/nstgc5/.usrtest/2borg/integrity.5
4.0K /home/nstgc5/.usrtest/2borg/README
4.0K /home/nstgc5/.usrtest/3restic/config
27M /home/nstgc5/.usrtest/3restic/data
360K /home/nstgc5/.usrtest/3restic/index
8.0K /home/nstgc5/.usrtest/3restic/keys
4.0K /home/nstgc5/.usrtest/3restic/locks
300K /home/nstgc5/.usrtest/3restic/snapshots
$ du -sh ~/.usrtest/*
20M /home/nstgc5/.usrtest/1bup
27M /home/nstgc5/.usrtest/2borg
28M /home/nstgc5/.usrtest/3restic
Please note that the .usrtest name doesn’t imply that I’m trying to backup/dedup /usr/. It’s just that a script I had used for that purpose points to that directory for repos.
They all seem to do about the same when the redundancy is low. This is something I hadn’t really checked before.
Below is the result of Restic compiled to use a smaller chunk size, with the repository initialized with --repository-version 1:
$ restic-13 backup -vr .test-restic-13 ~/Work/
open repository
enter password for repository:
repository 7e8afb98 opened (repository version 1) successfully, password is correct
created new cache in /home/nstgc5/.cache/restic
lock repository
no parent snapshot found, will read all files
load index files
start scan on [/home/nstgc5/Work/]
start backup on [/home/nstgc5/Work/]
scan finished in 0.366s: 2942 files, 26.939 MiB
Files: 2942 new, 0 changed, 0 unmodified
Dirs: 1355 new, 0 changed, 0 unmodified
Data Blobs: 4062 new
Tree Blobs: 1231 new
Added to the repository: 19.854 MiB (20.202 MiB stored)
processed 2942 files, 26.939 MiB in 0:01
snapshot f7cd372a saved
And repeating this with the version from the Arch repo:
$ restic backup -vr .test-restic ~/Work/
open repository
enter password for repository:
repository c4febf9f opened successfully, password is correct
created new cache in /home/nstgc5/.cache/restic
lock repository
load index files
no parent snapshot found, will read all files
start scan on [/home/nstgc5/Work/]
start backup on [/home/nstgc5/Work/]
scan finished in 0.338s: 2942 files, 26.939 MiB
Files: 2942 new, 0 changed, 0 unmodified
Dirs: 1355 new, 0 changed, 0 unmodified
Data Blobs: 1855 new
Tree Blobs: 1231 new
Added to the repo: 26.741 MiB
processed 2942 files, 26.939 MiB in 0:01
snapshot 6891486b saved
As can be seen, the patched Restic is chopping the files up into finer pieces (4062 data blobs versus 1855). We can confirm that there are some space savings.
$ du -sh ~/.test-restic*
28M /home/nstgc5/.test-restic
21M /home/nstgc5/.test-restic-13
Doing the same with those snapshots:
$ restic init -r .test-restic
$ restic-13 init --repository-version 1 -r .test-restic-13
$ restic backup -vr .test-restic ~/Work/Snapshots/
open repository
enter password for repository:
repository 80ec1906 opened successfully, password is correct
created new cache in /home/nstgc5/.cache/restic
lock repository
load index files
no parent snapshot found, will read all files
start scan on [/home/nstgc5/Work/Snapshots/]
start backup on [/home/nstgc5/Work/Snapshots/]
scan finished in 21.366s: 1113955 files, 15.234 GiB
Files: 1113955 new, 0 changed, 0 unmodified
Dirs: 501528 new, 0 changed, 0 unmodified
Data Blobs: 10702 new
Tree Blobs: 434331 new
Added to the repo: 852.270 MiB
processed 1113955 files, 15.234 GiB in 26:54
snapshot 297af141 saved
$ restic-13 backup -vr .test-restic-13 ~/Work/Snapshots/
open repository
enter password for repository:
repository ca1319f6 opened (repository version 1) successfully, password is correct
created new cache in /home/nstgc5/.cache/restic
lock repository
no parent snapshot found, will read all files
load index files
start scan on [/home/nstgc5/Work/Snapshots/]
start backup on [/home/nstgc5/Work/Snapshots/]
scan finished in 33.982s: 1113955 files, 15.234 GiB
Files: 1113955 new, 0 changed, 0 unmodified
Dirs: 501528 new, 0 changed, 0 unmodified
Data Blobs: 27029 new
Tree Blobs: 434331 new
Added to the repository: 863.537 MiB (893.898 MiB stored)
processed 1113955 files, 15.234 GiB in 24:52
snapshot 4792dc4b saved
As can be seen here, for larger data sets with lots of redundancy, the smaller chunk size doesn’t help. I’d have confirmed this with du, but I Ctrl+R’d to rm -rf ~/.test-restic* instead. Oops. I’m getting a bit hasty.
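Since the du numbers for the second run are gone, here is a quick sketch that compares the "Added to the repo" figures from the four backup logs above instead. The helper percentSmaller is just something I made up for the arithmetic; the inputs are the MiB values copied from the logs (the "added" value, not the "stored" value that only the newer build prints).

```go
package main

import "fmt"

// percentSmaller reports how much smaller patched is than baseline, in percent.
// A negative result means the patched figure is actually larger.
func percentSmaller(baseline, patched float64) float64 {
	return (baseline - patched) / baseline * 100
}

func main() {
	// "Added to the repo" in MiB: stock restic first, patched restic-13 second.
	fmt.Printf("~/Work:           %+.1f%%\n", percentSmaller(26.741, 19.854))
	fmt.Printf("~/Work/Snapshots: %+.1f%%\n", percentSmaller(852.270, 863.537))
}
```

By this measure the patched build adds roughly a quarter less data for ~/Work alone, but slightly more than the stock build for the snapshots.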
Note that before, I was using a script that mounted each snapshot into a fixed directory before running restic backup. This might make a difference. It certainly is harder on my ~/.cache to do it this way.
And to give a sense of how redundant the data is:
$ sudo btrfs fi du -s ~/Work/Snapshots/
Total Exclusive Set shared Filename
15.44GiB 69.97MiB 230.97MiB /home/nstgc5/Work/Snapshots/
Note that btrfs fi du does not report compressed size, so it’s reasonable to compare the other results against this for the sake of deduplication.
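As a rough illustration of that comparison, the sketch below divides the 15.44GiB "Total" from btrfs fi du by the repository sizes from the first du -sh ~/.test-dedup/* listing. This assumes those three repos hold essentially the same data as ~/Work/Snapshots/, which may not be exactly true, and dedupRatio is just a name I picked.

```go
package main

import "fmt"

const mibPerGiB = 1024

// dedupRatio returns the logical data size divided by the repository size,
// both in MiB. Higher means better deduplication.
func dedupRatio(logicalMiB, repoMiB float64) float64 {
	return logicalMiB / repoMiB
}

func main() {
	// "Total" reported by btrfs fi du for ~/Work/Snapshots/.
	logical := 15.44 * mibPerGiB

	// Repository sizes in MiB from du -sh ~/.test-dedup/*.
	repos := []struct {
		name string
		mib  float64
	}{
		{"bup", 263},
		{"borg", 581},
		{"restic", 937},
	}
	for _, r := range repos {
		fmt.Printf("%-6s %.1fx\n", r.name, dedupRatio(logical, r.mib))
	}
}
```

The absolute ratios shouldn't be taken too seriously given that assumption, but the ordering (bup, then borg, then restic) follows directly from the du output.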
edit: And I’ll reply to @gurkan later today. I do have something to say in reply (some thoughts on that hypothesis), but I need to take time to put my thoughts in order.