Understanding `--read-data-subset=n/t`

Denis · February 20, 2026, 5:28pm

First of all, thank you all for the information you shared in this topic.

I’ve read the documentation here: Checking integrity and consistency

However, it is unclear to me how this command works:

--read-data-subset=n/t

I’m interested in it because I’d like to do a deep check (that is, using --read-data) on my backup, but a complete check after every run takes too many resources. So it would be great to split it over multiple days. I wonder if this option guarantees that by running:

--read-data-subset=1/7 on Monday

--read-data-subset=2/7 on Tuesday

….

--read-data-subset=7/7 on Sunday

all files in the backup are covered. I mean, e.g., on Tuesday, how can Restic know what files have already been checked on Monday?

rawtaz · February 20, 2026, 5:49pm

You are right that the set of --read-data-subset options you wrote will cover and check the entire repository over time, yes.

What happens is that restic takes the entire repository, “mentally” splits it into seven pieces (t), and then checks only the piece whose number you gave before the / (n). On Monday the first piece, on Tuesday the second piece, and so on. At the end of the week, you will have checked all seven pieces of the repository.

Restic doesn’t know what you checked in previous runs. All it knows is that this time, you want to check piece n out of t pieces, and then it does that. Because it likes you and wants to make you happy
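The weekly schedule from the question can be sketched as a small wrapper. This is only a minimal sketch: it assumes a `date` that supports `+%u` (ISO weekday, 1 for Monday through 7 for Sunday) and that the repository and password are already configured through the environment. The function names `subset_for_weekday` and `run_daily_check` are made up for illustration.

```shell
#!/usr/bin/env bash

# subset_for_weekday DAY: map an ISO weekday (1 = Monday ... 7 = Sunday)
# to the matching n/t argument. Hypothetical helper, not part of restic.
subset_for_weekday() {
  local day="$1"
  case "${day}" in
    [1-7]) echo "${day}/7" ;;
    *) echo "weekday must be 1..7, got: ${day}" >&2; return 1 ;;
  esac
}

# run_daily_check: read-check today's seventh of the repository.
# Assumes restic finds its repository and password via the environment.
run_daily_check() {
  local subset
  subset="$(subset_for_weekday "$(date +%u)")" || return 1
  restic check --read-data-subset="${subset}"
}
```

Calling `run_daily_check` once a day (for example from cron) walks through 1/7 to 7/7 over a week. Since restic keeps no record of earlier runs, a skipped day leaves that piece unchecked until the same weekday comes around again.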

Can you tell us which parts of the documentation about the --read-data-subset syntax you found hard to understand, and whether you have any ideas about how to improve it?

kapitainsky · February 20, 2026, 6:10pm

In addition, please note that this is deterministic. The files to check are chosen based on their hashes (which happen to also be their names in hex). So, for example, --read-data-subset=11/16 will read all files starting with a, as the 16 possible values are 0…f (hexadecimal). As these hashes are random, each part is a very good approximation of an even share of the files.

It means that if you run [1..7]/7, you have a 100% guarantee that by Sunday you will have checked all files that were present on Monday, plus some (but probably not all) of the files created during that week.

Denis · February 20, 2026, 6:16pm

This is a very good answer, thanks kapitainsky

kapitainsky · February 20, 2026, 6:20pm

BTW - it also means that, to make the checked parts as equal as possible, it is a good idea to use a power of 2 for the total number of parts (t). But I do not think it is really critical.

This is what I am using for checks. It stores the latest part number in a file, which avoids the missed checks you can get when the part is derived from simple day counting:

#!/usr/bin/env bash
set -o errexit

# file to remember data subset part to check
memo_file="/path/to/restic_check_part_memo_file.txt"

# number of parts used for check
m=32

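# data subset part to check; start at 1 if no memo file exists yet
[ -f "${memo_file}" ] || printf "1" > "${memo_file}"
n=$(cat "${memo_file}")

# with errexit set, a failed check stops the script here, so the memo file
# is left unchanged and the same part is retried on the next run
restic check --read-data-subset "${n}/${m}"

# advance to the next part for the following run
n=$((n+1))
if [ "${n}" -gt "${m}" ]; then
  # start again from the beginning
  printf "1" > "${memo_file}"
else
  printf '%s' "${n}" > "${memo_file}"
fi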
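
The wrap-around at the end of the script can also be written as a single modular expression. A minimal sketch, where `next_part` is a hypothetical helper name and the input check is an addition that the script above does not have:

```shell
#!/usr/bin/env bash

# next_part N M: print the part to check after part N of M,
# wrapping from M back to 1. Hypothetical helper for illustration.
next_part() {
  local n="$1" m="$2"
  # Reject a damaged memo value (empty or non-numeric).
  case "${n}" in
    ''|*[!0-9]*) echo "invalid part number: ${n}" >&2; return 1 ;;
  esac
  # Reject values outside 1..M, e.g. after M was lowered.
  if [ "${n}" -lt 1 ] || [ "${n}" -gt "${m}" ]; then
    echo "part number out of range: ${n}" >&2
    return 1
  fi
  echo $(( n % m + 1 ))
}

next_part 31 32   # prints 32
next_part 32 32   # prints 1
```

The range check matters mostly when the number of parts is reduced while the memo file still holds a larger value from an earlier run.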

alexweiss · February 20, 2026, 8:58pm

Actually, restic currently only uses the first byte of the ID. So, for small values of t this doesn’t really matter. For large ones, some selections may be up to 2x larger than others. And more than 256 parts is currently not possible.
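
To get a feel for how uneven the parts can become, the 256 possible first-byte values can be counted per part. This is only a sketch under a stated assumption: that a file belongs to part (first byte of its ID) mod t, plus 1. The thread confirms that only the first byte is used, but not the exact mapping, so treat the modulo rule and the helper names `part_for_id` and `bucket_sizes` as illustrative.

```shell
#!/usr/bin/env bash
# ASSUMPTION (not confirmed in this thread): a file belongs to part
# (first byte of its ID) mod t, plus 1.

# part_for_id ID T: print the 1-based part for a hex ID under that assumption.
part_for_id() {
  local id="$1" t="$2" first_byte
  # The first two hex digits of the ID encode its first byte (0..255).
  first_byte=$(printf '%d' "0x$(printf '%s' "${id}" | cut -c1-2)")
  echo $(( first_byte % t + 1 ))
}

# bucket_sizes T: print "MIN MAX", the smallest and largest number of
# first-byte values (out of 256) that fall into any one of T parts.
bucket_sizes() {
  local t="$1" min=256 max=0 k=0 b c
  while [ "${k}" -lt "${t}" ]; do
    c=0
    b=${k}
    # Byte values k, k+t, k+2t, ... all share remainder k.
    while [ "${b}" -lt 256 ]; do
      c=$(( c + 1 ))
      b=$(( b + t ))
    done
    if [ "${c}" -lt "${min}" ]; then min=${c}; fi
    if [ "${c}" -gt "${max}" ]; then max=${c}; fi
    k=$(( k + 1 ))
  done
  echo "${min} ${max}"
}

bucket_sizes 7     # prints "36 37"
bucket_sizes 32    # prints "8 8"
bucket_sizes 200   # prints "1 2"
```

Under this assumption, a t that divides 256 gives every part the same number of byte values, t=7 is nearly even, and t=200 leaves some parts with twice as many byte values as others, in line with the "up to 2x" remark.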