Recommendations for testing full restore?

I’ve been using restic for a while under Linux to back up to Backblaze B2 and it’s been great. My scripts already run a check after every backup. I’ve done spot checks to see if files seem to be backed up, but I haven’t tested a full restore yet. Even though I trust restic and its check command, it seems like it would be a good idea to test a full restore.

Does anyone have any suggestions or scripts to test a full restore? Do you just restore to a temp directory and walk the directories you back up and do a diff of some sort? I do have enough room on my drive to do a full restore to a temp directory on that drive, but is there a way to test a restore without doing that?

Thanks.

If you don’t use Windows, you can run restic mount, then compare your source data with the mounted data. I believe this is as close as you can come to a restore without actually restoring to disk.

The comparison can be done directly using a suitable diff tool, or indirectly by means of comparing lists of e.g. md5sum hashes.
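A minimal sketch of the hash-list comparison, assuming GNU coreutils and findutils. The function names `hash_tree` and `compare_trees` and the paths in the usage comment are made up for illustration, and sha256sum stands in for md5sum (either works the same way here).

```shell
#!/bin/bash
# Print "hash  ./relative/path" for every regular file under $1, sorted by path.
hash_tree() {
    (cd "$1" && find . -type f -print0 | sort -z | xargs -0 -r sha256sum)
}

# Print the differences between the two hash lists.
# Returns 0 if both trees hold the same files with the same contents.
compare_trees() {
    local src_list status
    src_list=$(mktemp) || return 2
    hash_tree "$1" > "$src_list"
    hash_tree "$2" | diff "$src_list" -
    status=$?
    rm -f "$src_list"
    return "$status"
}

# Hypothetical usage, with the repository mounted in another terminal via
#   restic -r /path/to/your/repository mount /mnt/restic
# compare the live data against the newest snapshot:
#   compare_trees /home/user /mnt/restic/snapshots/latest/home/user
```

Files that changed after the snapshot was taken will show up as differences, so this works best on data that is mostly static or right after a backup run.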

Apart from mounting, there is another way to do a full restore without actually using up disk space.

You can export the snapshot (or a subdirectory of it) as a tar stream with restic dump, pipe that to tar, and use tar's compare mode to do the diff.
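A rough sketch of the tar-based comparison, assuming GNU tar. The function name `compare_tar_stream` and the paths in the usage comment are placeholders. How member paths are rooted inside the dumped archive is an assumption here, so it is worth listing the stream with `tar -t` first to see which directory it has to be compared against.

```shell
#!/bin/bash
# Compare a tar stream on stdin against the files under directory $1.
# GNU tar prints every difference it finds (contents, size, mode,
# modification time, ...) and exits non-zero if there is at least one.
compare_tar_stream() {
    tar --compare --file=- --directory="$1"
}

# Hypothetical usage (repository and snapshot paths are placeholders):
#   restic -r /path/to/your/repository dump latest /home/user \
#       | compare_tar_stream /
```

Nothing is written to disk, but the whole snapshot is still downloaded, and files modified since the backup will be reported as different.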

If you want to test a full restore, I recommend you use the same command you would use to do the actual full restore. Otherwise you are testing different functionality than the one you intend to use :wink:

About alternatives to restore: the restore command is the most optimized one. In particular, it can minimize traffic from the backend, which the alternatives (mount or dump) cannot. So, for a real restore, restore is the recommended command.

The way I test (small home user, not a business) is to say “I’m going to pretend my mailserver crashed and my wife urgently needs an email from 2 years ago” and then check that I can restore it.

Until I can read the email on my screen, I don’t consider the test passed.

My constraint on restore tests is disk space, not network traffic.

And since restore tests happen much more often than real restores, I’d rather test with dump.

And if I think ‘restore’ will behave differently than ‘dump’, then I’ll just use dump to restore when an actual disaster happens.

Bottom line: I optimise for the part that happens more often, in a way that I can run it in the background and look at the result later, knowing that nothing is happening to my disk space during this operation.
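One way such a dump-based test could look for a single file, as a sketch only: stream the file out of the repository and compare its checksum with the local copy, so nothing is written to disk. The function name `stream_matches_file` and the paths in the usage comment are hypothetical.

```shell
#!/bin/bash
# Return 0 if the data on stdin has the same SHA-256 as the local file $1,
# 1 if it differs, and 2 if the local file cannot be read.
stream_matches_file() {
    local expected actual
    [ -r "$1" ] || return 2
    expected=$(sha256sum < "$1" | cut -d' ' -f1)
    actual=$(sha256sum | cut -d' ' -f1)
    [ "$expected" = "$actual" ]
}

# Hypothetical usage (repository and file paths are placeholders):
#   restic -r /path/to/your/repository dump latest /home/user/mail.mbox \
#       | stream_matches_file /home/user/mail.mbox && echo "matches"
```

This can run in the background from cron over a list of files; a mismatch only means the file differs from the snapshot, which is expected for anything edited since the last backup.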

This is called a DR test (Disaster Recovery exercise).
A proper way to do that is:

  1. Create a test plan
  2. Find a free weekend
  3. Run the test
  4. Analyze test results

I do that once in a while on my personal computers. The last time (it was over 2 years ago), the test was as follows:

  1. Turn off your laptop (pretend it is dead)
  2. Take a new laptop
  3. Run the recovery tasks
  4. Try using it to see what is broken

I was curious what Gemini had to say about this, so I had it create this bash script for me. What do you think? I’ll of course go through it in detail to make sure I understand everything before running it, but after a quick glance it seemed reasonable.

#!/bin/bash

# --- Configuration ---
REPO_PATH="/path/to/your/repository"
RESTIC_PASSWORD="your_password_here"
FILES_FROM="/path/to/your/backup_list.txt"
RESTORE_BASE="/tmp/restic_restore_test"

# Options: "diff" or "checksum"
VERIFY_MODE="checksum" 
# Set to true to skip downloads and just check if snapshots exist
DRY_RUN=false          

export RESTIC_PASSWORD="$RESTIC_PASSWORD"

# Tracking variables for the summary
PASSED_PATHS=()
FAILED_PATHS=()

# --- Helper: Space Check ---
check_space_availability() {
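    local path_to_restore=$1
    local target_dir=$2
    local req_size avail_size

    # Size the files would occupy once restored. "restore-size" counts the
    # data as it lands on disk; "raw-data" would only report the deduplicated
    # size of the blobs stored in the repository, which can be much smaller.
    req_size=$(restic -r "$REPO_PATH" stats latest --path "$path_to_restore" --mode restore-size --json | grep -oP '"total_size":\s*\K\d+')
    # Get available space on target partition in bytes
    avail_size=$(df -B1 "$target_dir" | awk 'NR==2 {print $4}')

    # If either lookup failed, we cannot confirm there is enough space
    if [[ -z "$req_size" || -z "$avail_size" ]]; then
        return 1
    fi
    if [ "$req_size" -gt "$avail_size" ]; then
        return 1 # Not enough space
    fi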
    return 0
}

# --- Initialization ---
if [[ ! -f "$FILES_FROM" ]]; then
    echo "Error: File list $FILES_FROM not found."
    exit 1
fi

mkdir -p "$RESTORE_BASE"
echo "--- Starting Restic Restore Drill (Mode: $VERIFY_MODE) ---"

# --- Main Loop ---
while IFS= read -r SOURCE_PATH || [[ -n "$SOURCE_PATH" ]]; do
    [[ -z "$SOURCE_PATH" || "$SOURCE_PATH" =~ ^# ]] && continue

    echo "------------------------------------------------"
    echo "Processing: $SOURCE_PATH"

    # 1. Handle Dry Run
    if [ "$DRY_RUN" = true ]; then
        if restic -r "$REPO_PATH" ls latest --path "$SOURCE_PATH" > /dev/null 2>&1; then
            PASSED_PATHS+=("$SOURCE_PATH (Dry Run - Found)")
        else
            FAILED_PATHS+=("$SOURCE_PATH (Dry Run - Not Found)")
        fi
        continue
    fi

    # 2. Check Space Availability
    if ! check_space_availability "$SOURCE_PATH" "$RESTORE_BASE"; then
        echo "❌ Skip: Insufficient space for $SOURCE_PATH"
        FAILED_PATHS+=("$SOURCE_PATH (Out of Space)")
        continue
    fi

    # 3. Perform Restore
    # Restore inside RESTORE_BASE (the directory created and space-checked above)
    CURRENT_TARGET="${RESTORE_BASE}/$(echo "$SOURCE_PATH" | tr '/' '_')"
    mkdir -p "$CURRENT_TARGET"
    
    echo "Restoring data..."
    if restic -r "$REPO_PATH" restore latest --target "$CURRENT_TARGET" --path "$SOURCE_PATH" > /dev/null 2>&1; then
        
        RESTORED_DATA_PATH="${CURRENT_TARGET}${SOURCE_PATH}"
        echo "Verifying integrity..."

        # 4. Verification Mode Logic
        if [ "$VERIFY_MODE" == "diff" ]; then
            diff -rq "$SOURCE_PATH" "$RESTORED_DATA_PATH" > /dev/null
            RESULT=$?
        else
            # Checksum Mode
            find "$SOURCE_PATH" -type f -exec sha256sum {} + | sort > /tmp/src.hash
            find "$RESTORED_DATA_PATH" -type f -exec sha256sum {} + | sed "s|$RESTORED_DATA_PATH|$SOURCE_PATH|g" | sort > /tmp/res.hash
            diff /tmp/src.hash /tmp/res.hash > /dev/null
            RESULT=$?
            rm -f /tmp/src.hash /tmp/res.hash
        fi

        if [ $RESULT -eq 0 ]; then
            echo "✅ Success!"
            PASSED_PATHS+=("$SOURCE_PATH")
        else
            echo "❌ Integrity Mismatch!"
            FAILED_PATHS+=("$SOURCE_PATH (Data Mismatch)")
        fi
    else
        echo "❌ Restore Failed!"
        FAILED_PATHS+=("$SOURCE_PATH (Restic Error)")
    fi

    # Cleanup individual path to free space for the next one
    rm -rf "$CURRENT_TARGET"

done < "$FILES_FROM"

# --- Final Summary Report ---
echo -e "\n========================================"
echo "         RESTIC TEST SUMMARY"
echo "========================================"
echo "Date: $(date)"
echo "Mode: $VERIFY_MODE | Dry Run: $DRY_RUN"
echo "----------------------------------------"

if [ ${#PASSED_PATHS[@]} -ne 0 ]; then
    echo "✅ PASSED:"
    for p in "${PASSED_PATHS[@]}"; do echo "  - $p"; done
fi

if [ ${#FAILED_PATHS[@]} -ne 0 ]; then
    echo -e "\n❌ FAILED:"
    for f in "${FAILED_PATHS[@]}"; do echo "  - $f"; done
    exit 1
else
    echo -e "\n✨ All paths verified successfully!"
    exit 0
fi

In my opinion, the script is useless because you have just written a system test for restic, which duplicates what is already tested in the restic codebase. You probably want to go one level up, to acceptance testing or even a DR test.
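To make the difference concrete, an acceptance-level check asks whether a restored file is actually usable for its purpose, not whether its bytes match. A very small sketch, where the function name `acceptance_check`, the file path and the search string are all invented for illustration:

```shell
#!/bin/bash
# Return 0 if $1 is a non-empty regular file that contains the fixed string $2.
acceptance_check() {
    [ -f "$1" ] && [ -s "$1" ] && grep -qF -- "$2" "$1"
}

# Hypothetical usage after restoring a mailbox to a scratch directory:
#   acceptance_check /tmp/restore/home/user/mail.mbox "Subject: Invoice 2023" \
#       || echo "restored mailbox does not contain the expected email"
```

A real acceptance test would go further (open the mailbox in the mail client, start the application against the restored data), but even a check like this exercises the question "can I get back what I need?" instead of re-testing restic itself.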