Expected Size of restic-temp-packs

I’ve been using Restic for a couple of years to back up my NAS (around 10 TB of data). The RAM requirements were getting out of hand, so I decided to shard my backups into separate repositories, e.g. device backups in one (all the Windows devices back up to the NAS, not using Restic), documents in another, etc.

With that, I’m freshly backing up my NAS to my colo (it’s going to take 60 days, woot! not wanting to risk a trip to the colo). I’m running into a “no space left on device” error I’ve never seen before.

When I first started using Restic, I moved the restic tmp folder to tmpfs because my monitoring saw a huge amount of writes when creating the restic-temp-pack files. I run SSDs on the primary drives of all my VMs, so I try to reduce the amount of writes there. I haven’t changed anything, but I’m apparently running out of space now (about 1 GB allocated to tmpfs).
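For anyone replicating that part of the setup: restic writes its restic-temp-pack-* files to whatever directory TMPDIR points at (falling back to /tmp), so the redirect is roughly the sketch below. The size and mount point are just examples mirroring my ~1 GB allocation, not a recommendation.

# /etc/fstab entry for the tmpfs (size here mirrors my ~1 GB allocation)
# tmpfs  /tmp  tmpfs  size=1G,mode=1777  0  0

# The backup script/unit then just exports TMPDIR so restic's
# restic-temp-pack-* files land on the tmpfs mount:
export TMPDIR=/tmp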

Normally, the restic-temp-pack files seem really small. My monitoring hasn’t seen any out-of-memory conditions, which leads me to think that Restic is writing more than 1 GB to the tmpfs mount.

How much should we expect Restic to use?

(I also updated to 0.10.0 during this migration (waiting on 0.11.0 to be promoted into Debian testing); I was previously using 0.9.4. Not sure if something changed with the restic-temp-pack files.)

My setup is rather trivial. My archive server in the colo handles the pruning (for memory/latency reasons); a sketch of that job follows the backup script below.

#!/bin/bash

restic version

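# Probe for an existing repository: restic snapshots exits non-zero if the repo has not been initialized yet.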
restic snapshots > /dev/null 2>&1
repoExists=$?
if [ $repoExists -ne 0 ]; then
    echo "Repository does not exist, it will be created."
    restic init --verbose
    echo "Repository created."
fi

echo "Starting backup run..."
# other excludes trimmed for brevity
restic \
    backup \
    --verbose \
    --exclude-caches \
    --one-file-system \
    --tag nas \
    --cleanup-cache \
    --exclude /mnt/cephfs/backups \
    /mnt/cephfs

status=$?

echo "Backup run completed."

[ $status -eq 0 ] && echo "Success." || exit $status
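For completeness, the prune side on the colo archive server is just a scheduled job along the lines of the sketch below; the retention values are placeholders, not my actual policy.

#!/bin/bash
# Runs on the archive server in the colo, not on this VM.
# Keep/retention values below are placeholders.
restic forget \
    --tag nas \
    --keep-daily 7 \
    --keep-weekly 4 \
    --keep-monthly 12 \
    --prune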

restic should only keep a handful of temporary pack files (number of CPU cores * 2 might be a good guess). Each of those should on average have a size of 4-5 MB, although a few pack files might be larger than that. What is the size of the largest pack file stored in the data/ folder of the repository?

You could run ls -la /proc/<pid-of-restic>/fd to see which temporary pack files are currently in use. Is there a single folder containing hundreds of thousands of files? Or more than 1 TB of data directly in a single folder?
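Concretely, something along these lines works; the pgrep match is just a sketch and assumes a single restic process.

# List the temp pack files a running restic currently has open.
pid=$(pgrep -n restic)
ls -la "/proc/$pid/fd" | grep restic-temp-pack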

Thanks for confirming that @MichaelEischer. I’ve never seen Restic use more than what you said.

The largest pack in the repo is 12MB.

I have 16 logical CPUs allocated to this VM, so 32 packs at 12 MB is about 400 MB in temp. I have two Restic jobs running concurrently on this VM right now, so maybe, worst case, 800 MB in tmpfs…
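Spelling that worst case out as arithmetic:

# Worst case: 2 temp packs per logical CPU, each as large as the biggest
# pack in the repo (12 MB), times two concurrent jobs.
echo "$(( 16 * 2 * 12 * 2 )) MB"   # 768 MB, still under the 936M tmpfs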

scan finished in 262.141s: 46291 files, 8.625 TiB (bulk of data)
scan finished in 6.968s: 7 files, 572.760 GiB (Windows backups; this one is normally multiple TBs, but I recently truncated it, so very few files)
scan finished in 12.049s: 1286 files, 858.785 MiB
scan finished in 4.476s: 423 files, 3.231 GiB
scan finished in 826.491s: 1382232 files, 75.023 GiB (the backup with the most files, a web scraping project I've been working on)

My cluster rebooted last night for automatic updates, so the backup job restarted. It hasn’t crashed yet, so that’s promising. It seemed to crash every couple of hours before the reboot, with the same out-of-disk error on tmpfs. Kernel bug… that would be odd…

Jan 13 05:00:02 sg1 systemd[1]: Started Backup CephFS Data.
Jan 13 05:00:02 sg1 bash[18349]: restic 0.10.0 compiled with go1.15.2 on linux/amd64
Jan 13 05:00:03 sg1 bash[18349]: Starting backup run...
Jan 13 05:00:03 sg1 bash[18349]: open repository
Jan 13 05:00:04 sg1 bash[18349]: lock repository
Jan 13 05:00:04 sg1 bash[18349]: load index files
Jan 13 05:00:13 sg1 bash[18349]: start scan on [/mnt/cephfs]
Jan 13 05:00:13 sg1 bash[18349]: start backup on [/mnt/cephfs]
Jan 13 05:04:26 sg1 bash[18349]: scan finished in 262.141s: 46291 files, 8.625 TiB

And my tmpfs usage is normal, but then again I’ve never seen it not normal (might be something I should add a check on).

df -h /tmp
Filesystem      Size  Used Avail Use% Mounted on
tmpfs           936M   62M  875M   7% /tmp
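Something like this minimal check is what I have in mind for that; the 90% threshold is arbitrary.

# Alert when /tmp (tmpfs) crosses a usage threshold.
usage=$(df --output=pcent /tmp | tail -n 1 | tr -dc '0-9')
if [ "$usage" -ge 90 ]; then
    echo "tmpfs at ${usage}% -- restic-temp-pack files may be piling up"
fi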

I’ll continue to monitor it. I didn’t think it would be Restic, but I just wanted to check my assumptions.

Thanks, and cheers!