Help setting up an automated systemd backup with a very long prune

Hello,
I am currently using restic to back up my data offsite, on Google Drive (unlimited).
The main reason for using this backend is the “unlimited” bit.
Below are my stats and the commands I’m using to run restic:

$ restic {backup,prune,forget} \
    --repo rclone:drive:backup/home \
    --option rclone.args='serve restic --stdio --b2-hard-delete --drive-use-trash=false --fast-list --tpslimit=2 --tpslimit-burst=4 --drive-pacer-burst=50 --drive-pacer-min-sleep=150ms --drive-chunk-size=16M' \
    --verbose
$ find /directory -type f | wc -l
102220
$ du -s -B GB /directory/
12358GB /directory

The restic commands are generated by the following crestic configuration file:

# Ansible managed
[global.environ]

[@drive]
repo = rclone:drive:backup/home
option = rclone.args='serve restic --stdio --b2-hard-delete --drive-use-trash=false --fast-list --tpslimit=4 --tpslimit-burst=6 --drive-pacer-burst=75 --drive-pacer-min-sleep=125ms --drive-chunk-size=32M'
verbose = ''

[media@.backup]
exclude-file = ~/restic.excludes
arguments = /directory
tag = media

[media@.forget]
group-by = host,tags
keep-weekly = 4
keep-monthly = 6
compact = ''
prune = ''
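
For reference, the automation itself is driven by a systemd service and timer roughly along these lines (a simplified sketch, not my exact unit files; the unit names and the crestic preset invocation are schematic):

# /etc/systemd/system/crestic-media.service (sketch)
[Unit]
Description=Offsite backup of /directory with crestic/restic
Wants=network-online.target
After=network-online.target

[Service]
Type=oneshot
# run the backup, then apply the forget/prune policy from the config above
ExecStart=/usr/local/bin/crestic media@drive backup
ExecStart=/usr/local/bin/crestic media@drive forget

# /etc/systemd/system/crestic-media.timer (sketch)
[Unit]
Description=Weekly offsite backup

[Timer]
OnCalendar=weekly
Persistent=true

[Install]
WantedBy=timers.target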

And below is the current issue:

$ restic prune ...
[364:55:07] 53.32%  1282462 / 2405024 packs

With these times it’s pretty difficult to schedule, say, a weekly backup and prune.
So my question is: is there anything I can do to speed things up without complicating the setup too much?
Thank you

Which version of restic is that? The reason I’m asking is that the master branch (and the next release) contains a rewritten implementation of prune which is much faster and also much more configurable. You can find a pre-compiled binary here: https://beta.restic.net/?sort=time&order=desc

It should be much better with the default configuration already, but in your case it may make sense to play around with the new --max-unused flag (e.g. try --max-unused 50%) and with --max-repack-size. If you do, please report back!
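
For example, something along these lines (just a sketch; the binary name and the repack limit are placeholders to adapt to your setup, the repo URL is the one from your command):

$ ./restic-beta prune --repo rclone:drive:backup/home \
    --max-unused 50% --max-repack-size 10G \
    --dry-run --verbose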

restic 0.10.0 compiled with go1.15.2 on linux/amd64

If 0.11.0 already includes the new prune method I can definitely test it.
I believe part of the problem is also Google Drive, though. Is there a strategy to cache more data on a different backend (e.g. B2) just for the purpose of the prune?

The new prune is not in 0.11.0, but in the latest betas.

If you do not care too much about some extra used space, I’d recommend using --max-unused unlimited - this will be very fast, as it only needs to access data that is stored in the local cache.

Prune also always shows how much unused space still remains in the repository, so if this is too high, you can rerun it with another value of --max-unused. Note that there is also a --dry-run option which shows what would be done.
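
For example, a quick way to see the effect without changing anything (repo URL taken from your earlier command):

$ restic prune --repo rclone:drive:backup/home \
    --max-unused unlimited --dry-run --verbose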

Thanks, I’ll test the beta and report back.


The new prune version works perfectly. Even on full disks. Thx!


I am testing some suggested parameters, mainly --max-unused 50%. The command being run is as follows (along with some tweaks to the Google Drive parameters):

$ ~/restic_v0.11.0-187-g36c5d39c_linux_amd64 forget \
    --repo rclone:uni-drive:backup/home \
    --option rclone.args='serve restic --stdio --b2-hard-delete --drive-use-trash=false --fast-list --tpslimit=2 --tpslimit-burst=6 --drive-pacer-burst=100 --drive-pacer-min-sleep=125ms --drive-chunk-size=64M' \
    --verbose --group-by host,tags \
    --keep-weekly 4 --keep-monthly 6 \
    --compact --prune --max-unused 50%
repository 6dc44de6 opened successfully, password is correct
Applying Policy: keep 4 weekly, 6 monthly snapshots
snapshots for (host [sagittarius], tags [media]):
keep 7 snapshots:
ID        Time                 Host         Tags
--------------------------------------------------
923f253d  2020-01-25 01:09:40  sagittarius  media
765142d4  2020-02-20 21:12:23  sagittarius  media
fda3160e  2020-03-27 19:33:49  sagittarius  media
3aacdd91  2020-04-14 02:07:21  sagittarius  media
9a228d93  2020-11-10 00:17:22  sagittarius  media
d65a662b  2020-11-22 00:04:07  sagittarius  media
e7451112  2020-12-27 00:51:23  sagittarius  media
--------------------------------------------------
7 snapshots

remove 3 snapshots:
ID        Time                 Host         Tags
--------------------------------------------------
278c5922  2020-04-10 02:12:57  sagittarius  media
6fa0ca30  2020-12-24 20:38:41  sagittarius  media
dfacd0fe  2020-12-26 21:44:58  sagittarius  media
--------------------------------------------------
3 snapshots

snapshots for (host [sagittarius], tags [pv]):
keep 1 snapshots:
ID        Time                 Host         Tags
--------------------------------------------------
ab02f27c  2020-11-07 00:50:32  sagittarius  pv
--------------------------------------------------
1 snapshots

snapshots for (host [sagittarius], tags [media, pv]):
keep 8 snapshots:
ID        Time                 Host         Tags
--------------------------------------------------
f3fc29cf  2020-06-26 23:38:50  sagittarius  media
                                            pv
ec7eb587  2020-07-27 01:54:14  sagittarius  media
                                            pv
ac4b83cb  2020-08-28 01:22:00  sagittarius  media
                                            pv
324fc240  2020-09-27 05:40:31  sagittarius  media
                                            pv
5b38facc  2020-10-09 01:21:20  sagittarius  media
                                            pv
e708aa55  2020-10-18 02:33:42  sagittarius  media
                                            pv
a9ffd458  2020-10-31 17:38:30  sagittarius  media
                                            pv
fd00e0da  2020-11-03 02:24:39  sagittarius  media
                                            pv
--------------------------------------------------
8 snapshots

[0:03] 100.00%  3 / 3 files deleted
3 snapshots have been removed, running prune
loading indexes...
loading all snapshots...
finding data that is still in use for 16 snapshots
[1:02] 100.00%  16 / 16 snapshots
searching used packs...
collecting packs for deletion and repacking
[17:37] 0.00%  0 / 2523071 packs processed

However, it seems packs aren’t being processed at all. Is this supposed to happen?

Which build version are you using? Can you give us the output of restic version?

I left the binary name intact:

~/restic_v0.11.0-187-g36c5d39c_linux_amd64 version
restic 0.11.0-dev (compiled manually) compiled with go1.15.6 on linux/amd64

but the process seems to have progressed (and finished) overnight:

searching used packs...
collecting packs for deletion and repacking
will remove pack 7ca9f1a5 as it is unused and not indexed
[56:59] 100.00%  2523071 / 2523071 packs processed

used:           9873817 blobs / 11.897 TiB
unused:           40295 blobs / 27.117 GiB
unreferenced:                   6.734 MiB
total:          9914112 blobs / 11.923 TiB
unused size: 0.22% of total size

to repack:            0 blobs / 0 B
this removes          0 blobs / 0 B
to delete:         9311 blobs / 7.377 GiB
total prune:       9311 blobs / 7.377 GiB
remaining:      9904801 blobs / 11.916 TiB
unused size after prune: 19.747 GiB (0.16% of remaining size)

totally used packs:    2511099
partly used packs:       10292
unused packs:             1680

to keep:      2521391 packs
to repack:          0 packs
to delete:       1680 packs
to delete:          1 unreferenced packs

deleting unreferenced packs
[0:00] 100.00%  1 / 1 files deleted
rebuilding index
rclone: 2020/12/27 02:03:09 ERROR : index/9934ce3d8d33b3dfcad01e2d3ecb7321f651bd7a3f1217f9f045eb0e29630715: Post request put error: googleapi: Error 403: Rate Limit Exceeded, rateLimitExceeded
Save(<index/9934ce3d8d>) returned error, retrying after 720.254544ms: server response unexpected: 500 Internal Server Error (500)
... several more errors ...
[3:45] 100.00%  2521391 / 2521391 packs processed
deleting obsolete index files
[25:02] 100.00%  1445 / 1445 files deleted
removing 1680 old packs
[28:12] 100.00%  1680 / 1680 files deleted
done

I didn’t time it, but it ran in less than 12 hours compared to almost 700 hours with v0.10.0.
Anything I should look for in the output to further optimise?

Ah… I thought your restic prune had aborted.

The collecting packs for deletion and repacking message comes before a List command is sent to your storage backend to get the list of all pack files. I also realized that some backends do not send the results in parts but first collect all filenames and only then send them to be processed. The processing itself should be very fast, as it only involves very simple calculations.
So it seems that you were waiting for your backend’s List result here.

Also, thanks for the prune report on your pretty large repository (at least much larger than mine :wink:).

From what I can see the time-consuming parts finished pretty quickly, so I would expect that this took much less than 12 hours…

As you can see, no packs have been marked for repacking (which would have been the really time-consuming part). So there is nothing for you to optimize if you are satisfied with the extra ~20 GiB of used space (which is 0.16% of your repo size). There is, however, no need to give --max-unused 50% in your case; everything above 0.16% would have given the same pruning result - I think you can just as well work with the default (5%).

What I found irritating is that the List command and, obviously, the Delete commands of your storage backend took quite a lot of time.

Listing files ran at a speed of ~750 files per second - quite low for just returning the file names stored in the backend. Deleting ran at about 1 file/s - even though this is already parallelized within restic.

So you might think about benchmarking this using direct access to your storage (instead of using restic) to see if this is a backend issue or whether something goes wrong within restic.

The backend is Google Drive; it’s not the best, but it does have truly unlimited space for orgs.
How could I test it directly? Would rclone work or would I need to make direct API calls?

Sure - just create some thousands of files, then use rclone ls to list them and rclone deletefile (best in a script) to delete them…
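
Roughly like this (an untested sketch; the remote name is taken from your earlier command, the bench-test path is just a placeholder):

#!/bin/sh
# create a few thousand small files locally and upload them
mkdir -p /tmp/rclone-bench && cd /tmp/rclone-bench
for i in $(seq 1 2000); do echo "$i" > "file-$i"; done
rclone copy . uni-drive:bench-test
# measure listing speed
time rclone ls uni-drive:bench-test > /dev/null
# measure deletion speed (100 single-file deletes)
time sh -c 'for i in $(seq 1 100); do rclone deletefile "uni-drive:bench-test/file-$i"; done'
# clean up the remaining test files
rclone purge uni-drive:bench-test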

I’m not currently using a custom API key, just the default one, which might be one reason it’s slower than expected.
Here is some output:

$ rclone ls uni-drive:anime | wc -l
503
$ time rclone ls uni-drive:anime
real	0m1.731s
user	0m0.221s
sys	0m0.056s

$ rclone ls uni-drive:music | wc -l
2156
$ time rclone ls uni-drive:music
real	0m57.045s
user	0m0.415s
sys	0m0.076s

$ time rclone ls uni-drive:music
real	0m4.793s
user	0m0.315s
sys	0m0.155s

The only outlier seems to be one listing for music, which takes almost a minute.
It seems that for some reason it hits some form of rate limit even for file listings.
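
For reference, switching to my own API key would just mean adding OAuth client credentials to the rclone remote, roughly like this (placeholder values; the token is set up via the usual rclone config OAuth flow):

# ~/.config/rclone/rclone.conf
[uni-drive]
type = drive
client_id = <my-own-client-id>.apps.googleusercontent.com
client_secret = <my-own-client-secret>
scope = drive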

As a side note, I found that with the new version, when run as a systemd unit, the forget/prune operation logs a progress line for each second that passes. I remember the behaviour being different before, with no “progress” output reported unless a HUP signal was given.

Jan 10 01:34:51 sagittarius crestic[29795]: [0:36] 0.00%  0 / 2552402 packs processed
Jan 10 01:34:52 sagittarius crestic[29795]: [0:37] 0.00%  0 / 2552402 packs processed
Jan 10 01:34:53 sagittarius crestic[29795]: [0:38] 0.00%  0 / 2552402 packs processed
Jan 10 01:34:54 sagittarius crestic[29795]: [0:39] 0.00%  0 / 2552402 packs processed
Jan 10 01:34:55 sagittarius crestic[29795]: [0:40] 0.00%  0 / 2552402 packs processed
Jan 10 01:34:56 sagittarius crestic[29795]: [0:41] 0.00%  0 / 2552402 packs processed
Jan 10 01:34:57 sagittarius crestic[29795]: [0:42] 0.00%  0 / 2552402 packs processed
Jan 10 01:34:58 sagittarius crestic[29795]: [0:43] 0.00%  0 / 2552402 packs processed
Jan 10 01:34:59 sagittarius crestic[29795]: [0:44] 0.00%  0 / 2552402 packs processed
Jan 10 01:35:00 sagittarius crestic[29795]: [0:45] 0.00%  0 / 2552402 packs processed
Jan 10 01:35:01 sagittarius crestic[29795]: [0:46] 0.00%  0 / 2552402 packs processed
Jan 10 01:35:02 sagittarius crestic[29795]: [0:47] 0.00%  0 / 2552402 packs processed

If you can list 500 files in 1.7s and 2000 files in 4.8s, then listing 2,523,000 files in 3419s sounds pretty reasonable to me… So this ~60 minutes for collecting packs must be due to your backend. And yes, this is slow. I did a test myself with my cloud storage and got more than 5000 files/s when listing.

You could add a test for deleting files, which also seemed pretty slow to me, but chances are good that the 1 file/s deletion rate is again due to your cloud storage.

So if you don’t want to change the storage provider, there is actually nothing you can improve. Once restic supports larger pack file sizes, you can think about using e.g. 64MB or 128MB as minimum pack size.

see