Prune times 3 times as long since 0.13

Hi.

We’ve observed an issue with prune run times since version 0.13. When we upgraded from 0.12.1 to 0.13.2, one system with a somewhat special constellation saw its prune time increase from 20 minutes to about 1 hour 15 minutes.

Back then, we didn’t analyze or report the issue. We’re now upgrading to 0.16, and I hoped that 0.16 would solve the issue. However, we’re still seeing this 3+ times increase in prune runtime, whereas we’re seeing a decrease on a lot of other systems. I also confirmed the behavior with 0.15.2, which makes me think we have a bit of a snowflake case here.

The repository is shared between two hosts. Each host saves snapshots with a unique tag and only operates on its own tag for pruning (a scenario we have in place a couple dozen times).

There are ~630 snapshots total; we create one every two hours and have a 30-day retention. The runs are spaced out over even/odd hours between the two systems, and the runtime (backup / forget --prune) is generally around 25-30 minutes with restic 0.12.1.
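For illustration, the even/odd-hour staggering could be expressed with a small parity check like this; the `should_run` helper and the parity assignment are assumptions for the sketch, not our actual wrapper script:

```shell
#!/usr/bin/env bash
# Hypothetical sketch of staggering two hosts that share one repository:
# host A runs on even hours, host B on odd hours, so their
# backup + forget --prune runs never overlap.
should_run() {
  # usage: should_run HOUR PARITY   (PARITY: 0 = even hours, 1 = odd hours)
  local hour=$((10#$1))   # force base 10 so "08"/"09" aren't parsed as octal
  [ $((hour % 2)) -eq "$2" ]
}

hour=$(date +%H)
if should_run "$hour" 0; then
  echo "host A: this is our slot, run backup + forget --prune"
else
  echo "host A: odd hour, host B's slot, skip"
fi
```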

Full forget command:

  forget \
  --group-by host \
  --tag "${HOSTNAME}" \
  --keep-daily "${KEEP_DAILY}" \
  --keep-weekly "${KEEP_WEEKLY}" \
  --keep-within "${KEEP_WITHIN}" \
  --keep-monthly "${KEEP_MONTHLY}" \
  --keep-yearly "${KEEP_YEARLY}" \
  --prune &>>"${PRUNE_OUTPUT}"
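For context, the same command line can be assembled and inspected without touching the repository. The variable values below are illustrative assumptions, and this sketch only echoes the command instead of invoking restic:

```shell
#!/usr/bin/env bash
# Hypothetical dry-run: build the forget command line as an array and print it.
# HOSTNAME and the KEEP_* values are illustrative assumptions.
HOSTNAME="hostA"
KEEP_DAILY=7 KEEP_WEEKLY=4 KEEP_WITHIN=30d KEEP_MONTHLY=12 KEEP_YEARLY=2

cmd=(restic forget
  --group-by host
  --tag "${HOSTNAME}"
  --keep-daily "${KEEP_DAILY}"
  --keep-weekly "${KEEP_WEEKLY}"
  --keep-within "${KEEP_WITHIN}"
  --keep-monthly "${KEEP_MONTHLY}"
  --keep-yearly "${KEEP_YEARLY}"
  --prune)

echo "${cmd[@]}"   # inspect before running for real
```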

Due to the frequency of the snapshots, the data added in each run is pretty minuscule.

2023-08-24T11:30:03+02:00 Starting backup
using parent snapshot cf4aea42

Files:          72 new,    33 changed, 2248357 unmodified
Dirs:           12 new,    63 changed, 368115 unmodified
Added to the repo: 620.980 MiB

processed 2248462 files, 136.119 GiB in 5:15
snapshot 149db3bc saved

...

collecting packs for deletion and repacking
[0:02] 100.00%  66565 / 66565 packs processed


to repack:        34373 blobs / 618.080 MiB
this removes         49 blobs / 567.438 MiB
to delete:           13 blobs / 11.961 MiB
total prune:         62 blobs / 579.399 MiB
remaining:      2644562 blobs / 421.186 GiB
unused size after prune: 20.644 GiB (4.90% of remaining size)

Besides the number of snapshots, the system isn’t too unusual after all.

I’m happy to provide more information if needed.

Let us know how it goes with 0.16.0 :slight_smile:

Well, it didn’t change; that’s why I came here :slight_smile:

We’re now upgrading to 0.16, and I hoped that 0.16 would solve the issue. However, we’re still seeing this 3+ times increase in prune runtime

Every version since 0.13 shows this negative impact on prune run time.

How often are you running forget / prune commands, and how did you choose that frequency? For backups every hour you could prune once a week or once a month; if you’re trying to reduce disk space you could prune a bit more often, but personally I’d not go with more than every few days.
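As a concrete sketch of that cadence, a crontab could separate the hourly backup from a weekly forget --prune. The schedule, path, tag, and retention values here are assumptions for illustration only:

```
# m  h  dom mon dow  command
0    *   *   *   *   restic backup /data --tag mytag
30   4   *   *   0   restic forget --tag mytag --keep-daily 7 --keep-weekly 8 --keep-monthly 24 --prune
```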

Why do you share repositories between servers at all? It seems like an odd way to do things. One repository per server seems simpler. I assume you have a good reason for it.

I run forget / prune like this on Windows. I haven’t seen the syntax you’re using, which isn’t to say it’s not correct.

restic.exe --repo c:\repo forget --keep-daily 7 --keep-weekly 8 --keep-monthly 24
restic.exe --repo c:\repo prune

forget --prune is run after every backup run; the frequency is a contractual requirement. Our pruning logic is documented at the start of this docs page: Removing backup snapshots — restic 0.16.0 documentation

Having a combined repository is not weird at all; the tags carry that concept a long way. Backing up — restic 0.16.0 documentation

Please mind that everything works exceptionally well with restic 0.12.1; we only start to see issues in newer releases. My suspicion is the number of snapshots, as this is the only thing that vastly differs from the hundreds of other repositories we manage.

The higher number would explain why newer restic versions get faster, but not why things would slow down.

Which backend is used to store the repository? Did you compare how long the individual steps take with restic 0.12.1 and some later version? It would be helpful to know which command/step is affected by the slowdown.

Thanks for having a look, Michael.
The backend is a local (i.e. same data center) SFTP backend.

From what I can tell from the logs, it’s the prune step. The first and last entries are from 0.12.1, all the others from 0.16.0.

1 snapshots have been removed, running prune
loading indexes...
loading all snapshots...
finding data that is still in use for 638 snapshots
[20:48] 100.00%  638 / 638 snapshots
--
1 snapshots have been removed, running prune
loading indexes...
loading all snapshots...
finding data that is still in use for 638 snapshots
[1:14:43] 100.00%  638 / 638 snapshots
--
1 snapshots have been removed, running prune
loading indexes...
loading all snapshots...
finding data that is still in use for 638 snapshots
[1:15:08] 100.00%  638 / 638 snapshots
--
1 snapshots have been removed, running prune
loading indexes...
loading all snapshots...
finding data that is still in use for 638 snapshots
[1:15:28] 100.00%  638 / 638 snapshots
--
1 snapshots have been removed, running prune
loading indexes...
loading all snapshots...
finding data that is still in use for 638 snapshots
[1:15:48] 100.00%  638 / 638 snapshots
--
1 snapshots have been removed, running prune
loading indexes...
loading all snapshots...
finding data that is still in use for 638 snapshots
[21:38] 100.00%  638 / 638 snapshots

Hmm, that is pretty odd. The step that searches for data that’s still in use should be able to perform most of its work based on cached data alone. restic 0.13.0 changed a lot on the file upload path, but not on the download path, which would be relevant here. Saving downloaded files in the cache has also changed, but that shouldn’t be able to cause such a significant change in performance.

The only change between restic 0.12.1 and 0.13.1 that looks like it could cause the performance impact revolves around how large directory metadata objects are handled. To reduce memory usage, only a single tree blob larger than 50 MB is processed at a time.

Could you run restic stats --mode debug with restic 0.16.0 and provide the printed file size statistics? That would provide enough information to know whether that change is the culprit.


Here’s the full output of the stats command:

repository f94f1f05 opened (version 1)
Collecting size statistics

File Type: key
Count: 1
Total Size: 458 B
Size            Count
---------------------
100 - 999 Byte  1
---------------------

File Type: lock
Count: 1
Total Size: 155 B
Size            Count
---------------------
100 - 999 Byte  1
---------------------

File Type: index
Count: 53
Total Size: 308.828 MiB
Size                    Count
-----------------------------
1000000 - 9999999 Byte  53
-----------------------------

File Type: data
Count: 67762
Total Size: 438.288 GiB
Size                      Count
-------------------------------
          100 - 999 Byte  5
        1000 - 9999 Byte  17
      10000 - 99999 Byte  32
    100000 - 999999 Byte  569
  1000000 - 9999999 Byte  65612
10000000 - 99999999 Byte  1331
-------------------------------
Oversized: [564690524 564975501 563810565 563898837 562677373 565224347 562560409 564766734 563558562 563955415 565608068 562475891 564414368 563807074 563016067 563773908 563500637 563668153 564122409 564087555 563301711 565424739 563421004 565628762 563811980 564671259 563670363 562898991 565423258 563442693 566786930 563078189 564344298 562714259 564070249 563686227 564618018 564263745 564867647 565115717 563768550 563779545 564687113 564452074 563244190 563543132 564288421 563851194 562258835 565163002 564662803 564495956 564762917 563380060 562470610 563295128 562980120 564901375 562691241 565672627 563667114 565244358 564147477 565113258 563676375 563184271 565387250 565047464 563791972 563405195 565327964 565370363 564745444 563142072 562597913 565580431 565392032 563520891 562615819 562792330 564781833 563808512 563815302 562485857 564324474 564881264 563148636 562699263 565318968 563822191 562711925 563673909 565467452 563300578 563306512 562780674 564059475 570955438 562347922 564129119 563800172 575954935 564127238 564642301 563421400 563560354 563583332 564803400 563772977 564284465 565221328 563310204 562396759 565114263 565419361 564616887 565056529 563797124 563387024 563803343 562823882 563600662 565542948 563793713 565208979 565088553 564788995 567050257 564342049 563889432 562584474 565342334 564764860 564222445 563063657 563411222 565264175 563068121 563268423 565526403 563769405 563070076 563094883 563964818 562285319 564963836 564729224 562342535 563201207 564632881 565475487 563324849 562991113 564482417 564346490 563556303 562570957 562906510 562722384 563958775 565526405 562774101 564542240 565721468 564134613 565272840 563299006 563787514 565571710 563531512 563067064 562663601 564027758 563646403 564080384 563776900 563250275 565201070 563272123 562576496 565526488 562575622 564123113 565836223 564491734 562935836 562475563 563794742 564415495 565144820 565592792 565199763 564122782 562564164 563423358 564323819]

Blob Type: data
Count: 2268704
Total Size: 326.426 GiB
Size                    Count
-------------------------------
          10 - 99 Byte  11138
        100 - 999 Byte  355579
      1000 - 9999 Byte  90612
    10000 - 99999 Byte  1488671
  100000 - 999999 Byte  207221
1000000 - 9999999 Byte  115483
-------------------------------


Blob Type: tree
Count: 401170
Total Size: 111.768 GiB
Size                      Count
--------------------------------
            10 - 99 Byte  1
          100 - 999 Byte  303122
        1000 - 9999 Byte  90717
      10000 - 99999 Byte  5228
    100000 - 999999 Byte  860
  1000000 - 9999999 Byte  821
10000000 - 16777216 Byte  1
--------------------------------
Oversized: [564731536 563003201 563766004 563441225 563962415 22220281 22222486 563062781 563985787 562868249 563431833 22132420 22094552 564282835 565108100 22109260 564126803 565446652 562944770 22290111 22282029 22171386 22173222 22172122 22177628 564973871 22105215 562494294 22287172 22214400 562126044 565056456 563415622 22196015 563619044 22121764 22157788 22204838 22258876 22210355 562934206 22179833 22240124 22170653 22279453 22258508 22123972 563426177 22204102 564274926 563423541 22101173 562476949 564394880 564883401 563043922 22088299 565221207 564595810 22252994 22280925 563768646 22127277 22155954 563436342 22289009 22116244 562330729 22225062 562696308 22245272 22127645 22282765 22103010 565276619 562309980 563201886 563434091 22087563 562601741 22131684 22153746 22198956 565248341 22235349 22169184 564280199 22203734 22211825 22118820 22292318 22268061 22111468 22260342 22256303 22189030 22101541 563418263 562708369 563721895 22136097 22200060 22219913 563684600 564867574 563307443 563203019 22281293 22265491 564726264 563427685 22284964 22286804 563775050 22178729 22232045 22230576 562345431 564754895 22165142 22233512 562257205 22284234 22283133 562943261 563202641 22090872 562442269 22163305 562523703 563182282 22242696 22284602 22137566 562714402 22181302 562409876 22170285 563413360 565219698 22241592 22123236 561929665 22220646 563762991 563066547 22283866 22125443 562614189 22176162 22128744 22272464 22201532 22147867 22277248 564123040 22231677 563201134 562345808 562950428 22113671 563767891 22096392 563385099 562369539 22130212 565308669 562178469 22176527 563414491 22197855 565034573 564262115 22168083 565221962 564282081 22103743 563306312 22086830 563428817 563199626 565005535 564126049 22146763 22290846 22120660 564955773 22193077 22269894 22211459 22105951 22173589 22156322 22234245 22243432 22158156 22107420 22267693 22134993 22208518 564026128 562475818 563836120 563913036 561878415 562987374 564375330 564727393 22279085 564812191 
564729651 22160361 563960162 563964677 22089403 562295652 563062027 563440849 563016392 564890189 22285700 563432961 563416754 22130948 22169917 563437846 22125811 22228739 22226902 22092344 22205574 22251522 22233145 565140526 563199249 22126909 562908577 22107788 563443867 22221750 563417131 563308574 564363682 22084254 22162569 22174690 562473933 22245640 22212192 22223958 22266963 565047391 22196383 563293498 563957145 22129111 22133889 22277616 22152274 563312341 22144187 22230208 564230086 563761485 565067392 22146395 562215376 22104111 564730782 564880767 22166611 22087195 564217276 563064281 565157114 562205589 564889435 22266227 22273568 22141243 22239020 562942884 22180198 22128013 564086091 564121155 565063244 564109093 563094444 563370022 564879634 22193813 22172490 22225798 22094920 562214624 22084990 565264934 22264019 22115876 22223590 22209987 562685386 22123604 22258143 22171754 563431081 562562534 563064658 564152055 22202633 22112567 22197119 563963923 22254831 22216240 22126541 22117716 22125075 562713270 22179097 562460733 563420904 565059475 563403565 22095288 22278349 22270995 561988102 564690451 22216608 22247845 22207414 563160044 22122500 22104479 563140442 564277936 563409595 22247477 562470537 563270493 22218077 22227267 22290478 22222854 22116612 22109628 564640671 564400888 562242886 22148235 564303140 22134257 562635657 22226166 22116980 22218441 22098229 563438974 22148971 564401643 562340905 562945903 22091608 22217344 22107052 563938295 22135729 562343167 563540299 22139038 563262576 564342627 22156687 564797869 22149336 22133524 564121909 22099333 564176560 22186454 562947411 563435592 564835941 563244117 22237554 565184632 22114404 22153010 22191605 563340999 22175426 22259244 22288276 22275408 564916955 562798875 563309702 564513603 22276144 562344299 22244904 22280189 22165510 565317338 565206879 565222717 22259977 563897210 22124339 22089035 22214768 22137198 22212560 562663528 565327891 564064599 562584401 562396686]

The ‘oversized’ part caught my eye, and I checked the rest of the output from one of the former runs, thinking that maybe some heavy repacking happens. But it doesn’t seem so:

1 snapshots have been removed, running prune
loading indexes...
loading all snapshots...
finding data that is still in use for 638 snapshots
[1:15:08] 100.00%  638 / 638 snapshots

searching used packs...
collecting packs for deletion and repacking
[0:02] 100.00%  66363 / 66363 packs processed


to repack:         12767 blobs / 56.249 MiB
this removes:         19 blobs / 33.149 MiB
to delete:            41 blobs / 10.368 MiB
total prune:          60 blobs / 43.517 MiB
remaining:       2643007 blobs / 416.621 GiB
unused size after prune: 20.344 GiB (4.88% of remaining size)

repacking packs
[0:01] 100.00%  5 / 5 packs repacked

rebuilding index
[0:03] 100.00%  66357 / 66357 packs processed

deleting obsolete index files
[0:00] 100.00%  54 / 54 files deleted

removing 8 old packs
[0:00] 100.00%  8 / 8 files deleted

done

The oversized part actually confirms my previous speculation: there are quite a few tree blobs of over 500 MB. Since restic 0.13.0 these are decoded only one after another, to significantly reduce memory consumption (something like 5-10 GB for that repository). This change affects the “finding data that is still in use” step. The downside, however, is that this special constellation (so far I’ve only seen such large tree blobs on macOS) results in an increased overall prune runtime.

That is, the slowdown is unfortunately sort of expected for now and likely won’t be fixed until memory usage when backing up directories containing many empty files · Issue #2446 · restic/restic · GitHub and potentially Store large extended attributes as data blobs · Issue #3643 · restic/restic · GitHub have been implemented.


I thought I’d provide some more feedback after the retention requirement for this particular setup changed in the meantime, which resulted in far fewer snapshots. When we reached ~75 and ~85 snapshots respectively on each host, I decided to try the 0.16 version again.

Even though the times are more reasonable than before, looking at the raw numbers “shocked” me a little. With 300+ snapshots on each host and restic 0.12, prune times were around 25 minutes. With 0.16 and only 65 snapshots, the prune took about the same, even a minute extra.

Quite honestly, I think this makes newer restic versions unusable in scenarios with a decent number of snapshots, and going back to a version that’s soon 2 1/2 years plus in age tastes … salty.

What makes you say it’s about the number of snapshots? Reading what Michael wrote, it’s more about large metadata, e.g. the large tree blobs he mentioned on macOS.

1 Like