Hello,
I’m having TCP reset issues while restoring snapshots from a Swift backend:
# restic restore 5659b232 --target /data/restore
restoring <Snapshot 5659b232 of [/data/backup/instance/20230825_000015] at 2023-08-26 06:01:28.789539585 +0000 UTC by root@host> to /data/restore
...
ignoring error for /data/backup/instance/20230825_000015/20_incr_20230825_220023/drive_147693/FTS_0000000000354c56_00000000008a9f52_INDEX_2.ibd.meta: StreamPack: read tcp <restore-host-IP>:9056-><swift-proxy-IP>:443: read: connection reset by peer
...
ignoring error for /data/backup/instance/20230825_000015/20_incr_20230825_220023/drive_147693/FTS_0000000000354c56_00000000008a9f52_INDEX_2.ibd.meta: UtimesNano: no such file or directory
...
Summary: Restored 18857879 / 18858339 Files (543.684 GiB / 543.684 GiB) in 1:28:13
Fatal: There were 7303282 errors
I’ve only shown one file here, but many more are affected.
As expected from the errors, the file is missing from the filesystem, but it can be dumped without issues:
# restic dump 5659b232 /data/backup/instance/20230825_000015/20_incr_20230825_220023/drive_147693/FTS_0000000000354c56_00000000008a9f52_INDEX_2.ibd.meta
repository cce22958 opened (version 2, compression level auto)
page_size = 16384
zip_size = 0
space_id = 3492942
It’s also possible to restore the file with --include:
# restic restore 5659b232 --target /data/restore --include /data/backup/instance/20230825_000015/20_incr_20230825_220023/drive_147693/FTS_0000000000354c56_00000000008a9f52_INDEX_2.ibd.meta
repository cce22958 opened (version 2, compression level auto)
restoring <Snapshot 5659b232 of [/data/backup/instance/20230825_000015] at 2023-08-26 06:01:28.789539585 +0000 UTC by root@host> to /data/restore
Summary: Restored 7 / 1 Files (51 B / 51 B) in 0:00
The file is then available on the filesystem:
# cat /data/restore/data/backup/instance/20230825_000015/20_incr_20230825_220023/drive_147693/FTS_0000000000354c56_00000000008a9f52_INDEX_2.ibd.meta
page_size = 16384
zip_size = 0
space_id = 3492942
I ran a check just to make sure:
# restic check --read-data
using temporary cache in /tmp/restic-check-cache-1165536201
repository cce22958 opened (version 2, compression level auto)
created new cache in /tmp/restic-check-cache-1165536201
create exclusive lock for repository
load indexes
check all packs
check snapshots, trees and blobs
[4:34] 100.00% 5 / 5 snapshots
read all data
[46:23] 100.00% 6727 / 6727 packs
no errors were found
The issue is reproducible: it has happened multiple times with this snapshot, and also with snapshots from other instances. However, not all snapshots and not all instances are affected.
Obviously nothing here points towards a restic issue, and I’ve already reached out to my Swift provider for investigation. Still, I’d love to know whether you think tunables such as the pack size or the number of connections could help, or whether you have tips on how to retry just the failed files: running another full restore usually returns the same errors, and parsing stderr for the missing files isn’t convenient.
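To show what I mean by parsing stderr, this is the kind of retry loop I’ve been hacking together (just a rough sketch on my side, assuming the "ignoring error for <path>: StreamPack: ..." message format stays stable and that batching --include filters like this is acceptable):

#!/usr/bin/env bash
# Sketch of a retry loop: capture the restore errors, extract the affected
# paths, then re-run restic restore with --include filters for them only.
# Assumes the "ignoring error for <path>: StreamPack: ..." format stays
# as-is (the UtimesNano errors reference the same paths, so skipping them
# here should be fine).
set -euo pipefail

SNAP=5659b232
TARGET=/data/restore

# First pass: keep stderr so the failed paths can be recovered afterwards.
# restic exits non-zero because of the errors, hence the "|| true".
restic restore "$SNAP" --target "$TARGET" 2> restore-errors.log || true

# Extract the unique paths from the StreamPack error lines.
sed -n 's/^ignoring error for \(.*\): StreamPack:.*/\1/p' restore-errors.log \
  | sort -u > missing-files.txt

# Second pass: retry the failed files in batches of --include filters,
# to stay under the argument-length limit when millions of paths failed.
while mapfile -t -n 500 batch && ((${#batch[@]})); do
  includes=()
  for f in "${batch[@]}"; do includes+=(--include "$f"); done
  restic restore "$SNAP" --target "$TARGET" "${includes[@]}" < /dev/null
done < missing-files.txt

It works for small tests, but with millions of error lines it feels fragile, so a cleaner built-in way to retry would be very welcome.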
Version used:
restic 0.16.0 compiled with go1.20.6 on linux/amd64
Thanks a lot in advance for your help,
Have a nice day