Hmm… I recently created a 10TB exFAT volume on macOS, holding a repo meant for use with both Windows and Mac. I ran a prune job on it and just received this:
repository 6a4c65bd opened successfully, password is correct
loading indexes…
loading all snapshots…
finding data that is still in use for 227 snapshots
[7:47] 100.00% 227 / 227 snapshots
searching used packs…
collecting packs for deletion and repacking
List(data) returned error, retrying after 552.330144ms: fcntl /Users/smitmark/tmp/7/Backup/.restic-ohsu/data/2e: too many open files
List(data) returned error, retrying after 1.080381816s: fcntl /Users/smitmark/tmp/7/Backup/.restic-ohsu/data/2e: too many open files
List(data) returned error, retrying after 1.31013006s: fcntl /Users/smitmark/tmp/7/Backup/.restic-ohsu/data: too many open files
List(data) returned error, retrying after 1.582392691s: fcntl /Users/smitmark/tmp/7/Backup/.restic-ohsu/data: too many open files
List(data) returned error, retrying after 2.340488664s: fcntl /Users/smitmark/tmp/7/Backup/.restic-ohsu/data: too many open files
List(data) returned error, retrying after 4.506218855s: fcntl /Users/smitmark/tmp/7/Backup/.restic-ohsu/data: too many open files
List(data) returned error, retrying after 3.221479586s: fcntl /Users/smitmark/tmp/7/Backup/.restic-ohsu/data: too many open files
List(data) returned error, retrying after 5.608623477s: fcntl /Users/smitmark/tmp/7/Backup/.restic-ohsu/data: too many open files
List(data) returned error, retrying after 7.649837917s: fcntl /Users/smitmark/tmp/7/Backup/.restic-ohsu/data: too many open files
List(data) returned error, retrying after 15.394871241s: fcntl /Users/smitmark/tmp/7/Backup/.restic-ohsu/data/2e: too many open files
[5:26] 95.30% 569793 / 597881 packs processed
fcntl /Users/smitmark/tmp/7/Backup/.restic-ohsu/data: too many open files
smitmark@RJHB595 ~ %
I'm wondering whether I needed to use a larger file allocation unit (I used 128k), whether this is a general exFAT issue, or something else. Either way, it terminated before it could complete, and now I'm unsure what to do.
A simple fix for the “too many open files” limitation of macOS is the "ulimit -n" command. Curiously, the value of n appears to be critical to whether or not macOS accepts the command.
I’ve found that ulimit -n 10240 (the default is 256) works, but higher values do not. 10240 open files is probably more than enough for most users.
. . .
Adding the “ulimit -n 10240” statement to your bash profile (e.g. nano ~/.bash_profile) makes it permanent; sudo isn’t needed for a file in your own home directory.
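For reference, a minimal sketch of inspecting and raising the limit in the current shell session (the 10240 ceiling mentioned above corresponds to the usual kern.maxfilesperproc value on macOS; exact defaults vary by OS version):

```shell
# Current soft limit on open file descriptors (often 256 on macOS)
ulimit -Sn

# Hard ceiling the soft limit can be raised to without root
ulimit -Hn

# Raise the soft limit for this shell session only; values above the
# hard ceiling are rejected, so `|| true` keeps the script going
ulimit -Sn 10240 || true

# Verify before launching restic from this same shell
ulimit -Sn
```

The change only affects the shell it runs in and its children, which is why restic must be started from that same session (or the command placed in the shell profile).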
Exactly, it’s a matter of the limits your shell runs under at the time; they are imposed outside of restic and aren’t something restic does. Here’s an article on the same subject: How to fix 'Too Many Open Files' in Linux
Did it work for you if you increased it to a higher value?
Given the output you showed it doesn’t look like prune did any writing, it was still just trying to collect information about your repository. So unless there was more to the output than what you showed above, I wouldn’t worry and wouldn’t feel a need to rebuild the index.
Cool, yeah it did just say it was collecting packs for deletion and repacking - but I wasn’t certain if that meant collecting packs for deletion AND repacking as it goes haha. Figured a check wouldn’t hurt regardless. I’ll try a prune after and report back!
Also, would there be any reason that exFAT specifically would do this, but not HFS+ or APFS? I had just cloned this repository over from an HFS+ volume, where it worked perfectly fine, and I’ve never seen this error before. I don’t typically put restic repositories on exFAT, because it’s so darn slow with many small files, but this is a 3TB repository and I really wanted Windows/macOS interoperability. I was wondering whether the overhead from 128k allocation units was the culprit (it’s a 10TB volume), or whether it’s really just about how many files are in a directory, making it file-system independent and just a coincidence that I finally hit that limit.
In any case there is no need to run rebuild-index to ensure that prune doesn’t do any harm: If there is anything missing in the index, prune will complain and abort without changing the repo.
On the other hand, if the index is fine, rebuild-index is very fast and doesn’t do any harm.
@rawtaz It worked like a charm! I’m still very curious if this is an exFAT issue or not… going to clone my repo to an APFS volume and undo the ulimit change and see what it does out of curiosity. Thanks for helping me work through this “out loud” haha.
@alexweiss Hmm in the past, if prune messed up or couldn’t continue, I’ve often had to run rebuild-index - but I think that’s because I was using pCloud as a backend, and it didn’t discard partially uploaded files (and restic would think they were full files).
So hypothetical situation… my computer loses power in the middle of a prune operation - is there any point in the prune process that a rebuild-index would be necessary to recover?
Hitting the limit of open file descriptors sounds a bit like there’s a bug somewhere. Restic should only use a relatively low two-digit number of file descriptors at a time. Which restic version are you using?
That depends a bit on the restic version. In principle it shouldn’t be necessary, although there are a few corner cases which are not handled properly yet. The next restic release will take care of a few of those (by using atomic cache and local backend writes). But in general, unless prune complains there’s no need to run rebuild-index.
If you can reproduce the problem please either use lsof -p <pid of restic> or the macOS activity monitor (select restic, open the process information and switch to “open files and ports”). Which files are reported there?
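A quick way to watch the descriptor count from a second terminal while prune runs (this sketch assumes a single restic process; `pgrep` and `lsof` ship with macOS):

```shell
# Find restic's PID (assumes exactly one restic process is running)
pid=$(pgrep -x restic)

# Count restic's open files and ports; `tail -n +2` drops the lsof
# header line so the number approximates live descriptor usage
lsof -p "$pid" | tail -n +2 | wc -l
```

If that number climbs steadily toward the ulimit rather than hovering in the low two digits, the output of the full `lsof -p` listing should show which paths are being held open.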
Hmm I couldn’t reproduce it with --dry-run. Unfortunately I’ve already run prune after adding ulimit -n 10240 to my .zshrc profile, so technically it’s not really the same conditions, even without that command. I have added a large ~400GB snapshot by rcloning it, which introduced 199GB of dupes that --dry-run said it would remove.
Going to try it without --dry-run and see what happens.
to repack: 872718 blobs / 412.713 GiB
this removes 358561 blobs / 198.619 GiB
to delete: 0 blobs / 1.327 GiB
total prune: 358561 blobs / 199.946 GiB
remaining: 8319142 blobs / 3.250 TiB
unused size after prune: 0 B (0.00% of remaining size)
Yeah, I swear this has something to do with this being an exFAT volume. I’ve copied 24TB with RapidCopy before with no issues. Lots of small files, restic databases, etc.
Going to try to move it to another volume that’s HFS+ instead and see what it does.
So there’s apparently a bug with AppleDouble “._*” files: you can’t always access them properly on an exFAT volume. Restic, rclone, rsync, RapidCopy, and even cp were all failing. After deleting them, both rsync and RapidCopy functioned properly.
In addition, exFAT folder enumeration on a Mac gets EXTREMELY SLOW once you have several thousand files in a folder. It was taking a couple of minutes to copy each blob. After removing the AppleDouble files, which also took forever, the drive began behaving much more normally. Of course, they’ll be regenerated over time, so I have a RapidCopy sync going that excludes “._*”.
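A sketch of clearing the sidecar files out, run here against a scratch directory so it’s safe to try; point VOL at the real exFAT mount (e.g. something under /Volumes, the exact path depends on your setup) when doing this for real:

```shell
# Demo on a scratch directory; substitute your exFAT mount point
VOL=$(mktemp -d)
touch "$VOL/._pack" "$VOL/pack"   # simulate an AppleDouble sidecar

# Preview the AppleDouble files before touching anything
find "$VOL" -type f -name '._*' -print

# Delete them; any Finder metadata they carried is discarded.
# On macOS, `dot_clean -m "$VOL"` can instead merge that metadata
# back into the data files before removing the sidecars.
find "$VOL" -type f -name '._*' -delete

ls "$VOL"   # only the real files remain
```

The `-print` pass first is deliberate: `-delete` is irreversible, so it’s worth confirming the pattern only matches the sidecar files.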
Some of the errors I had gotten had me thinking the drives were failing, but no… SMART checks out and it’s behaving normally now that each folder has about half of the files it used to have in it.
I nearly have the whole thing copied over to an HFS+ disk instead. Lesson learned!
@akrabu Can you please mark the answer you think is the most applicable one as the solution to this thread? I’m not sure if it’s Michael’s or your last one :3
Haha I’m not sure either! I haven’t tried the PR yet and kind of went my own way with it. I did have two corrupted blobs after the whole thing was said and done, so I re-backed up and it added the blobs and everything appears to be fine now.
Buuuut I think I also accidentally rsynced /data to /data/data, and that was the initial issue. So I’ll mark his as the solution. But it’s also good to note that exFAT may not play nicely with large restic repos and AppleDouble files on a Mac!