Disclosure first: I run a managed rest-server service, so “autonomous tenants I don’t control, HTTPS only, at scale” is basically my day job. No pitch: you’re building the same thing in-house, so here’s what actually works.
gurkan is right and it’s worth underlining: parallel backups are fine, because backup takes only a non-exclusive lock. Only prune, forget, and check take an exclusive lock. quietlotte is pointing at the real answer too: your management unit is the 50 institutes, not the 30,000 hosts. You should never need to know a hostname.
The mechanism is rest-server’s --private-repos. One htpasswd account per institute, and --private-repos confines each account to its own /<institute>/ path. Inside that namespace the institute runs restic init for as many repos as it wants, named however it wants, with zero involvement from you. So “30,000 unknown hosts I can’t reach” collapses to “50 accounts I create once.”
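For concreteness, here is roughly what the server side could look like. This is a sketch, not my production unit file: /srv/restic and the listen address are placeholders I made up, and it assumes a TLS-terminating proxy in front (if rest-server handles TLS itself you would add --tls with --tls-cert and --tls-key instead). The run wrapper is just a dry-run helper so you can print the command before executing it.

```shell
# Sketch: start rest-server with per-account path confinement.
# DATA_ROOT and LISTEN are hypothetical values; adjust to your layout.
DATA_ROOT="${DATA_ROOT:-/srv/restic}"
LISTEN="${LISTEN:-127.0.0.1:8000}"

# With DRY_RUN set, print the command instead of executing it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi
}

start_rest_server() {
    # --private-repos confines each htpasswd account to paths under its own name.
    run rest-server \
        --path "$DATA_ROOT" \
        --htpasswd-file "$DATA_ROOT/.htpasswd" \
        --private-repos \
        --listen "$LISTEN"
}
```

Calling start_rest_server with DRY_RUN=1 prints the command line; without it, the function runs the server in the foreground, which is what a service manager would normally wrap.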
Do not do one shared repo per institute with tags. It looks simpler but has three traps: every host shares one encryption key so they can all read each other’s data, one host running prune locks out every other host, and one buggy or compromised host can damage the shared repo for everyone. Instead each host inits its own repo under the institute path, e.g. rest:https://physics:pass@server/physics/<hostname>. Now each host has its own key, its own lock, and prunes only its own repo, so the exclusive-lock problem you were worried about disappears entirely.
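On the client side, the per-host layout might be wired up something like this. Treat it as a sketch under assumptions: the function names are mine, /etc/restic/repo-password is a made-up location for the host's own repo password, and an HTTP password containing URL-special characters would need percent-encoding before going into the URL.

```shell
# Sketch: build the per-host repository URL under the institute namespace.
# Arguments: institute account, HTTP password, server name, host name.
repo_url() {
    printf 'rest:https://%s:%s@%s/%s/%s\n' "$1" "$2" "$3" "$1" "$4"
}

# Initialise this host's own repo. The repo password file is local to the
# host (hypothetical path) and is separate from the shared HTTP credential.
# Arguments: institute account, HTTP password, server name.
init_host_repo() {
    RESTIC_PASSWORD_FILE="${RESTIC_PASSWORD_FILE:-/etc/restic/repo-password}" \
        restic -r "$(repo_url "$1" "$2" "$3" "$(uname -n)")" init
}
```

Each host runs init_host_repo once, for example init_host_repo physics pass server, and from then on backs up to the same URL. Nothing on the server has to change when a new host appears.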
Two layers do two different jobs here, which is what makes it safe: the per-institute htpasswd credential gates HTTP access to that namespace, and the per-repo restic password (set by each host at init, which you never see) gates decryption. So even though hosts in one institute share a transport credential, host A still cannot decrypt host B’s repo. Access control is per-institute, confidentiality is per-host.
You also do not need minio, seaweedfs, or S3, and your instinct to avoid software that might lose support is sound. rest-server on your existing filesystem is enough. Your “provisioning API” is then about ten lines: to add an institute you append one htpasswd -B entry, make the directory, set a quota. It runs 50 times, not 30,000.
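To show how small that provisioning step can be, here is one possible version. It is a sketch that assumes the ZFS variant (one dataset per institute, which creates the directory and sets the quota in one go); the pool name tank/restic, the /srv/restic root, and the rest-server service user are all placeholders of mine, so substitute your own or swap in your filesystem's quota tooling.

```shell
# Sketch of an "add institute" script. DATA_ROOT, ZFS_PARENT and REST_USER
# are hypothetical; run as root on the backup server.
DATA_ROOT="${DATA_ROOT:-/srv/restic}"
ZFS_PARENT="${ZFS_PARENT:-tank/restic}"
REST_USER="${REST_USER:-rest-server}"

# With DRY_RUN set, print each command instead of executing it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi
}

# Usage: add_institute NAME QUOTA   (the HTTP password is read from stdin)
add_institute() {
    name=$1 quota=$2
    # Refuse names that could escape the namespace or confuse htpasswd.
    case $name in
        ''|*[!a-z0-9_-]*) echo "invalid institute name: $name" >&2; return 1 ;;
    esac
    # One dataset per institute: creates the directory and caps its size.
    run zfs create -o quota="$quota" -o mountpoint="$DATA_ROOT/$name" \
        "$ZFS_PARENT/$name" || return 1
    run chown "$REST_USER" "$DATA_ROOT/$name" || return 1
    # bcrypt entry (-B); -i reads the password from stdin, keeping it out of argv.
    run htpasswd -B -i "$DATA_ROOT/.htpasswd" "$name"
}
```

Something like add_institute physics 2T < physics-password.txt would then be the whole onboarding step, and DRY_RUN=1 lets you review the commands first.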
Three operational notes. Quotas: rest-server enforces none, so set a per-institute filesystem quota (XFS project quotas, or a ZFS dataset per institute) or one institute fills the disk for all 50. Append-only: --append-only stops a compromised host from deleting history, but it also takes prune/forget away from the clients who currently run it themselves, so weigh that workflow change against your threat model rather than flipping it blindly. Proxy body size: if you terminate TLS with nginx in front of rest-server, set client_max_body_size 0. nginx’s default limit is 1m, so larger pack uploads get rejected with 413 and backups fail partway through after appearing to start fine. If rest-server terminates TLS itself with --tls, this does not apply.
Last thing, unsolicited: at 30,000 repos you cannot eyeball health, and a write landing is not a restore working. Whatever you build, script a periodic end-to-end restore of a canary and alert on it. That is the one check that tells you the service is doing its job.
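A canary check along those lines could be as simple as the following. Again a sketch with assumptions: it expects a dedicated canary repo that has already been initialised, credentials supplied through the environment (for example RESTIC_PASSWORD_FILE), and the RESTIC variable is only there so the binary can be swapped out.

```shell
# Sketch: end-to-end canary check. Backs up a file with fresh content,
# restores the latest snapshot, and compares the two copies.
RESTIC="${RESTIC:-restic}"

# Usage: canary_check REPO_URL   (returns 0 only if the restored file matches)
canary_check() {
    repo=$1
    work=$(mktemp -d) || return 1
    mkdir "$work/src" "$work/out" || return 1
    # New content on every run, so a stale snapshot cannot pass the check.
    { date; od -An -tx1 -N 32 /dev/urandom; } > "$work/src/canary.txt"

    status=1
    if "$RESTIC" -r "$repo" backup --quiet "$work/src" &&
       "$RESTIC" -r "$repo" restore latest --target "$work/out" --quiet
    then
        # Locate the restored file rather than assuming where it lands
        # under the target directory.
        restored=$(find "$work/out" -type f -name canary.txt)
        [ -n "$restored" ] && cmp -s "$work/src/canary.txt" "$restored" && status=0
    fi
    rm -rf "$work"
    return "$status"
}
```

Run it from cron or a timer and alert on a non-zero exit, for example canary_check "$REPO" || echo "canary restore FAILED" >&2, feeding whatever monitoring you already have.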