What’s the limit on the number of concurrent requests? I understand there are rate limits, but this seems to be a concurrency limit that I haven’t found any information about yet.
I’m working through this issue using the TTS endpoint. If I send 3 parallel requests it’s fine; 15 is a problem. I can’t find any documentation, but ChatGPT said this…
“While OpenAI doesn’t publicly list exact numbers for parallel request limits, here are some general strategies to work within typical constraints:” and then it refers back to rate limits. The concurrency limit appears to be dynamic but seems to sit somewhere between 5 and 10. You’ll likely need a combination of a queuing system and some sort of exponential backoff.
Take a look at the topic linked below.
It may be a good explanation of why an increasing number of concurrent requests triggers the rate-limit warning.