Hi OpenAI Dev Community,
We’ve been seeing intermittent api_timeout errors when calling GPT-5 models from our API, especially under concurrent load. I wanted to check whether anyone has run into something similar or has suggestions on how to handle it.
Use case:
- Input: call transcript (text)
- Task: determine the call type
- Model: mainly GPT-5-mini (sometimes GPT-5)
These calls are triggered from our API, so when several requests come in at once, we end up making multiple concurrent calls to OpenAI.
Issue
When concurrency increases, some requests hit api_timeout and the jobs on our side can end up stuck or incomplete. We’re trying to reduce false failures caused by temporary latency or retryable OpenAI issues while still supporting concurrent API traffic.
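For context, here’s a simplified sketch of the call pattern (Python). The model ID, prompt, and thread-pool fan-out are illustrative rather than our exact production code:

```python
# Simplified, illustrative version of our fan-out -- not the exact production code.
from concurrent.futures import ThreadPoolExecutor
from openai import OpenAI

client = OpenAI(timeout=30.0)  # per-request timeout in seconds

def classify_call(transcript: str) -> str:
    # One classification request per transcript.
    response = client.chat.completions.create(
        model="gpt-5-mini",
        messages=[
            {"role": "system", "content": "Classify the call type. Reply with the label only."},
            {"role": "user", "content": transcript},
        ],
    )
    return response.choices[0].message.content

def classify_batch(transcripts: list[str]) -> list[str]:
    # Several requests can hit our endpoint at once, so the OpenAI calls run
    # concurrently -- this is where we start seeing api_timeout errors.
    with ThreadPoolExecutor(max_workers=10) as pool:
        return list(pool.map(classify_call, transcripts))
```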
Questions
- Are there recommended limits or best practices for handling concurrent GPT-5 requests from an API?
- What’s the recommended approach to timeouts and retries for handling temporary latency? (There’s a rough sketch of what we’re considering below this list.)
- For a simple classification task like this, is there a better model or prompt strategy to keep responses faster and more stable?
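This is roughly the retry/backoff approach we’re considering on top of the per-request timeout. The error classes are the ones exported by the openai Python package; the attempt count and backoff values are just placeholders:

```python
# Sketch of the retry/backoff we're considering -- not in production yet.
import random
import time
from openai import OpenAI, APITimeoutError, RateLimitError, APIConnectionError

client = OpenAI(timeout=30.0, max_retries=0)  # disable SDK retries so we control them

def classify_with_retry(transcript: str, max_attempts: int = 4) -> str:
    for attempt in range(max_attempts):
        try:
            response = client.chat.completions.create(
                model="gpt-5-mini",
                messages=[{"role": "user", "content": transcript}],
            )
            return response.choices[0].message.content
        except (APITimeoutError, RateLimitError, APIConnectionError):
            if attempt == max_attempts - 1:
                raise  # give up after the last attempt so the job fails loudly
            # Exponential backoff with jitter: ~1s, 2s, 4s plus a random spread.
            time.sleep(2 ** attempt + random.random())
```

Would retrying only on timeout/rate-limit/connection errors like this be the right call, or is there a better pattern for keeping concurrent jobs from getting stuck?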
Any guidance would be really helpful. Happy to share more details if needed.
Thanks,
Ann