GPT-3.5 concurrent requests limit

I’m trying to send concurrent requests to OpenAI’s API using Python with the asyncio library.

It all goes well except that when I send more than 10 requests at the same time, the gpt-3.5-turbo API endpoint starts throttling.
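For context, here's a minimal sketch of the kind of test harness I'm using (assuming the async client from the openai v1.x Python SDK; the prompt and timing logic are just illustrative):

```python
import asyncio
import time

from openai import AsyncOpenAI

client = AsyncOpenAI()  # reads OPENAI_API_KEY from the environment
semaphore = asyncio.Semaphore(50)  # cap the number of in-flight requests

async def one_request(i: int) -> float:
    # Time a single chat completion while holding a semaphore slot.
    async with semaphore:
        start = time.perf_counter()
        await client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[{"role": "user", "content": f"Request {i}: say hello."}],
        )
        return time.perf_counter() - start

async def main(n: int) -> None:
    # Fire n requests concurrently and report the average latency.
    latencies = await asyncio.gather(*(one_request(i) for i in range(n)))
    print(f"{n} concurrent requests, avg {sum(latencies) / n:.2f} sec")

asyncio.run(main(20))
```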

Here's some data I have; all tests were run with a semaphore limit of 50.

| Number of concurrent requests | Avg. response time |
|---|---|
| 5 | 5.3 sec |
| 10 | 6.03 sec |
| 15 | 9.71 sec |
| 20 | 35.28 sec |
| 25 | 35.19 sec |
| 50 | 26.56 sec |

What's the limitation on the number of concurrent requests? I understand there are rate limits, but this looks like a concurrency limit, and I haven't found any documentation about it yet.


Have you found any solution to this yet? Thanks.

I'm working through this same issue with the TTS endpoint. If I send 3 parallel requests it's fine, but 15 is a problem. I can't find any documentation, but ChatGPT said this:

"While OpenAI doesn't publicly list exact numbers for parallel request limits, here are some general strategies to work within typical constraints:" and then it refers back to the rate limits. It appears the concurrency limit is dynamic, but it seems to fall somewhere between 5 and 10. You'll likely need a combination of a queuing system and some sort of exponential backoff, as in the sketch below.
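By "exponential backoff" I mean something like this sketch (assuming the openai v1.x Python SDK; the chat endpoint, retry count, and delay values are illustrative, and the same pattern would apply to TTS calls):

```python
import asyncio
import random

from openai import AsyncOpenAI, RateLimitError

client = AsyncOpenAI()

async def request_with_backoff(prompt: str, max_retries: int = 5):
    # Retry on rate-limit errors, doubling the wait after each attempt.
    delay = 1.0
    for attempt in range(max_retries):
        try:
            return await client.chat.completions.create(
                model="gpt-3.5-turbo",
                messages=[{"role": "user", "content": prompt}],
            )
        except RateLimitError:
            if attempt == max_retries - 1:
                raise  # give up after the final attempt
            await asyncio.sleep(delay + random.random())  # jitter the wait
            delay *= 2
```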

Take a look at the topic linked below.
It may be a good explanation of why an increasing number of concurrent requests triggers the rate-limit warning.
