GPT-3.5 concurrent request limit

I’m trying to send concurrent requests to OpenAI’s API using Python and the asyncio library.

Everything works fine until I send more than 10 requests at the same time; beyond that, response times from the gpt-3.5-turbo endpoint start climbing sharply, as if the requests were being throttled.

Here’s some data I collected. All tests were run with an asyncio semaphore limit of 50.

| Number of concurrent requests | Avg. response time |
|---|---|
| 5 | 5.3 sec |
| 10 | 6.03 sec |
| 15 | 9.71 sec |
| 20 | 35.28 sec |
| 25 | 35.19 sec |
| 50 | 26.56 sec |
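For context, here is a minimal sketch of the kind of test harness described above: N tasks fired at once, with an `asyncio.Semaphore` capping how many are in flight. The `fake_request` coroutine is a hypothetical stand-in for the actual OpenAI call (it just sleeps), so the timing pattern here won't reproduce the numbers in the table.

```python
import asyncio
import time

async def fake_request(delay: float) -> float:
    # Hypothetical stand-in for an OpenAI API call; sleeps instead of
    # hitting the network, and returns the observed latency.
    start = time.perf_counter()
    await asyncio.sleep(delay)
    return time.perf_counter() - start

async def run_batch(n_requests: int, max_concurrent: int = 50) -> float:
    # Cap in-flight requests with a semaphore, as in the tests above.
    sem = asyncio.Semaphore(max_concurrent)

    async def guarded() -> float:
        async with sem:
            return await fake_request(0.01)

    # Launch all requests concurrently and average their latencies.
    latencies = await asyncio.gather(*(guarded() for _ in range(n_requests)))
    return sum(latencies) / len(latencies)

if __name__ == "__main__":
    avg = asyncio.run(run_batch(20))
    print(f"avg response time: {avg:.3f} sec")
```

Swapping `fake_request` for a real API call is where the slowdown above shows up, which suggests the limit is on the server side rather than in the client code.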

What is the limit on the number of concurrent requests? I understand there are rate limits, but this looks like a concurrency limit, and I haven’t found any documentation on it yet.


Have you found any solution to this yet? Thanks.