I’m trying to send concurrent requests to OpenAI’s API using Python with the asyncio library.
It all works fine, except that once I send more than 10 requests at the same time, the gpt-3.5-turbo endpoint appears to start throttling.
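Here’s a simplified sketch of my test harness (the real prompts differ; this assumes the `openai` >= 1.0 async client):

```python
import asyncio
import time

from openai import AsyncOpenAI  # openai >= 1.0

client = AsyncOpenAI()  # reads OPENAI_API_KEY from the environment


async def ask(semaphore: asyncio.Semaphore, prompt: str) -> float:
    # The semaphore caps how many requests are in flight at once.
    async with semaphore:
        start = time.perf_counter()
        await client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[{"role": "user", "content": prompt}],
        )
        return time.perf_counter() - start


async def main(n: int) -> None:
    semaphore = asyncio.Semaphore(50)  # same limit for every test run
    # Fire n requests concurrently and report the average latency.
    times = await asyncio.gather(
        *(ask(semaphore, f"test prompt {i}") for i in range(n))  # placeholder prompts
    )
    print(f"{n} concurrent requests, avg response time: {sum(times) / len(times):.2f} sec")


if __name__ == "__main__":
    asyncio.run(main(20))
```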
Here’s some data I have; all tests were run with an `asyncio.Semaphore` limit of 50:
| Concurrent requests | Avg. response time (sec) |
|--------------------:|-------------------------:|
| 5                   | 5.3                      |
| 10                  | 6.03                     |
| 15                  | 9.71                     |
| 20                  | 35.28                    |
| 25                  | 35.19                    |
| 50                  | 26.56                    |
What’s the limitation on the number of concurrent requests? I understand there are rate limits, but this looks like a concurrency limit that I haven’t found any information about yet.