Concurrency Rate Limiting: A $10,000 Issue

Again, in essence, the problem is simple: total network latency scales with the number of concurrent requests, and the scaling cannot be explained by our rate limits.

I built a dummy request with a 62-token prompt that produces 62 output tokens, then ran identical copies of it in parallel (asynchronously / concurrently): 5 at a time, 50 at a time, and 100 at a time. A sketch of the harness follows.
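For concreteness, here is a minimal sketch of that kind of harness, assuming an OpenAI-style chat completions endpoint and an aiohttp client. The URL, model name, and payload are illustrative stand-ins, not the exact script we ran:

```python
import asyncio
import time

import aiohttp  # assumed async HTTP client; any equivalent works

API_URL = "https://api.openai.com/v1/chat/completions"  # illustrative endpoint
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}       # placeholder key

PAYLOAD = {
    "model": "gpt-4o",                                  # illustrative model name
    "messages": [{"role": "user", "content": "..."}],   # stands in for the ~62-token prompt
    "max_tokens": 62,
}

async def timed_request(session: aiohttp.ClientSession) -> float:
    """Fire one request and return its wall-clock latency in seconds."""
    start = time.perf_counter()
    async with session.post(API_URL, headers=HEADERS, json=PAYLOAD) as resp:
        await resp.json()  # drain the body so timing covers the full response
    return time.perf_counter() - start

async def run_batch(n: int) -> None:
    """Launch n identical requests concurrently and report the mean latency."""
    async with aiohttp.ClientSession() as session:
        latencies = await asyncio.gather(*(timed_request(session) for _ in range(n)))
    print(f"{n} concurrent requests -> {sum(latencies) / n:.2f} s (avg)")

async def main() -> None:
    for n in (5, 50, 100):
        await run_batch(n)

asyncio.run(main())
```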

Again, recall that every single request is identical to every other one. Because they run in parallel, we would expect them all to have roughly the same network latency, aside from minor millisecond-level differences due to network congestion (negligible on our AWS server).

Instead:

5 concurrent requests → 2.32 seconds (avg)
50 concurrent requests → 4.90 seconds (avg)
100 concurrent requests → 9.22 seconds (avg)

Manually checking the 100-concurrent-request data, we find that we don't come anywhere close to exhausting our rate limit.
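For what it's worth, here is how one could spot-check that on each response, assuming an OpenAI-style API that reports x-ratelimit-* response headers (the header names are an assumption; adapt them to your provider). This variant of the request function from the sketch above logs the remaining headroom:

```python
# Hypothetical spot-check: log the rate-limit headers that OpenAI-style
# endpoints attach to each response, to confirm headroom remains.
async def timed_request_with_headers(session: aiohttp.ClientSession) -> float:
    start = time.perf_counter()
    async with session.post(API_URL, headers=HEADERS, json=PAYLOAD) as resp:
        await resp.json()
        print(
            "remaining requests:", resp.headers.get("x-ratelimit-remaining-requests"),
            "| remaining tokens:", resp.headers.get("x-ratelimit-remaining-tokens"),
        )
    return time.perf_counter() - start
```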

This is a significant bug for enterprise-level scaling.