My requests are getting throttled even though I’m not near my limits
I’m at tier 4, so I have 2M TPM and 10K RPM.
I ran some tests where:
- Each request uses 7671 tokens
- Average request execution time is 6.71 seconds
- Using the gpt-4o API
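For context, here is a quick back-of-the-envelope calculation from the numbers above, showing what the tier 4 TPM limit should allow at this request size and latency (the variable names are just mine):

```python
# Ceilings implied by the test parameters above.
TPM_LIMIT = 2_000_000        # tier 4 tokens/min
TOKENS_PER_REQUEST = 7671    # measured per request
AVG_SECONDS = 6.71           # average request execution time

# Requests per minute the TPM limit alone allows:
rpm_from_tpm = TPM_LIMIT / TOKENS_PER_REQUEST          # ~260 requests/min

# Concurrency needed to sustain that rate at ~6.71 s per request:
concurrency_needed = rpm_from_tpm * AVG_SECONDS / 60   # ~29 concurrent

print(f"TPM limit supports ~{rpm_from_tpm:.0f} requests/min")
print(f"saturating it needs ~{concurrency_needed:.0f} concurrent requests")
```

So the account limits should comfortably cover the concurrency levels in these tests; nothing in the math explains a ceiling near 338K TPM.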
I started off small, running only 10 requests @ 5 concurrent.
The best performance was a batch of 60 requests @ 20 concurrent requests.
At that mark I was doing about 500K TPM.
After that, each batch was 200 requests. I found that after about 60 to 70 requests, things slowed down significantly.
No matter how I configured it, I could not process more than 338K TPM.
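In case it helps anyone reproduce this, here is a minimal sketch of the kind of harness I mean: N requests bounded by a concurrency cap, with TPM computed from wall-clock time. The `simulate_request` coroutine stands in for the real API call (swap in your own client); the token count and a scaled-down latency are hard-coded from my measurements above.

```python
import asyncio
import time

TOKENS_PER_REQUEST = 7671   # measured tokens per request
REQUEST_SECONDS = 0.01      # stand-in latency, scaled down from the ~6.71 s average

async def simulate_request(sem: asyncio.Semaphore) -> int:
    """Placeholder for the real API call; returns tokens consumed."""
    async with sem:                       # cap in-flight requests
        await asyncio.sleep(REQUEST_SECONDS)
        return TOKENS_PER_REQUEST

async def run_batch(total: int, concurrency: int) -> float:
    """Run `total` requests at most `concurrency` at a time; return TPM."""
    sem = asyncio.Semaphore(concurrency)
    start = time.monotonic()
    results = await asyncio.gather(
        *(simulate_request(sem) for _ in range(total))
    )
    elapsed = time.monotonic() - start
    return sum(results) / elapsed * 60    # tokens per minute

tpm = asyncio.run(run_batch(total=60, concurrency=20))
print(f"{tpm:,.0f} TPM")
```

With the real API call dropped in, varying `total` and `concurrency` is how I produced the numbers in the table below.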
It sure would be nice if OpenAI would give us some insight into this, as I see other people experiencing similar problems.
Below is a table with the results.