My requests are getting throttled back even though I'm not near my limits

I'm at tier 4, so I have 2M TPM (tokens per minute) and 10K RPM (requests per minute).

I ran some tests where:

  1. Each request uses 7,671 tokens
  2. Average request execution time is 6.71 seconds
  3. Using the gpt-4o API

I started off small, running only 10 requests @ 5 concurrent requests.
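For reference, the kind of harness I'm describing can be sketched like this (names and the `fake_request` stub are hypothetical; swap the stub for a real API client call):

```python
import asyncio
import time

async def run_batch(n_requests, concurrency, do_request):
    # Cap the number of in-flight requests with a semaphore.
    sem = asyncio.Semaphore(concurrency)

    async def worker(i):
        async with sem:
            return await do_request(i)

    start = time.monotonic()
    results = await asyncio.gather(*(worker(i) for i in range(n_requests)))
    elapsed = time.monotonic() - start
    return results, elapsed

# Stand-in for a real API call; returns tokens used per request.
async def fake_request(i):
    await asyncio.sleep(0.01)  # simulated latency
    return 7671                # tokens per request, from the tests above

results, elapsed = asyncio.run(run_batch(10, 5, fake_request))
tpm = sum(results) / elapsed * 60  # observed tokens per minute
```

Each batch run reports an observed TPM, which is where the numbers below come from.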

The best performance was a batch of 60 requests @ 20 concurrent requests.

At that mark I was doing about 500K TPM.

After that, each batch was 200 requests. I found that after about 60 to 70 requests, things slowed down significantly.

No matter how I configured it, I could not process more than 338K TPM.
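As a back-of-the-envelope check (assuming requests fully overlap, which real traffic won't quite achieve), the numbers above imply a much higher ceiling than what I'm seeing:

```python
# Throughput ceiling implied by the test parameters:
# requests/min = concurrency / latency * 60, TPM = requests/min * tokens/request
tokens_per_request = 7671
latency_s = 6.71
concurrency = 20

requests_per_min = concurrency / latency_s * 60
tpm_ceiling = requests_per_min * tokens_per_request  # ~1.37M TPM
```

So even the best observed run (~500K TPM) is well under the idealized ceiling, and 338K is far below the 2M TPM tier limit.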

It sure would be nice if OpenAI would give us some insight into this, as I see other people experiencing similar problems.

Below is a table with the results.
