Unexpected latency with ChatGPT APIs in production


We currently have three OpenAI organizations, for our DEV, PREPROD, and PROD environments. We have noticed unusual latency on the PROD environment since October 13.

We initially thought it was related to our implementation, but after running isolated tests, the issue seems to come from your service. I have attached a table showing the completion times for 15 different prompts.
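For reference, an isolated test along these lines can be sketched as below. This is a minimal sketch: the `call_model` stub stands in for the real API call (e.g. `client.chat.completions.create(...)`), and the prompt list is hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from statistics import mean


def time_call(fn, prompt):
    """Return the wall-clock completion time for one prompt, in ms."""
    start = time.perf_counter()
    fn(prompt)
    return (time.perf_counter() - start) * 1000.0


def run_latency_test(fn, prompts, workers=15):
    """Fire all prompts in parallel and collect per-prompt latencies."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda p: time_call(fn, p), prompts))


if __name__ == "__main__":
    # Stub standing in for a real completion request such as
    # client.chat.completions.create(model=..., messages=[...])
    def call_model(prompt):
        time.sleep(0.05)  # simulate network + generation time

    prompts = [f"test prompt {i}" for i in range(15)]
    latencies = run_latency_test(call_model, prompts)
    print(f"mean latency: {mean(latencies):.1f} ms")
```

Running the same harness against each environment's API key, with identical prompts, isolates the service-side latency from any differences in application code.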

We have read this page about production best practices, but in our case the environments are identical: OpenAI Platform.

Could it be related to the number of users and concurrent access? I would be surprised, since our user base is quite modest.

It’s not your environments, it is your organization.

OpenAI found a bunch of old mining GPU machines on eBay, blew them out with a leaf blower, and set them up in the basement of the old bubble gum factory to run turbo models finally crushed enough to fit in 16GB. And then assigned them to those who have paid the least money upfront. Or so you would think.

The actual answer lies in your organization's rate-limit page. You will see a new tier system that your account has been assigned to. In particular, for prepaid accounts: if you haven't purchased $50 of credits, you'll have been throttled, moved to slower servers, or whatever mechanism it is they are using to provide lower satisfaction. So pay for credits.
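One way to see what limits your key is actually being given, rather than what the dashboard claims, is to read the `x-ratelimit-*` headers that the API returns on every response. A minimal parser (the sample header values below are invented for illustration):

```python
def parse_rate_limits(headers):
    """Extract the x-ratelimit-* headers OpenAI returns on each response.

    `headers` can be any mapping of header name to value, e.g.
    `response.headers` from requests or httpx.
    """
    keys = (
        "x-ratelimit-limit-requests",
        "x-ratelimit-remaining-requests",
        "x-ratelimit-limit-tokens",
        "x-ratelimit-remaining-tokens",
    )
    return {k: headers[k] for k in keys if k in headers}


# Hypothetical header values, for illustration only:
sample = {
    "x-ratelimit-limit-requests": "3500",
    "x-ratelimit-remaining-requests": "3499",
    "x-ratelimit-limit-tokens": "90000",
    "x-ratelimit-remaining-tokens": "89950",
}
print(parse_rate_limits(sample))
```

Comparing these headers across your three organizations is a quick way to confirm which tier each key has actually landed on.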

If you can untangle the invited users from the strange organization setup you might be using to limit internal rights or spending, try consolidating so that you are seen overall as a big spender, at least on the account that faces customers.


Thank you for the help @_j.

I checked the tier system and it does not seem to explain the situation.

  • DEV is Tier 3
  • PREPROD is Free tier
  • PROD is Tier 1

In our logs, the PREPROD environment is as fast as the DEV environment.

Does that mean the Free tier is as performant as Tier 3? The docs don't seem to say so: OpenAI Platform

Here are the numbers for the last 12 days of that same test: 15 completions in parallel on the 3 environments. The y-axis is the average completion time in ms.
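A graph like that comes down to averaging the per-day, per-environment samples. A sketch of that aggregation step (the sample data below is invented; `daily_averages` is a hypothetical helper, not part of any library):

```python
from collections import defaultdict
from statistics import mean


def daily_averages(samples):
    """samples: iterable of (day, env, latency_ms) tuples.

    Returns {env: {day: mean latency in ms}}, ready for plotting
    one line per environment.
    """
    buckets = defaultdict(lambda: defaultdict(list))
    for day, env, ms in samples:
        buckets[env][day].append(ms)
    return {
        env: {day: mean(values) for day, values in days.items()}
        for env, days in buckets.items()
    }


# Invented numbers: two of the 15 parallel completions per environment
samples = [
    ("2023-10-20", "DEV", 900), ("2023-10-20", "DEV", 1100),
    ("2023-10-20", "PROD", 4800), ("2023-10-20", "PROD", 5200),
]
print(daily_averages(samples))
```

Each environment's series can then be plotted directly against the shared day axis.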


Why would they slow down a "free trial" account when they are trying to convert you to a paid one?

BTW, excellent graph. Exposes exactly what they are pulling on API users.


OK, thanks a lot. Fortunately, we should be able to sort it out easily by adding more credits. :money_mouth_face::money_mouth_face: I will come back later with some additional figures.

I can confirm it works like a charm. No pay, no gain… :sweat_smile:
