Slow Chat API responses

anyone else experiencing much slower API responses for the chat models for identical calls? (vs. a week ago)

Edit: an identical prompt on the Playground is as fast as it used to be.



Yes, I have apps running identical code to a week ago that are now around 5x as slow. API calls that used to take 3-5s now take 20-30s. I’ve tried playing with max_tokens and models but this slowdown is consistent. I am using python and very simple API calls. I am assuming this is due to increased usage of the API but would love to know if there is a better solution, or work being done to improve this. Apps which made sense with a fast response (e.g. chat based) are not usable with this slow response time.


Right now it's 14 s per request. If it only took 3-5 seconds it would be much more usable, so this is an important issue.

I think people often confuse network delays, data center congestion, etc. with API performance.

For example, I am 12 time zones away from the US and call the OpenAI completion API, and here are the results when I time the call:


Test 1:  Completions.get_reply Time: 1.247792 secs
Test 2:  Completions.get_reply Time: 5.038783 secs
Test 3:  Completions.get_reply Time: 1.289555 secs
Test 4:  Completions.get_reply Time: 2.205132 secs

Kindly keep in mind that I am testing the OpenAI APIs from the opposite side of the world from the US.

Also, if I repeat this for other models, the results are similar. In my experience it's mostly network traffic issues, not model issues.

Having said that, lately I have noticed that text-davinci-002 is about 0.5 seconds faster than text-davinci-003 (for the same prompt), but I have not tested extensively.


Appendix: Example Test
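A minimal sketch of the kind of timing harness these tests imply (the actual OpenAI client call is not shown in the thread, so it is left as an illustrative comment; the helper itself wraps any callable):

```python
import time


def timed_call(fn, *args, **kwargs):
    """Run fn(*args, **kwargs) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed


# Illustrative use against the completion endpoint (requires an API key;
# this exact call is an assumption, not the poster's code):
# import openai
# reply, secs = timed_call(
#     openai.Completion.create,
#     model="text-davinci-003",
#     prompt="Say hello",
#     max_tokens=16,
# )
# print(f"Completions.get_reply Time: {secs:.6f} secs")
```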

Nope, not confusing anything here. Again, I see consistent results from API calls made from two servers, Western Europe and Central US; the service is degraded in both locations compared to a week ago.
Your example uses davinci completions; we are experiencing these issues with the chat models.


No problemo… I can easily run tests for the chat completion:

Here are some test results (just now) for turbo:


Test 1. Completion API Time: 1.529 seconds
Test 2. Completion API Time: 2.504 seconds
Test 3. Completion API Time: 1.557 seconds
Test 4. Completion API Time: 1.513 seconds
Test 5. Completion API Time: 1.505 seconds

Appendix: Sample Chat Completion with Time
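A sketch of how repeated runs like the table above could be produced (the chat call shown in the comment is an assumption based on the openai-python v0.x client; the loop itself works with any callable):

```python
import time


def run_latency_tests(call, n=5, label="Completion API"):
    """Call `call()` n times, print per-call latency, and return the timings."""
    times = []
    for i in range(1, n + 1):
        start = time.perf_counter()
        call()
        secs = time.perf_counter() - start
        times.append(secs)
        print(f"Test {i}. {label} Time: {secs:.3f} seconds")
    return times


# Illustrative chat call (requires an API key; v0.x client style assumed):
# import openai
# run_latency_tests(lambda: openai.ChatCompletion.create(
#     model="gpt-3.5-turbo",
#     messages=[{"role": "user", "content": "Hello"}],
#     max_tokens=16,
# ))
```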



We are definitely not running a “Hello world” completion… those results are meaningless in the real world. We have a complex prompt, and again, we have seen latency go from 5-8 s to above 40 s.
Not sure what you are trying to prove :slight_smile:


Haha, @Gaouzief

I knew before I posted that you were going to respond exactly as you did, so funny. But since you did not include the number of tokens in your posts, I had no idea what to test :slight_smile: Maybe next time post the number of tokens you are having issues with, so we can help you and not waste time running tests which are not relevant to your “I’m a big token kinda guy” issues :slight_smile:

I will run a 3000 total token coding completion and then we will see…

Hold on.



I was mistaken; the numbers were much lower before, and now they are slow as you mentioned, for turbo.

Model: gpt-3.5-turbo-0301

Test 1: Total Tokens: 2923, Completion API Time: 11.799 seconds
Test 2: Total Tokens: 2923, Completion API Time: 11.709 seconds
Test 3: Total Tokens: 2923, Completion API Time: 10.635 seconds

Appendix: Sample Completion, Turbo, ~3000 Tokens:
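Since completion length dominates latency, one fair way to compare runs like these is tokens per second rather than raw seconds. A small helper (the response shape assumed here follows the openai-python v0.x dict format with a `usage` block):

```python
def tokens_per_second(response, elapsed_secs):
    """Compute throughput from a completion response's usage block."""
    total = response["usage"]["total_tokens"]
    return total / elapsed_secs


# Using the figures from Test 1 above: 2923 tokens in 11.799 s.
sample = {"usage": {"total_tokens": 2923}}
print(f"{tokens_per_second(sample, 11.799):.1f} tokens/sec")
```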

And that's nowhere near the complexity we have in our prompt; unfortunately I can't share it.
Anyway, we were OK with 5-8 s, given the task.
Now we are at 40-50 s, with some requests above 1 min, which obviously is not production-level anymore… hopefully it's a transient situation.


Yeah, I understand @Gaouzief

You are a huge token, super complex kinda guy with secret prompts and messages others cannot test and confirm.



Nope, I am a consultant “kinda guy” who signed an NDA :slight_smile:

Thanks anyway :wink:


FYI, I have worked as a consultant and software developer on countless software projects, and normally when we have proprietary data, we create “test harness” versions of the data which are neither sensitive nor under the NDA, which we can provide to vendors, support folks, etc. to test issues.

This is fairly standard practice in the industry, just in case you did not know.


I have the same observations - the ChatGPT API has significantly slowed down today. In the environment below, ALL my completions were usually in the 3-5 s range (which was already slow). But they are consistently timing out after ~30 s today.

Model: gpt-3.5-turbo
servers: US East Coast


Starting today, I've noticed that the ChatGPT API requests are very slow. Even a simple request takes about 13 seconds before returning the first byte. It's very strange.

Model: gpt-3.5-turbo
Servers: Canada Central
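Time-to-first-byte can be measured directly by streaming the response and timing the first chunk. The helper below works on any iterator; the chat call in the comment is an assumption (openai-python v0.x client style, requires an API key):

```python
import time


def time_to_first_chunk(stream):
    """Return (first_chunk, seconds_until_it_arrived) from any iterator."""
    start = time.perf_counter()
    first = next(iter(stream))
    return first, time.perf_counter() - start


# Illustrative streaming chat call:
# import openai
# stream = openai.ChatCompletion.create(
#     model="gpt-3.5-turbo",
#     messages=[{"role": "user", "content": "Hi"}],
#     stream=True,
# )
# _, ttfb = time_to_first_chunk(stream)
# print(f"first byte after {ttfb:.2f}s")
```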


For the folks who are suffering from this issue: have you also noticed a degradation in the quality of the responses (ideally with specific KPIs that show it)? Or is it just latency?

Any solutions for the extremely slow speed?

If it meets your use case, you can switch to another model, for example text-davinci-003 or, even better, text-davinci-002, for faster response times.
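One way to automate that suggestion is a fallback loop that tries the candidate models in order and keeps the first one that responds within a latency budget. This is a hypothetical helper, not part of the OpenAI client; the model names are the ones mentioned in this thread:

```python
import time

# Candidate models in preference order (names taken from the thread).
FALLBACK_MODELS = ["text-davinci-003", "text-davinci-002"]


def pick_fast_model(call_with_model, budget_secs=8.0):
    """Return the first model whose call finishes within budget_secs, else None.

    `call_with_model` is any function taking a model name and issuing a request.
    """
    for model in FALLBACK_MODELS:
        start = time.perf_counter()
        call_with_model(model)
        if time.perf_counter() - start <= budget_secs:
            return model
    return None
```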