Hello,
anyone else experiencing much slower API responses for the chat models for identical calls? (vs. a week ago)
Edit: an identical prompt on the Playground is as fast as it used to be.
H
Yes, I have apps running identical code to a week ago that are now around 5x as slow. API calls that used to take 3-5s now take 20-30s. I’ve tried playing with max_tokens and models but this slowdown is consistent. I am using python and very simple API calls. I am assuming this is due to increased usage of the API but would love to know if there is a better solution, or work being done to improve this. Apps which made sense with a fast response (e.g. chat based) are not usable with this slow response time.
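For reference, here is a minimal sketch of how such a call might be timed, assuming the legacy openai Python client (v0.x), an API key in the environment, and a placeholder prompt rather than anyone's actual application code:

```python
import os
import time

import openai

openai.api_key = os.environ["OPENAI_API_KEY"]  # assumes the key is set in the environment

start = time.perf_counter()
response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Say hello."}],  # placeholder prompt
    max_tokens=16,
)
print(f"Chat completion took {time.perf_counter() - start:.3f} s")
```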
Right now it's 14 s per request. If it only took 3-5 seconds I would get much more benefit from it, so this is an important issue.
I think many people often confuse network delays, data center congestion, etc. with API performance.
For example, I am 12 time zones away from the US and call the OpenAI completion API, and here are the results when I time the call:
text-davinci-003
Test 1: Completions.get_reply Time: 1.247792 secs
Test 2: Completions.get_reply Time: 5.038783 secs
Test 3: Completions.get_reply Time: 1.289555 secs
Test 4: Completions.get_reply Time: 2.205132 secs
Kindly keep in mind that I am testing OpenAI APIs from the opposite side of the world than the US.
Also, if I repeat for other models, the results are similar. It’s mostly network traffic issues, not model issues, from my experience.
Having said that, lately I have noticed that text-davinci-002 is about 0.5 seconds faster than text-davinci-003 (for the same prompt), but I have not tested extensively.
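A rough sketch of that comparison, assuming the legacy openai Python client and an arbitrary test prompt (not the poster's actual one):

```python
import time

import openai

PROMPT = "Write one sentence about the weather."  # arbitrary test prompt

for model in ("text-davinci-003", "text-davinci-002"):
    start = time.perf_counter()
    openai.Completion.create(model=model, prompt=PROMPT, max_tokens=32)
    print(f"{model}: {time.perf_counter() - start:.3f} s")
```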
Nope, not confusing anything here. Again, I see consistent results from API calls made from two servers, Western Europe and Central US; the service is degraded in both locations compared to a week ago.
Also, your example uses davinci completions; we are experiencing these issues with the chat models.
No problemo… can easily run tests for the chat completion API. Here are some test results (just now) for turbo:
gpt-3.5-turbo-0301
Test 1. Completion API Time: 1.529 seconds
Test 2. Completion API Time: 2.504 seconds
Test 3. Completion API Time: 1.557 seconds
Test 4. Completion API Time: 1.513 seconds
Test 5. Completion API Time: 1.505 seconds
HTH
We are definitely not running a "Hello world" completion… those results are meaningless in the real world. We have a complex prompt, and again, we have seen latency go from 5-8 s to above 40 s…
Not sure what you are trying to prove.
And that's nowhere near the complexity we have in our prompt; unfortunately I can't share it.
Anyway, we were OK with 5-8 s, given the task.
Now we are at 40-50 s, with some requests above 1 min, which obviously is not production level anymore… hopefully it's a transient situation.
I have the same observations - the ChatGPT API has significantly slowed down today. In the environment below, all my completions were usually in the 3-5 second range (which was already slow), but today they are consistently timing out after ~30 seconds.
Model: gpt-3.5-turbo
Servers: US East Coast
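One way to keep an app responsive while latency is this erratic is to cap the wait per request and retry. A minimal sketch, assuming the legacy openai Python client and that it accepts a request_timeout argument (treat that kwarg as an assumption, not confirmed here):

```python
import time

import openai

def chat_with_retry(messages, retries=3, timeout_s=30):
    """Retry a chat completion when it times out or fails transiently."""
    for attempt in range(1, retries + 1):
        try:
            return openai.ChatCompletion.create(
                model="gpt-3.5-turbo",
                messages=messages,
                request_timeout=timeout_s,  # assumed kwarg of the legacy client
            )
        except Exception as exc:  # broad catch: timeouts and transient API errors
            print(f"Attempt {attempt} failed: {exc}")
            time.sleep(2 ** attempt)  # simple exponential backoff
    raise RuntimeError("all retries failed")
```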
Starting from today, I've noticed that the ChatGPT API requests are very slow. Even a simple request takes about 13 seconds before returning the first byte. It's very strange.
Model: gpt-3.5-turbo
Servers: Canada Central
For the folks that are suffering this issue: have you guys also noticed a degradation in the quality of the responses (better if you have specific kpis that prove it)? Or is it just latency?
Any solutions for the extremely slow speed?
If it meets your use case, you can switch to another model, for example text-davinci-003 or, even better, text-davinci-002, for faster response times.
HTH
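A minimal sketch of what that switch might look like, assuming the legacy openai Python client and a hypothetical single-turn prompt (not the poster's actual code):

```python
import openai

# Instead of a chat call like:
#   openai.ChatCompletion.create(model="gpt-3.5-turbo", messages=[...])
# the same single-turn request can go to a completion model:
prompt = "You are a helpful assistant.\nUser: Summarise the weather today.\nAssistant:"  # hypothetical prompt
response = openai.Completion.create(
    model="text-davinci-003",  # or "text-davinci-002" if that proves faster for you
    prompt=prompt,
    max_tokens=256,
)
print(response["choices"][0]["text"])
```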
Any update on this? We're making chat completion API calls with openai.createChatCompletion, and the same call took 1 s, 4 s, and 20 s when called over and over.
Has anyone experimented with reducing the size of the prompts? I remember reading somewhere that the API response time could be related to prompt length.
My limited experiment suggests it's not related, but it's not conclusive. If I send the same query first with a long prompt and then again with a short prompt, are the two response times comparable? Maybe the second query reused something from the first? My understanding is that the two queries are totally unrelated from the API's perspective, but I'm not sure.
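For what it's worth, a rough sketch of how one could compare the two cases, assuming the legacy openai Python client and artificially padded placeholder prompts:

```python
import time

import openai

SHORT = "Summarise: The cat sat on the mat."
LONG = "Summarise: " + "The cat sat on the mat. " * 200  # artificially padded prompt

for label, prompt in (("short", SHORT), ("long", LONG)):
    start = time.perf_counter()
    openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
        max_tokens=64,
    )
    print(f"{label} prompt: {time.perf_counter() - start:.3f} s")
```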
So I have been using your suggestion to use davinci instead of chat, but is there any update to improve the speed of chat? Thank you so much for your valuable time and assistance!