Hello,
anyone else experiencing much slower API responses for the chat models for identical calls? (vs. a week ago)
Edit: an identical prompt on the Playground is as fast as it used to be.
H
Yes, I have apps running identical code to a week ago that are now around 5x as slow. API calls that used to take 3-5s now take 20-30s. I’ve tried playing with max_tokens and models but this slowdown is consistent. I am using python and very simple API calls. I am assuming this is due to increased usage of the API but would love to know if there is a better solution, or work being done to improve this. Apps which made sense with a fast response (e.g. chat based) are not usable with this slow response time.
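For reference, here is a minimal sketch of how such a call might be timed, assuming the legacy openai Python client (v0.x), an API key in the environment, and a placeholder prompt rather than anyone's actual application code:

```python
import os
import time

import openai

openai.api_key = os.environ["OPENAI_API_KEY"]  # assumes the key is set in the environment

start = time.perf_counter()
response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Say hello."}],  # placeholder prompt
    max_tokens=16,
)
print(f"Chat completion took {time.perf_counter() - start:.3f} s")
```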
Right now it's 14 s per request. If it only took 3-5 seconds I would get much more benefit from it, so this is an important issue.
I think many people often confuse network delays, data center congestion, etc. with API performance.
For example, I am 12 time zones away from the US and call the OpenAI completion API, and here are the results when I time the call:
text-davinci-003
Test 1: Completions.get_reply Time: 1.247792 secs
Test 2: Completions.get_reply Time: 5.038783 secs
Test 3: Completions.get_reply Time: 1.289555 secs
Test 4: Completions.get_reply Time: 2.205132 secs
Kindly keep in mind that I am testing OpenAI APIs from the opposite side of the world than the US.
Also, if I repeat for other models, the results are similar. It’s mostly network traffic issues, not model issues, from my experience.
Having said that, lately I have noticed that text-davinci-002 is about 0.5 seconds faster than text-davinci-003 (for the same prompt), but I have not tested extensively.
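A rough sketch of that comparison, assuming the legacy openai Python client and an arbitrary test prompt (not the poster's actual one):

```python
import time

import openai

PROMPT = "Write one sentence about the weather."  # arbitrary test prompt

for model in ("text-davinci-003", "text-davinci-002"):
    start = time.perf_counter()
    openai.Completion.create(model=model, prompt=PROMPT, max_tokens=32)
    print(f"{model}: {time.perf_counter() - start:.3f} s")
```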
Nope, not confusing anything here. Again, I see consistent results from API calls made from two servers, Western Europe and Central US; the service is degraded in both locations compared to a week ago.
Also, your example uses davinci completions; we are experiencing these issues with the chat models.
No problemo… can easily run tests for the chat completion API. Here are some test results (just now) for turbo:
gpt-3.5-turbo-0301
Test 1. Completion API Time: 1.529 seconds
Test 2. Completion API Time: 2.504 seconds
Test 3. Completion API Time: 1.557 seconds
Test 4. Completion API Time: 1.513 seconds
Test 5. Completion API Time: 1.505 seconds
HTH
We are definitely not running a "Hello world" completion… those results are meaningless in the real world. We have a complex prompt, and again, we have seen latency go from 5-8 s to above 40 s…
Not sure what you are trying to prove.
And that's nowhere near the complexity we have in our prompt; unfortunately I can't share it.
Anyway, we were OK with 5-8 s, given the task.
Now we are at 40-50 s, with some requests above 1 min, which obviously is not production level anymore… hopefully it's a transient situation.
I have the same observations - the ChatGPT API has significantly slowed down today. In the environment below, all my completions were usually in the 3-5 second range (which was already slow), but today they are consistently timing out after ~30 seconds.
Model: gpt-3.5-turbo
Servers: US East Coast
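One way to keep an app responsive while latency is this erratic is to cap the wait per request and retry. A minimal sketch, assuming the legacy openai Python client and that it accepts a request_timeout argument (treat that kwarg as an assumption, not confirmed here):

```python
import time

import openai

def chat_with_retry(messages, retries=3, timeout_s=30):
    """Retry a chat completion when it times out or fails transiently."""
    for attempt in range(1, retries + 1):
        try:
            return openai.ChatCompletion.create(
                model="gpt-3.5-turbo",
                messages=messages,
                request_timeout=timeout_s,  # assumed kwarg of the legacy client
            )
        except Exception as exc:  # broad catch: timeouts and transient API errors
            print(f"Attempt {attempt} failed: {exc}")
            time.sleep(2 ** attempt)  # simple exponential backoff
    raise RuntimeError("all retries failed")
```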
Starting from today, I've noticed that the ChatGPT API requests are very slow. Even a simple request takes about 13 seconds before returning the first byte. It's very strange.
Model: gpt-3.5-turbo
Servers: Canada Central
For the folks that are suffering this issue: have you guys also noticed a degradation in the quality of the responses (better if you have specific kpis that prove it)? Or is it just latency?
Any solutions for the extremely slow speed?
If it meets your use case, you can switch to another model, for example text-davinci-003 or, even better, text-davinci-002, for faster response times.
HTH
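A minimal sketch of what that switch might look like, assuming the legacy openai Python client and a hypothetical single-turn prompt (not the poster's actual code):

```python
import openai

# Instead of a chat call like:
#   openai.ChatCompletion.create(model="gpt-3.5-turbo", messages=[...])
# the same single-turn request can go to a completion model:
prompt = "You are a helpful assistant.\nUser: Summarise the weather today.\nAssistant:"  # hypothetical prompt
response = openai.Completion.create(
    model="text-davinci-003",  # or "text-davinci-002" if that proves faster for you
    prompt=prompt,
    max_tokens=256,
)
print(response["choices"][0]["text"])
```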
Any update on this? We're making chat completion API calls with openai.createChatCompletion, and the same call took 1 s, 4 s, and 20 s when called over and over.
Has anyone experimented with reducing the size of the prompts? I remember reading somewhere that the API response time could be related to prompt length.
My limited experiment suggests it's not related, but it's not conclusive. If I send the same query first with a long prompt and then again with a short prompt, are the two response times comparable? Maybe the second query reused something from the first? My understanding is that the two queries are totally unrelated from the API's perspective, but I'm not sure.
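For what it's worth, a rough sketch of how one could compare the two cases, assuming the legacy openai Python client and artificially padded placeholder prompts:

```python
import time

import openai

SHORT = "Summarise: The cat sat on the mat."
LONG = "Summarise: " + "The cat sat on the mat. " * 200  # artificially padded prompt

for label, prompt in (("short", SHORT), ("long", LONG)):
    start = time.perf_counter()
    openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
        max_tokens=64,
    )
    print(f"{label} prompt: {time.perf_counter() - start:.3f} s")
```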
So I have been using your suggestion to use davinci instead of chat, but is there any update to improve the speed of chat? Thank you so much for your valuable time and assistance!