Very slow response time with chatgpt-3.5 turbo model API

I need to wait for 40 seconds to get a response on gpt-3.5 chat model. Is this normal? The response is in Ukrainian. It used to be 5-10 seconds a few days ago.


Same issue in Germany. Just a “Hello” wait for 15s to response.

1 Like

It works, but in some cases. For me, when I use this model for the actual chat with question/answer, where the chatbot is actually asking questions, not answering the response time is very crucial

No it’s not normal, OpenAI needs to fix this up soon, why is it not the same speed as the Playground?

I think for developers, we only want OpenAI to explain officially the reason for huge difference between playground and using api. And maybe a solution for the issue.
Of course, we can use some tricks to avoid bad user experience, but it won’t be the best one.


I’m timing out after one or sometimes even two minutes. I’m still on my free “grant”. Does the API adjust priority down based on that?

There are two ways to receive a response from OpenAI:

  1. End-to-end: This generates all the words and sends the result as one response. However, this method may cause a timeout issue since most servers only wait for a response for up to 30 seconds.

  2. Stream mode: This sends each word to the user as soon as it is generated. This is what we refer to as the “first response time.”

My personal experience with the “first response time” was less than 3 seconds just two days ago. However, currently, it takes 15 seconds or more.

In request body add stream:true to use stream mode. Otherwise, it’s normal mode.

1 Like

same problem , from india , Davinci works perfectly fine but 3.5turbo takes too much time to respond

Does OpenAI have any comments on reducing speed?

The problem is worse on Mobile for me compared to PCs. It takes 90 secs on MS Edge and 150 secs on Chrome on PC/Mac. But it actually rarely spits out any results on Mobile.

Not sure if this is related to gpt-3.5-turbo. I’m in New Zealand.

Noticed this with other models too, like whisper-1. Both text-generation and speech-to-text from OpenAI has become drastically slower for me in recent weeks.

With regards to speech-to-text, the latencies with Whisper got so bad that I switched to an alternative (Deepgram) that was faster and cheaper. The results were equal, if not better than Whisper’s.

If anyone has alternative APIs to 3.5-turbo, I’d love to hear them!

Has anyone noticed that this is different user-to-user? We send a user id with each request (this used to be a requirement, but it seems they dropped it) and certain users are consistently 4-5x slower than others. I did an apples to apples comparison, only changing the user ids on a short n=5 prompt, and is was seven seconds vs 29 seconds for different users. The slow users are also our “power users” that use the software much more regularly than others. The docs say that throttling is at the account level, so I’m not sure why this would be.

It feels like they are slowing down selected users, my speed dropped about 2 days ago. Currently, a request with ‘messages’: [{‘role’: ‘user’, ‘content’: ‘Hello!’}] takes about 20 seconds ± 3 seconds. However, for example, there is a service that works instantly :

we are seeing slowing down in our responses as well. does anyone know the cause or fix?

If OpenAI has applied some throttling to your account there is probably little you can do about it (though I suspect it may just be some internal trade-off they are making between how much resources to spend on 3.5 vs 4).

But what you can do is try to reduce the number of output tokens – which has the most effect on response time. Here I wrote a list of tricks I’ve used to reduce GPT response time - I hope some of them will help.

Hey, its really painful. had to wait a long just to generate a simple 4-5 lines text.
I am from bangladesh