API "gpt-3.5-turbo" Sucks (Slow)

Hello, good morning. I am working on an application and trying to integrate gpt-3.5-turbo. The big issue I’ve noticed is that gpt-3.5-turbo is very slow to respond, and it doesn’t matter whether streaming mode is enabled. A simple question takes between 15 and 40 seconds.
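For anyone who wants to measure this themselves, here is a minimal sketch that separates time-to-first-token from total time. The helper is generic; the commented-out usage assumes the pre-1.0 `openai` Python library and an API key, so treat it as illustrative only:

```python
import time

def time_to_first_token(chunks):
    """Consume a streaming iterator and return (seconds until first chunk, first chunk)."""
    start = time.monotonic()
    first = next(iter(chunks))
    return time.monotonic() - start, first

# Hypothetical usage with the pre-1.0 openai library (requires an API key):
# stream = openai.ChatCompletion.create(
#     model="gpt-3.5-turbo",
#     messages=[{"role": "user", "content": "Say hello"}],
#     stream=True,
# )
# ttft, _ = time_to_first_token(stream)
# print(f"time to first token: {ttft:.1f}s")
```

With streaming enabled, time-to-first-token is the number users actually feel; if that alone is 15+ seconds, the delay is queueing on the server side, not generation length.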

Of course, I’ve already ruled out the possibility that my network or my application is the problem, since I’m testing it from the operating system terminal. I even bought a dedicated OVH server in the United States to rule out a latency issue, and it still returns the same response times.

To debug further, I switched the language model to Davinci and to Curie, and the server responses are practically instantaneous. This means the problem is not on my side, it’s on yours (OpenAI), and it has been like this for over a week.

I am surprised that, with the money OpenAI raises from investors, they don’t buy better servers. I even bought ChatGPT Plus, and I noticed that gpt-3.5-turbo runs about 10x faster there. I suspect that OpenAI purposely slowed down the gpt-3.5-turbo API so that we use more expensive models in our applications, since they make much more money from users buying ChatGPT Plus than from the gpt-3.5-turbo API.

6 Likes

I am experiencing the same!
A few days ago, I was getting responses from gpt-3.5-turbo very quickly compared to what I have been getting since yesterday. The response time has dramatically increased from 5–8s to more than 20s for the same prompt. I also tried the text-davinci-003 model; it’s significantly faster and more accurate than 3.5-turbo, but also significantly costlier.
I’m disappointed that 3.5-turbo has slowed down so much that it’s practically unusable from a user-experience standpoint.

4 Likes

Please keep in mind that ChatGPT went from launch to 100+ million users in two months… the fastest growth in history for any app.

That they’re keeping the servers up at all is tremendous in my eyes.

Be patient. I’m very sure the reliability will improve as time goes on.

1 Like

The thing is, as a customer, I’m not interested in waiting weeks or months for this issue to be resolved, since I only pay for it to work.

What bothers me the most is that the same model (GPT-3.5) works 10 times faster on https://chat.openai.com/ than on the API. What’s going on, OpenAI?

7 Likes

Same issue here. My Telegram bot is currently receiving many timeout errors with gpt-3.5-turbo. I was looking for the error on my side, and it turns out it’s an API issue: when I changed the model, the problem went away. Hopefully OpenAI resolves this API outage quickly.
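As a stopgap for transient timeouts like these, a generic retry-with-exponential-backoff wrapper can at least keep a bot from surfacing every failure to users. A sketch, with hypothetical names (`call_with_retries` is not part of any OpenAI SDK; you would pass in a closure that makes the real API call):

```python
import random
import time

def call_with_retries(fn, max_attempts=4, base_delay=1.0):
    """Call fn(); on failure, retry with exponential backoff plus a little jitter."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts, propagate the last error
            time.sleep(base_delay * (2 ** attempt) + random.random() * 0.1)
```

Retries don't make the API faster, but they turn "server overloaded" errors into a delayed answer instead of a crash.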

3 Likes

Same issue, the gpt-3.5-turbo API is too slow

3 Likes

That’s a really quick and user-friendly solution! Thanks for sharing :heart:

2 Likes

Caching good answers for common questions might help as well (e.g. if you want to collect commands for machines).

Or for OCR pipelines you might let GPT-3 build some templates (like back in the good old days), e.g. for invoices. If another invoice from the same client comes in and you can gather all the data with the template, you might not even need the API for every request.
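A minimal sketch of that caching idea, assuming exact-match prompts; the `fetch` callback standing in for the real API call is hypothetical:

```python
import hashlib

_cache = {}

def cache_key(model, prompt):
    """Stable key for a (model, prompt) pair."""
    return hashlib.sha256(f"{model}\n{prompt}".encode()).hexdigest()

def cached_completion(model, prompt, fetch):
    """Serve repeated prompts from memory; fetch(model, prompt) does the real API call."""
    key = cache_key(model, prompt)
    if key not in _cache:
        _cache[key] = fetch(model, prompt)
    return _cache[key]
```

For a conversational app this only helps with verbatim repeats; paraphrased questions would need fuzzy or embedding-based matching on top.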

2 Likes

Well, this is still a major issue: a simple 180-token response takes 30 seconds! We’ve seen response times of up to 60 seconds.

We planned to release our product now, but this makes it impossible to use; users think it’s broken.

We also got rate limited, even though we are nowhere near our rate limits of 3,500 RPM and 90,000 TPM.
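If those 429s come from short bursts rather than sustained load, a client-side throttle can smooth requests out before they hit the API. A sketch of a sliding-window limiter (the class and its behavior are made up for illustration, not an OpenAI feature):

```python
import time
from collections import deque

class RateLimiter:
    """Block until a request fits under a requests-per-minute cap (sliding window)."""

    def __init__(self, rpm):
        self.rpm = rpm
        self.stamps = deque()  # monotonic timestamps of recent requests

    def wait(self):
        now = time.monotonic()
        # drop timestamps older than the 60-second window
        while self.stamps and now - self.stamps[0] >= 60:
            self.stamps.popleft()
        if len(self.stamps) >= self.rpm:
            # sleep until the oldest request ages out of the window
            time.sleep(60 - (time.monotonic() - self.stamps[0]))
            self.stamps.popleft()
        self.stamps.append(time.monotonic())
```

Call `limiter.wait()` before each API request; a per-minute token counter would be needed as well to respect the TPM cap.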

3 Likes

Same here, it is fxcking slow!!!
Beyond that, it often breaks: it responds that the server is overloaded, and as a result my program exits.

However, when I use ChatGPT, which also runs gpt-3.5-turbo under the hood, it’s really fast!! It’s ridiculous!!!

2 Likes

I am experiencing the same issue. Several days ago it took 40s on average to respond, and yesterday it went up to at least 80s. Today, somehow, after a couple of hours of errors, it went down to about 60s. Still unbelievably slow. The same prompts take the ChatGPT web version only up to 10s.

1 Like

I think there is a general problem. Many people, including us, are experiencing the same issue. The OpenAI team is not offering any solutions. I believe they don’t value developers.

1 Like

I believe they literally can’t rent enough hardware to meet demand. For example, none of the North American AWS regions have current-generation GPU instances available for provisioning. I would assume Azure is in the same boat, because demand goes where resources are available.

4 Likes

I think the response time is proportional to the content of your prompt and the tasks inside it, plus the AI itself may hallucinate on your requests. I have observed that if a response doesn’t come back within about 60 seconds, the model has probably started to hallucinate, i.e. make something up unrelated to your request.

A reasonable response window is somewhere between 30 and 40 seconds; I don’t think anyone should expect an instantaneous response from the APIs. E.g. the task I’m doing is hard to do with ordinary algorithms because it doesn’t scale, so I have the AI do it for me instead.

1 Like

Same here; gpt-4 is even slower, so it’s unfortunately unusable for us right now. Bard’s API is also in beta and only available via a waiting list. Open source to the rescue!

3 Likes

I have the same problem. Over the past weeks the response time went up, and my application now performs poorly. It is a conversation-based app.

1 Like

Any experiences with the waitlist for Azure-hosted OpenAI models?

1 Like

Noticing the same for text-davinci-003, although with less impact. Still, roughly a third slower since ~25 April. Read this chart from right to left; the latest requests are on the left. We used to hover around a 4.4-second response time; now we’re at 5.9s on average.

2 Likes

Exactly… it’s much faster on ChatGPT.

1 Like

Hey Bill, what’s that tool? It seems interesting.

2 Likes