Hello, good morning. I am working on an application and trying to integrate gpt-3.5-turbo. The big issue I've noticed is that gpt-3.5-turbo is very slow to answer, regardless of whether streaming mode is enabled. Even a simple question takes between 15 and 40 seconds.
Of course, I've already ruled out my network and my application as the cause, since I'm testing it from the operating system terminal. I even rented a dedicated OVH server in the United States to rule out a latency issue, and it still returns the same response times.
To debug further, I switched to other language models (Davinci and Curie), and the server responses are practically instantaneous. This means the problem is not on my side, it's on OpenAI's, and it has been like this for over a week.
I am surprised that, with the money OpenAI raises from investors, they don't buy better servers. I also bought ChatGPT Plus, and I notice that gpt-3.5-turbo runs 10x faster there. I suspect OpenAI has purposely slowed down the gpt-3.5-turbo API so that we use more expensive models in our applications, since they make much more money from users buying ChatGPT Plus than from the gpt-3.5-turbo API.
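For what it's worth, with streaming enabled the meaningful number is time-to-first-token, not total completion time. A minimal sketch for measuring it against the chat completions REST endpoint, using only the standard library (the `OPENAI_API_KEY` environment variable and the prompt are placeholders):

```python
import json
import os
import time
import urllib.request

API_URL = "https://api.openai.com/v1/chat/completions"

def build_chat_payload(model: str, prompt: str, stream: bool = True) -> dict:
    """JSON body for a chat completion request."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": stream,
    }

def time_to_first_token(prompt: str, model: str = "gpt-3.5-turbo") -> float:
    """Seconds from sending the request until the first streamed chunk arrives."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(build_chat_payload(model, prompt)).encode(),
        headers={
            "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
            "Content-Type": "application/json",
        },
    )
    start = time.perf_counter()
    with urllib.request.urlopen(req, timeout=60) as resp:
        for raw in resp:  # server-sent events: one "data: ..." line per chunk
            line = raw.strip()
            if line.startswith(b"data: ") and line != b"data: [DONE]":
                return time.perf_counter() - start
    raise RuntimeError("stream ended before any token arrived")

# usage (needs OPENAI_API_KEY set):
# print(f"first token after {time_to_first_token('Say hello'):.2f}s")
```

If the first token arrives quickly but the full answer still takes tens of seconds, the bottleneck is generation throughput rather than queueing, which is useful to know when filing a report.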
I am experiencing the same!
A few days ago, I was getting responses from gpt-3.5-turbo very quickly compared to what I've been getting since yesterday. The response time has dramatically increased from 5-8s to more than 20s for the same prompt. I also tried the text-davinci-003 model, and it's significantly faster and more accurate than 3.5-turbo, but it's significantly costlier too.
I'm disappointed that 3.5-turbo has slowed down so much that, user-experience-wise, it's practically unusable.
Same issue here. My Telegram bot currently receives many timeout errors with gpt-3.5-turbo. I looked for the error on my side, and it turns out it's an API issue. When I changed the model, the issue was solved. Hopefully OpenAI resolves this API outage quickly.
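One stopgap for those timeouts is a client-side deadline plus a fallback model, which is essentially what switching models by hand does. A hedged sketch; `call_model` stands in for whichever client call your bot actually makes, and the model list is just an example:

```python
from typing import Callable, Sequence

def complete_with_fallback(
    prompt: str,
    call_model: Callable[[str, str], str],
    models: Sequence[str] = ("gpt-3.5-turbo", "text-davinci-003"),
) -> str:
    """Try each model in order, falling through when a call times out."""
    last_err = None
    for model in models:
        try:
            return call_model(model, prompt)
        except TimeoutError as err:
            last_err = err  # this model is too slow right now; try the next one
    raise RuntimeError("all models timed out") from last_err
```

The bot stays responsive during an outage at the cost of occasionally paying the higher per-token price of the fallback model.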
Caching good answers for common questions might help as well (e.g. if you want to collect commands for machines).
Or, for OCR pipelines, you might let GPT-3 build some templates (like back in the good old days), e.g. for invoices. If another invoice from the same client comes in and you can gather all the data with the template, you might not even need the API for every request.
I am experiencing the same issue. Several days ago it took 40s on average to respond, and yesterday it went up to at least 80s. Today, after a couple of hours of errors, it somehow went down to about 60s. Still unbelievably slow. The same prompts take the ChatGPT web version at most 10s.
I believe they literally can’t rent enough hardware to meet demand. For example, none of the North American AWS regions have current-generation GPU instances available for provisioning. I would assume Azure is in the same boat, because demand goes where resources are available.
I think the response time is proportional to the length and complexity of the tasks inside your prompt, and the AI may also hallucinate on your requests. I have observed that if a response doesn't come back within a 60-second window, the model has probably started to hallucinate, i.e. make something up unrelated to your request.
An adequate response window is somewhere between 30 and 40 seconds. I don't think anyone should expect an instantaneous response from the APIs. E.g., the task I'm doing is hard to handle with conventional algorithms because it doesn't scale, so I have the AI do it for me instead.
Noticing the same for davinci (text-davinci-003), although with less impact. Still, roughly 20% slower since ~25 April. Read the chart from right to left; the latest requests are on the left. We used to hover around 4.4 seconds response time; now we're at a 5.9s average.