Hello, good morning. I am working on an application and trying to integrate gpt-3.5-turbo. The big issue I’ve noticed is that “gpt-3.5-turbo” is very slow to answer, regardless of whether streaming mode is enabled. With a simple question, it takes between 15 and 40 seconds.
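For what it’s worth, here is roughly how I time it: a minimal sketch that measures time-to-first-chunk and total time for a streaming response. The `fake_stream` generator below is a hypothetical stand-in for the streaming API call (not the real client), so the numbers just illustrate the measurement:

```python
import time

def fake_stream(prompt, delay_first=0.2, delay_rest=0.01, n_chunks=5):
    """Hypothetical stand-in for a streaming chat completion (simulated delays)."""
    time.sleep(delay_first)           # latency before the first chunk arrives
    yield "chunk-0"
    for i in range(1, n_chunks):
        time.sleep(delay_rest)        # inter-chunk latency
        yield f"chunk-{i}"

def measure(stream):
    """Return (time_to_first_chunk, total_time, chunks) for any chunk iterator."""
    start = time.perf_counter()
    first = None
    chunks = []
    for chunk in stream:
        if first is None:
            first = time.perf_counter() - start
        chunks.append(chunk)
    total = time.perf_counter() - start
    return first, total, chunks

ttfc, total, chunks = measure(fake_stream("hello"))
print(f"first chunk after {ttfc:.2f}s, finished in {total:.2f}s, {len(chunks)} chunks")
```

With streaming, time-to-first-chunk is the number that matters for perceived responsiveness; in my case even that first chunk takes many seconds.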
Of course, I’ve already ruled out my network and my application as the problem, since I’m testing directly from the operating system’s terminal. I even bought a dedicated OVH server in the United States to rule out a latency issue, and it still returns the same response times.
To debug further, I switched the language model to Davinci and then Curie, and the server responses are practically instantaneous. This means the problem is not on my side, it’s on OpenAI’s, and it has been like this for over a week.
I am surprised that, with the money OpenAI raises from the investments it receives, they don’t buy better servers. I even bought ChatGPT Plus, and I notice that gpt-3.5-turbo runs about 10x faster there. I suspect OpenAI has purposely slowed down the “gpt-3.5-turbo” API so that we use more expensive models in our applications, since they make much more money from users buying ChatGPT Plus than from the “gpt-3.5-turbo” API.
I am experiencing the same!
A few days ago, I was getting responses from GPT-3.5-turbo very quickly compared to what I’ve been getting since yesterday. The response time has dramatically increased from 5–8s to more than 20s for the same prompt. I also tried the text-davinci-003 model, and it’s significantly faster and more accurate than 3.5-turbo, but it’s significantly costlier too.
I’m disappointed that 3.5-turbo has slowed down so much that it’s practically unusable from a user-experience standpoint.
Same issue here. My Telegram bot is currently receiving many timeout errors with “gpt-3.5-turbo”. I looked for the error on my side, and it turns out it’s an API issue. When I changed the model, the issue was solved. Hopefully, OpenAI resolves this API outage quickly.
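Until it’s fixed, a retry-with-fallback wrapper keeps the bot responsive. This is a minimal sketch; `call_model` is a hypothetical stand-in for the real API call (here the slow model always times out, to show the fallback path):

```python
import time

def call_model(model, prompt):
    """Hypothetical stand-in for the actual API call; the slow model times out."""
    if model == "gpt-3.5-turbo":
        raise TimeoutError("request timed out")
    return f"{model}: answer to {prompt!r}"

def ask_with_fallback(prompt, models=("gpt-3.5-turbo", "text-davinci-003"), retries=2):
    """Try each model in order, retrying on timeouts, before giving up."""
    for model in models:
        for attempt in range(retries):
            try:
                return call_model(model, prompt)
            except TimeoutError:
                time.sleep(0.01 * (attempt + 1))  # brief backoff before retrying
    raise RuntimeError("all models timed out")

answer = ask_with_fallback("hi")
print(answer)
```

In a real bot you would also cap the per-request timeout so a hanging call doesn’t block the update loop.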
Caching good answers for common questions might help as well (e.g. if you want to collect commands for machines).
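A minimal sketch of that idea: cache answers keyed on a normalized prompt, so near-duplicate questions never hit the API twice. The `model_call` passed in is a hypothetical stand-in for the real request:

```python
def normalize(prompt):
    """Key prompts case- and whitespace-insensitively so near-duplicates hit the cache."""
    return " ".join(prompt.lower().split())

class AnswerCache:
    def __init__(self, model_call):
        self.model_call = model_call  # the (expensive) API call being wrapped
        self.store = {}
        self.hits = 0

    def ask(self, prompt):
        key = normalize(prompt)
        if key in self.store:
            self.hits += 1
            return self.store[key]
        answer = self.model_call(prompt)
        self.store[key] = answer
        return answer

cache = AnswerCache(lambda p: f"answer to {p!r}")  # stand-in for the real call
a1 = cache.ask("How do I list files?")
a2 = cache.ask("how do i  list files?")  # normalizes to the same key: cache hit
```

For commands and other fixed-answer questions, even this trivial exact-match cache removes a large share of API calls; stale answers are the trade-off.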
Or, for OCR pipelines, you might let GPT-3 build templates (like back in the good old days), e.g. for invoices. If another invoice from the same client comes up and you can gather all the data with the template, you might not even need the API for every request.
I am experiencing the same issue. Several days ago it took 40s on average to respond, and yesterday it went up to at least 80s. Today, after a couple of hours of errors, it somehow went down to about 60s. Still unbelievably slow. The same prompts take the ChatGPT web version only up to 10s.
I believe they literally can’t rent enough hardware to meet demand. For example, none of the North American AWS regions have current-generation GPU instances available for provisioning. I would assume Azure is in the same boat, because demand goes where resources are available.