I realized everyone here is in the same boat: 10–60 seconds of delay, and sometimes no response at all, causing timeouts on our servers. Meanwhile, when using it online on your website there is literally zero delay (the typing effect of the bot is just a UX thing).
Don’t want developers to have your abilities? Understandable. But.
Sam Altman, if you only want us to use your API so you can learn about your product, be honest and say it. Many people here raised money based on your API, and if you have no real intention of letting them pay for proper speed, ALL THE TIME, CONSTANTLY, then just say it so they can move to another LLM. (We are already moving, as are multiple founders we know with $10M+.)
Getting 2 seconds sometimes and 80 seconds other times is not fair, since we counted on the 2 seconds, and we don’t even get that.
Not mentioning this anywhere in the docs, or replying here to the thousands of developers, is not OK.
It’s fair to be upset about the downtime and service issues. That said, the typing effect is not just visual; GPT itself outputs one token at a time.
If you are not using stream=true, then you are waiting for the complete response, which adds to your delay.
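The idea is to forward tokens to the user as they arrive instead of buffering the whole completion. Here is a minimal sketch of that pattern; the stub generator below is hypothetical and stands in for the chunks a real stream=true request would yield:

```python
import sys
from typing import Iterable, Iterator

def stream_to_user(deltas: Iterable[str]) -> str:
    """Print each token the moment it arrives; return the assembled reply."""
    parts = []
    for delta in deltas:
        sys.stdout.write(delta)  # user sees output immediately
        sys.stdout.flush()
        parts.append(delta)
    sys.stdout.write("\n")
    return "".join(parts)

def fake_api_stream() -> Iterator[str]:
    # Hypothetical stand-in for a streamed API response,
    # which delivers the completion one small chunk at a time.
    yield from ["SELECT", " *", " FROM", " users", ";"]

if __name__ == "__main__":
    reply = stream_to_user(fake_api_stream())
```

Perceived latency drops to the time-to-first-token instead of the time for the whole completion, even though the total generation time is unchanged.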
Are you also using a retry/backoff library? It’s fair to complain and demand better service, but in the meantime there are ways to make the experience better for the end user.
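For anyone rolling this by hand rather than pulling in a library, a sketch of jittered exponential backoff looks like this (the function names and parameters are my own, not from any particular library):

```python
import random
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")

def backoff_delays(base: float = 1.0, factor: float = 2.0,
                   cap: float = 30.0, tries: int = 5) -> List[float]:
    """Capped exponential schedule: base, base*factor, base*factor^2, ..."""
    return [min(cap, base * factor ** i) for i in range(tries)]

def call_with_retry(fn: Callable[[], T], tries: int = 5, base: float = 1.0) -> T:
    """Call fn, retrying failures with jittered exponential backoff."""
    last_err = None
    for attempt, delay in enumerate(backoff_delays(base=base, tries=tries)):
        try:
            return fn()
        except Exception as err:  # in real code, catch only retryable errors
            last_err = err
            if attempt < tries - 1:
                # Jitter spreads retries out so many clients
                # don't hammer the API at the same instant.
                time.sleep(delay * random.uniform(0.5, 1.0))
    raise last_err
```

Wrapping the API call in `call_with_retry` turns a transient timeout into a short extra wait instead of a hard failure surfaced to the end user.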
Where there’s a problem, there’s an opportunity. Yin and yang, baby.
For example, a typing indicator for the bot. Let’s be real: before GPT, you or someone else would have had to spend far more time formulating and writing up a response. A couple of seconds, even a minute, is still blazingly fast compared to a human.
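A typing indicator is just a small concurrent task that repaints while the request is in flight. A minimal sketch with asyncio, where `slow_model_call` is a hypothetical stand-in for the real API request:

```python
import asyncio

async def show_typing(stop: asyncio.Event) -> None:
    """Repaint a 'typing' indicator until the real reply arrives."""
    while not stop.is_set():
        print("bot is typing...")
        try:
            await asyncio.wait_for(stop.wait(), timeout=0.1)
        except asyncio.TimeoutError:
            pass  # timeout just means: repaint and keep waiting

async def slow_model_call() -> str:
    # Hypothetical stand-in for a slow API request.
    await asyncio.sleep(0.3)
    return "Here is your answer."

async def chat_turn() -> str:
    """Run the indicator concurrently with the slow call."""
    stop = asyncio.Event()
    indicator = asyncio.create_task(show_typing(stop))
    try:
        reply = await slow_model_call()
    finally:
        stop.set()       # tell the indicator to stop
        await indicator  # and wait for it to finish cleanly
    return reply

if __name__ == "__main__":
    print(asyncio.run(chat_turn()))
```

The user gets immediate feedback that something is happening, which makes even a multi-second wait feel acceptable.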
I expect the latency to decrease over time. Right now most of it comes from high server load, so as we scale up over the next few months we will get better at this. Thanks for being patient and hanging in there; the demand is truly unprecedented, so it’s a very difficult problem to solve.
Well, we are just getting an SQL response: we send 200 tokens and receive 40, so streaming does not help.
It is OK that it’s not working; they do not owe us anything. But it is not very ethical, because it seems they CAN provide a super fast experience (the same prompt takes 0.5 seconds on their website, every time).
What they are probably doing, knowing Sam, is using us to develop their product. They have no real intention of letting any of you build something they can’t, and the API money is nothing to them.
Better to just say: hey, the service costs $1 per request, take it or leave it.
But to play with everyone, invite them to the party, feed them, then close the gate and say “there isn’t enough food”, is not so nice.
It is a free market, so I can’t complain, but this is not what Sam used to teach in his lectures…
I imagine huge trucks dumping GPUs into a huge hole in the floor, in which about a thousand guys permanently either dig deeper or plug them in… and one guy sitting with a MacBook, configuring them remotely from somewhere in India…
Hello, good morning. I am working on an application and trying to integrate gpt-3.5-turbo. The big issue I’ve noticed is that gpt-3.5-turbo takes a very long time to answer, and it doesn’t matter whether streaming mode is enabled. A simple question takes between 15 and 40 seconds.
Of course, I’ve already ruled out the possibility that my network or my application is the problem, since I’m testing it from the operating system terminal. I even bought a dedicated OVH server in the United States to rule out a latency issue, and it still returns the same response times.
To debug further, I switched the language model to Davinci and Curie, and the server responses are practically instantaneous. That means the problem is not mine, it’s yours (OpenAI), and it has been like this for over a week.
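For anyone who wants to reproduce this kind of comparison, a small timing harness is enough. This is a minimal sketch; the `time.sleep` lambdas are stand-ins, and in practice each function would send the same prompt to one model and wait for the reply:

```python
import statistics
import time
from typing import Callable, Dict

def time_call(fn: Callable[[], object], runs: int = 5) -> Dict[str, float]:
    """Measure wall-clock latency of fn over several runs, in seconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return {"min_s": min(samples),
            "median_s": statistics.median(samples),
            "max_s": max(samples)}

if __name__ == "__main__":
    # Hypothetical stand-in workloads: swap in one API call per model
    # (e.g. gpt-3.5-turbo vs. a davinci-family model) with the same prompt.
    print("model A:", time_call(lambda: time.sleep(0.01)))
    print("model B:", time_call(lambda: time.sleep(0.05)))
```

Reporting min/median/max over several runs matters here, because a single request can easily hit an outlier in either direction.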
I am surprised that with the money OpenAI raises from investors, they don’t buy better servers. I even bought ChatGPT Plus, and I noticed that gpt-3.5-turbo there is 10x faster. I suspect OpenAI purposely slowed down the gpt-3.5-turbo API so that we use more expensive models in our applications, since they make much more money from users buying ChatGPT Plus than from the gpt-3.5-turbo API.
It’s obvious that GPT-3.5 is independent of other models like Davinci or Curie; you can see it from the response times I posted in the previous photo.
What’s strange is that the response times of the GPT-3.5 API are much slower compared to https://chat.openai.com/. If you have the paid version of ChatGPT, you’ll notice that it’s 10 times faster than the API.
The solution you offer me is to use other models like Davinci, which are more expensive. Then why does OpenAI offer a GPT-3.5 API at all if the response times are horrible?
I just wanted to address this part of your comment. It’s kind of like complaining to a restaurant that the line at their drive-thru window is too long when you saw that their food truck down the road has none. I get that you’re simply showing that their servers are slow, but come on. It’s somewhat understandable considering how fast they have grown.
Yes, I have noticed this as well. It’s becoming obvious to me that OpenAI is completely focused on controlling the market with ChatGPT. ChatGPT plugins? Jarvis? Evals? Tools that use other people’s services so that ChatGPT can deliver better results. Not available for developers to use in their own environment, only accessible through ChatGPT.
Who knows what’s going on in the background. 20% of their traffic could be malicious, and they may be trying to tackle that before wasting more money on scaling. We’re in the middle of a massive race, and unfortunately we are voiceless.
It’s amazing that people still suggest moving to Davinci, when it has long been obsolete and will be shut down within 2 months.
Even the examples on their website still show Davinci, which is strange considering the amount of money and personnel they have.
Bottom line: they do not really care about developers. They actually want to make sure developers are not advancing too fast, and that is the reason for the delay. There is no explanation for the online service consistently taking 0.5s while the API takes 35.
Just say it out loud: you are using many developers as lab mice, and these poor people have no idea they are depending on something that will always limit them for internal reasons.
That’s fair. I imagine it’s similar to telling someone who has paid for a fancy meal to eat dirt instead, which can be insulting, so I apologize.
I share a very similar sentiment. Unfortunately there is nothing to be done besides sitting, waiting, and hoping that things improve. What a terrible feeling.
But, again, this is just today. Perhaps in a couple of weeks small developers will be able to build something without the fear of being overshadowed. All I ask for is the same transparency that’s given to the massive corporations. I am hopeful.
Especially after seeing this (initially I thought it was only for ChatGPT, but I see now that it also works with Davinci)