Your service is useless for developers - just tell us the truth

It's fine that it's slow (GPT "turbo"), but at least come out and tell everyone that you are intentionally throttling the API.
After reading a lot here (ChatGPT's API is significantly slower than the website with GPT Plus),

I realized everyone here is in the same boat: 10–60 seconds of delay, and sometimes no response at all, causing timeouts on our servers. Meanwhile, when we use it on your website there is literally zero delay (the bot's typing effect is just a UX thing).
Don't want developers to have your capabilities? Understandable. But:

Sam Altman, if you only want us to use your API so you can learn about your own product, be honest and say it. Many people here raised money based on your API, and if you have no real intention of letting them pay for proper speed, consistently, ALL THE TIME, then just say it so they can move to another LLM. (We are already moving, as are multiple founders we know with $10M+ raised.)

Getting 2 seconds sometimes and 80 seconds other times is not fair, since we counted on the 2 seconds and don't even get that.
Not mentioning this anywhere, not writing about it in the docs, and not replying here to thousands of developers is not OK.


Same here. Recently the API has been extremely laggy. The delay is significant, usually several seconds or more. I don't know what's going on. Please fix it.


It's fair to be upset with the downtimes and service issues. But the typing effect is not just visual: GPT itself outputs one token at a time.

If you are not using stream=true, then you are waiting for the complete response, which adds to your delay.
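As a minimal sketch of what that looks like in practice (the chunk dictionaries below mimic the shape of OpenAI's streamed chat-completion chunks, but the stream here is simulated rather than a real API call):

```python
# Sketch only: consume a streamed completion token by token instead of
# waiting for the whole response. fake_stream is hard-coded, standing in
# for a real call such as openai.ChatCompletion.create(..., stream=True).

def consume_stream(chunks, on_token=lambda t: None):
    """Accumulate streamed delta chunks into the full reply,
    invoking on_token as each piece arrives."""
    parts = []
    for chunk in chunks:
        token = chunk["choices"][0]["delta"].get("content")
        if token:  # the first and last chunks may carry no content
            parts.append(token)
            on_token(token)  # e.g. flush to the UI immediately
    return "".join(parts)

# Simulated stream in the OpenAI chunk shape:
fake_stream = [
    {"choices": [{"delta": {"role": "assistant"}}]},
    {"choices": [{"delta": {"content": "Hello"}}]},
    {"choices": [{"delta": {"content": ", world"}}]},
    {"choices": [{"delta": {}}]},  # final chunk carries no content
]

reply = consume_stream(fake_stream)
```

The point is that the user starts seeing output at the first chunk rather than after the last one, even when total generation time is unchanged.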

Are you also using a retry/backoff library? It's fair to complain and demand better service, but in the meantime there are solutions that make the experience better for the end user.
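If you'd rather not pull in a library, a hand-rolled version of the idea is short; this is a sketch with made-up base/cap values, and in real code you would catch only rate-limit and timeout errors rather than bare Exception:

```python
import random
import time

# Sketch: retry with capped, jittered exponential backoff.

def backoff_delays(retries, base=1.0, cap=30.0):
    """Yield delays of base * 2^attempt, capped, with jitter."""
    for attempt in range(retries):
        yield min(cap, base * (2 ** attempt)) * random.uniform(0.5, 1.0)

def with_retries(call, retries=5, sleep=time.sleep):
    """Invoke call(), sleeping between failed attempts; re-raise
    the last error if every attempt fails."""
    last_exc = None
    for delay in backoff_delays(retries):
        try:
            return call()
        except Exception as exc:  # narrow this in real code
            last_exc = exc
            sleep(delay)
    raise last_exc
```

The sleep function is injectable so the retry policy can be unit-tested without actually waiting.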

Where there’s a problem, there’s an opportunity. Yinyang baby

For example, a typing indicator for a bot. Let's be real: before GPT, you or someone else would have had to spend far more time formulating and writing up a response. A couple of seconds, even a minute, is still blazingly fast compared to humans.


I expect that latency will decrease over time. Right now, most of it is a factor of high server load, so as we scale up over the next few months, we will get better at this! Thanks for being patient and hanging in there; the demand is truly unprecedented, so it's a very difficult problem to solve.


You should contact sales for dedicated machines if so much money is involved.

Well, we are just getting an SQL response: we send 200 tokens and receive 40, so streaming does not help.
It's OK that it's not working; they don't owe us anything. But it isn't very ethical, because it seems they CAN provide a super-fast experience (the same prompt takes 0.5 seconds on their website, every time).

What they are probably doing, knowing Sam, is using us to develop their product. They have no real intention of letting any of you build something they can't; the API money is nothing to them.

Better to just say: hey, the service costs $1 per request. I'd take it.
But to play with everyone, invite them to the party, serve them food, and then close the gate and say "there isn't enough food" is not so nice.

It is a free market, so I can't complain, but this is not what Sam used to teach in his lectures…

Ah, The Amazon Basics method.

Well, day by day there are more language models entering the race, and more big players are placing their chips. Going to be a fun ride.

I’m sure that soon enough these things won’t be issues. We are thinking about today and tomorrow, I’d imagine that OpenAI is thinking about next year, and in 5 years.


Yea! How hard can it be to scale some of the world's most bleeding-edge tech, with the fastest-growing user base in history, to infinity and beyond?! Does OpenAI even webscale??


I imagine huge trucks dumping GPUs into a huge hole in the ground, where about a thousand guys permanently either dig deeper or plug them in… and one guy sitting at a MacBook, configuring them remotely from somewhere in India…


I have a real problem with API response times, because I am using AWS API Gateway with Lambda, which has a 30s timeout. If the response takes more than 30s, the API returns nothing to the front end.
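One defensive pattern for this situation is to enforce your own deadline below API Gateway's 30-second cutoff, so the front end always receives something. This is only a sketch: the 25-second budget and the fallback payload are assumptions, and it trades a degraded answer for a guaranteed response.

```python
import concurrent.futures
import time

# Sketch: race the upstream call against a deadline shorter than the
# gateway's hard limit. The abandoned worker thread keeps running until
# its own call returns; this only bounds what the caller waits for.

def call_with_deadline(fn, timeout_s=25.0, fallback=None):
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(fn)
    try:
        return future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        return fallback  # e.g. an apology message or a retry token
    finally:
        pool.shutdown(wait=False)  # don't block on the abandoned call

# Demo with a simulated slow upstream call:
slow_result = call_with_deadline(lambda: time.sleep(0.2) or "late",
                                 timeout_s=0.05, fallback="timed out")
```

This doesn't make the API faster; it just keeps the gateway from dropping the connection with no payload at all.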


Even with streaming, it still takes up to 15–17s for the first bytes to reach the client side…


Believe me, I know.

I was introducing the technology to a couple sales representatives that I work with. Took out my phone, said “watch this”, and waited for the most awkward 10 seconds of my life for it to respond.

I’ve resorted to a cached conversation now for demonstrations because I simply cannot trust the response times.
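For the demo use case above, a tiny memoization layer is enough. A sketch, where get_completion is a placeholder for whatever function makes the real API call:

```python
# Sketch: cache demo prompts so a live audience never waits on the API.
# get_completion stands in for the real (slow, network-bound) call.

_demo_cache = {}

def cached_completion(prompt, get_completion):
    """Return the cached reply for prompt, calling the API only once."""
    if prompt not in _demo_cache:
        _demo_cache[prompt] = get_completion(prompt)
    return _demo_cache[prompt]

# Simulated "API" that counts how often it is actually hit:
hits = []
def fake_api(prompt):
    hits.append(prompt)
    return f"reply to {prompt!r}"

first = cached_completion("watch this", fake_api)
second = cached_completion("watch this", fake_api)
```

The second lookup returns instantly from the cache, which is exactly what you want when someone says "watch this" in front of an audience.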


Hello, good morning. I am working on an application and trying to integrate gpt-3.5-turbo. The big issue I've noticed is that gpt-3.5-turbo is very slow to answer, whether or not streaming mode is enabled. For a simple question, it takes between 15 and 40 seconds.

Of course, I've already ruled out my network and my application as the problem, since I'm testing from the operating system terminal. I even bought a dedicated OVH server in the United States to rule out latency, and it still returns the same response times.

To debug further, I switched to other models (Davinci and Curie), and their responses are practically instantaneous. This means the problem is not mine, it's yours (OpenAI), and it has been like this for over a week.

I am surprised that, with the money OpenAI raises from investors, they don't buy better servers. I even bought ChatGPT Plus, and I notice that gpt-3.5-turbo runs 10x faster there. I suspect OpenAI has purposely slowed down the gpt-3.5-turbo API so that we use more expensive models in our applications, since they make much more money from users buying ChatGPT Plus than from the gpt-3.5-turbo API.

Please keep in mind that ChatGPT went from dozens to 100+ million users in two months… the biggest growth in all of human history for any app.

That they’re keeping the servers up at all is tremendous in my eyes.

Be patient. I’m sure the reliability will improve as time goes on.


AFAIK there's no connection between the models besides whatever routing server sits in front of them. GPT-3.5's speed is independent of the other models.

You could try switching to GPT-4 if the response is taking too long, or to Davinci, which would be more reliable.

It's obvious that GPT-3.5 is independent from other models like Davinci or Curie; you can see that from the response times I posted in the previous photo.

What's strange is how much slower the GPT-3.5 API's response times are compared to ChatGPT: if you have the paid version of ChatGPT, you'll notice it's 10 times faster than the API.

The solution you offer is to use other models like Davinci, which are more expensive. Then why does OpenAI offer a GPT-3.5 API at all if the response times are horrible?

I just wanted to address this part of your comment. It's kind of like complaining to a restaurant that the line at their drive-thru window is too long when you saw that their food truck down the road has none. I get that you're simply showing that their servers are slow, but come on. It's kind of fair considering how fast they have grown.

Yes, I have noticed this as well. It’s becoming obvious to me that OpenAI is completely focused on controlling the market with ChatGPT. ChatGPT plugins? Jarvis? Evals? Tools that use other people’s services so that ChatGPT can deliver better results. Not available for developers to use in their own environment, only accessible through ChatGPT.

Who knows what's going on in the background. 20% of their traffic could be malicious, and they may be trying to tackle that before wasting more money on scaling. We're in the middle of a massive race, and unfortunately we are voiceless.


As a customer, I don’t want to wait weeks or even months for this problem to be fixed, especially since I’m paying for it to function properly.

It's quite frustrating that the same model (GPT-3.5) runs 10 times faster on the website than through the API. What is happening, OpenAI?


Amazing that people still suggest moving to Davinci when it has long been obsolete and will be shut down within two months.
Even the examples on their website still show Davinci, which is strange considering the amount of money and personnel they have.

Bottom line: they do not really care about developers. They actually want to make sure developers don't advance too fast, and that is why the delay exists. There is no other explanation for the online service taking 0.5s every time while the API takes 35s.
Just say it out loud: you are using many developers as lab mice, and these poor people have no idea they are depending on something that will always limit them for internal reasons.


That's fair. I imagine it's similar to telling someone who has paid for a fancy meal to eat dirt instead, which can be insulting, so I apologize.

I share a very similar sentiment with you. Unfortunately there is nothing that can be done besides sit, and wait, and hope that things improve. What a terrible feeling.

But, again, this is just today. Perhaps in a couple weeks small developers will be able to develop something without the fear of being shadowed. All I ask for is the same transparency that’s given to the massive corporations. I am hopeful.

Especially after seeing this (initially I thought it was only for ChatGPT, but I see now that it also works with Davinci)