Chat GPT's API is significantly slower than the website with GPT Plus

I have a question. I am a Plus Subscriber so it seems 3.5 Turbo optimized should be much faster. In it is faster and does lot of things quicker but via the API, it is so slow like takes a minute or so.
Does this mean that even being PLUS does not help with API faster response? I see no different between free API and plus subscriber API. Can someone help?

1 Like

Yes. The API billing system for developers is not related to the monthly ChatGPT consumer subscriptions.


1 Like

That is strange. So then how am I going to pay API I use for? Is that going to be charged separately? I only use API and I thought buying PLUS would benefit better API response.

Additionally, responses differently than API. Do you know what would be the cause, I use the same thing like gpt-3 turbo in both but chat has better responses than API for the same text.

Yes. They are unrelated as I said above.


ChatGPT API the state at the moment is not usable, tooo freaking slow, what are you doing guys? Speeed the calls up, get more servers, do something!!!


The truth is that PLUS can greatly improve the speed of the API.
When I used the free credit, the responses often took more than ten seconds or even dropped out, even for simple questions.
When I switch to the key of the account with plus, it takes only one second.
I’m using the 3.5 turbo.

1 Like

I am facing the same issue. Reading the docs and the rate limits I would suspect it to handle parallel requests quite well but unfortunately it does not. In general the response time is very slow > 120 s.


I am sending requests from Germany with prompt tokens of 1346 and get a completion tokens of 1610. I tested this yesterday and today with the same prompt and i always get around 60 - 90 seconds which is unacceptable.

When i am using my chatGPT plus subscription i get the response in under 20 seconds. Can we please make improvements on the API response time.


Can we please get an official OpenAI statement/clarification on this?

User patrickkzhao says that he has noticed a significant speed up of API responses when subscribed to PLUS (Chat GPT's API is significantly slower than the website with GPT Plus - #12 by patrickkzhao), however, user ruby_coder said that the billing systems are unrelated and PLUS would not improve API responses.(Chat GPT's API is significantly slower than the website with GPT Plus - #8 by ruby_coder)

Which is the truth?

I find myself in the same boat as other users here - I have a service/app nearing readiness for launch, but have noticed a huge slowdown in the API (using 3.5-turbo model), to the point where it’s totally unusable and I’m even unwilling to risk a live demo of my system at this point.

If PLUS improves API response time, I’m more than happy to pay for it - just give us the proper documentation/information that we need.

If PLUS doesn’t help the slow API calls, would the Azure hosted ChatGPT version help instead? It would be a pain to change a bunch of code to use Azure instead, but again, the OpenAI API endpoint is kind of unusable [in a production environment] as things stand.


Just chiming in to agree that 3.5-turbo response times over the API appear to be much slower than they used to be. Also happy to subscribe if that increases the speed, but would love clarification.


Was just running a significant amount of automated testing for various endpoints and models in terms of response time. Using a paid Plus account and a paid for API account, the response times are much slower on the API. Could be caching on both client and mostly server and Ajax streaming as well that skews the results towards what appears to be significantly longer API response times and a lightning fast browser response in comparison. I cant even hijack that as it is shrouded in Cloudflare security. All that being said with maximum optimization on the client and transport code, and using the fastest /completions endpoint, with the fast 3.5 turbo model and minimals prompts I cannot get the 500 token prompt and completion packages back faster than 8250ms or 8.2 seconds. Too long. Fastest internet and located in North America. Looking at demos with Microsoft, Amazon, Google, IBM all lined up and unacceptable response times. Can you say faked cached responses in the demo and pray they stay within cache scope lol. Not going happen. So I need a definitive answer to give my team and company and as well the VCs will be lining up for their 12 million back. What we need is an SLA yesterday. We need a guaranteed level of service and response to outage as Microsoft and all the hyperscalers provide. If OpenAI drops the ball we will take over 36million in total funding and 500 billion TAM somewhere else. Looking at other LLMs at this point and at least a 3 provider redundancy as this is not going to sink our ship. So my question is what is OpenAI doing about back end support and unacceptable response times? Telling all our customers that OpenAI is not ready for prime time is not a solution. Edit: also tried the beta chat endpoint as well and no difference in response times. 8 to 10 seconds. Also have run tests at all intervals of 24 hour clock and it can be much worse during higher usage times. Timeouts are also common. Currently running at 2 percent. Also we are able to run 20 threads with 500ms overlap and not time out. This doesnt help with 8s but you can make 20 calls in 8 seconds for larger background tasks like research or document summarization.

Edit: Was able to run some classification calls to the API, minimum size, 3s fastest response time from the server. You won’t be able to have a running chat at this speed especially if you want to maintain thread context, these are small calls to classify command text.


Ok, so I originally had said slow, and it wbecause I thought entire response was ready much faster (as I was told by the A.i. that the typing was just a cosmetic feature).

However, yes as others have noted, the “stream” was the issue, and looks like that’s exactly how it’s working in their chat.

1 Like

I am just using the web client and it is as slow as a tortoise. I don’t know if the processing speed is slow or not, but it prints text to the screen when it provides and answer at such a slow speed that it feels like I am watching the opening up-scrolling credits of a Stars Wars movie.

This seems crazy to me.

When I asked ChatGPT 4.0 (Plus account) why it takes so long for it to print responses to the screen, it blamed my system. I fed it my system specs and internet speeds and asked it the following

given my hardware and software config and my internet speeds, what percentile of all users do you estimate my setup is in terms of how fast I should be able to receive text from you and display it on the screen?

ChaGPT 4.0:
It is difficult to provide an exact percentile ranking for your setup because there are many factors that contribute to the overall performance of a system, including hardware, software, and network configurations. Additionally, the user base varies widely in terms of device specifications and internet speeds.

However, based on your hardware specifications, macOS version, and internet speeds, your setup is likely in the top 10-20% of users in terms of performance and internet speed. Your computer has a modern quad-core Intel Core i5 processor, a decent amount of memory, and a high-speed internet connection, which together should allow you to receive and display text from this chat very quickly.

Keep in mind that these are rough estimates, and actual performance will depend on various factors, including the workload of your computer, the efficiency of your web browser, and potential network latency.

So I am running fast, but it’s still displaying text responses as if it was 1990 and was booting up Mail Order Monsters on the Commodore64.

I asked it if it was throttling response times, and it assures me it is not…not sure I believe it.

Not cool that I signed up and am paying $20/month for this. Also not cool that I am not able to get into the beta of the wide-open-internet access model.

no official statement?
Can we please have a faster api?

in the api docu i can find that you should use stream to improve latency. but this is not chat api specific.
If I am right you have to use web socket to use stream but if i ask chat gpt it has no example for chat so i think WS don’t work with chat api.

Can someone confirm that?

1 Like

Microsoft just dropped the nerf bat. One request every min now. Even though my cognitive bill is $400 so far this month, they restrict me now. They are clamping down on AI agents instead of just expanding infrastructure. Sad. They are limiting this technology and development.

1 Like

is it so hard to people here to realize that they are intentionally limit this to make sure none of you is going to build something big enough before them? this industry is not for you indies and devs.
This is like the hardware chip industry, only the big companies have a place in this.

This is probably a mistake because in weeks, many small startups will start offering this exact capability, don’t forget they invent nothing new, this tech was actually developed outside the US, and most researches are outside the US, it’s just that THEY trained it with the whole internet data, having friends and access to ceo’s of Reddit, Github etc.

ok, i figured out. everything works like expected and is fast. the answer is: stream
and i was wrong. stream is working also with chat api.

1 Like

What I don’t get is why they don’t just raise prices for the API to something more demand/supply based. Cloud does this and it works just fine. As new resources come in, they can lower prices.

Leave the web interface alone, of course, so people can play around, but if people have high value cases which don’t use a lot of tokens, they should be able to use the API - imho.

1 Like

Mind explaining further what you mean by ‘stream’? How did you manage to make the API respond quickly? Thanks.

Surprisingly, it seems that with the API, GPT-3.5 Turbo, GPT-4, and Text-Davinci-003 are slower than Text-Davinci-002 recently. My company account’s logic was so slow with GPT-4 that I tested it directly using my Postman call. Apparently, all of them are slower than Text-Davinci-002, even with the same prompt and Text-Davinci-002’s longer text completion.

However, Text-Davinci-002’s accuracy is not good enough for my use case. Therefore, even though it is faster than the other models, I cannot use it. :disappointed_relieved:

Can anyone else try Text-Davinci-002 instead of the other models and see the speed?