Chat GPT's API is significantly slower than the website with GPT Plus

I see this as a major issue with the API. Requests with the API are taking me 50 seconds on average for some requests. Heck, a 400 token request with a 22 token response took me 15 seconds. The same exact response from ChatGPT took me 10 seconds on the website with a GPT Plus subscription, while it took me nearly 50 seconds on the API. Looking into it, it seems like GPT Plus using text-davinci-002. But this has a 2000 token limit rather than a 4000 one, so how is this possible? Any tips for increasing the speed when using the API?


Welcome to the community.

The text-davinci-002 is older than text-davinci-003. The latter it is better and faster. You’ll be using this with the completions endpoint.

There is another endpoint chat/completions which uses the same model that ChatGPT is powered by gpt-3.5-turbo , OpenAI’s most advanced language model. See OpenAI API.
This is I expect what you want.

The API is a little different but not much.

But doesn’t text-davinci-002 have ~2000 token limit? How is it able to do much more than that in the chatgpt website?

The other models I mentioned have 4096 limits. Use them.

I think many often people confuse network delays and data center congestion, etc with API performance.

For example, I am 12 time zones away from the US and call the OpenAI completion API, and here are the results when I time the call:


Test 1:  Completions.get_reply Time: 1.247792 secs
Test 2:  Completions.get_reply Time: 5.038783 secs
Test 3:  Completions.get_reply Time: 1.289555 secs
Test 4:  Completions.get_reply Time: 2.205132 secs

Kindly keep in mind that I am testing OpenAI APIs from the opposite side of the world than the US.

Also, if I repeat for other models, the results are similar. It’s mostly network traffic issues, not model issues, from my experience.

Having said that, lately I have noticed that text-davinci-002 is about 0.5 seconds faster than text-davinci-003 (for the same prompt), but did not test extensively.


Appendix: Example Test

1 Like

I see the same behavior, a chat completion API call using gpt-3.5-turbo with 2k request tokens takes on average 150s, which is longer than the ChatGPT Plus web app. This is not network-bound.

1 Like

I have a question. I am a Plus Subscriber so it seems 3.5 Turbo optimized should be much faster. In it is faster and does lot of things quicker but via the API, it is so slow like takes a minute or so.
Does this mean that even being PLUS does not help with API faster response? I see no different between free API and plus subscriber API. Can someone help?

1 Like

Yes. The API billing system for developers is not related to the monthly ChatGPT consumer subscriptions.


1 Like

That is strange. So then how am I going to pay API I use for? Is that going to be charged separately? I only use API and I thought buying PLUS would benefit better API response.

Additionally, responses differently than API. Do you know what would be the cause, I use the same thing like gpt-3 turbo in both but chat has better responses than API for the same text.

Yes. They are unrelated as I said above.


ChatGPT API the state at the moment is not usable, tooo freaking slow, what are you doing guys? Speeed the calls up, get more servers, do something!!!


The truth is that PLUS can greatly improve the speed of the API.
When I used the free credit, the responses often took more than ten seconds or even dropped out, even for simple questions.
When I switch to the key of the account with plus, it takes only one second.
I’m using the 3.5 turbo.

1 Like

I am facing the same issue. Reading the docs and the rate limits I would suspect it to handle parallel requests quite well but unfortunately it does not. In general the response time is very slow > 120 s.


I am sending requests from Germany with prompt tokens of 1346 and get a completion tokens of 1610. I tested this yesterday and today with the same prompt and i always get around 60 - 90 seconds which is unacceptable.

When i am using my chatGPT plus subscription i get the response in under 20 seconds. Can we please make improvements on the API response time.


Can we please get an official OpenAI statement/clarification on this?

User patrickkzhao says that he has noticed a significant speed up of API responses when subscribed to PLUS (Chat GPT's API is significantly slower than the website with GPT Plus - #12 by patrickkzhao), however, user ruby_coder said that the billing systems are unrelated and PLUS would not improve API responses.(Chat GPT's API is significantly slower than the website with GPT Plus - #8 by ruby_coder)

Which is the truth?

I find myself in the same boat as other users here - I have a service/app nearing readiness for launch, but have noticed a huge slowdown in the API (using 3.5-turbo model), to the point where it’s totally unusable and I’m even unwilling to risk a live demo of my system at this point.

If PLUS improves API response time, I’m more than happy to pay for it - just give us the proper documentation/information that we need.

If PLUS doesn’t help the slow API calls, would the Azure hosted ChatGPT version help instead? It would be a pain to change a bunch of code to use Azure instead, but again, the OpenAI API endpoint is kind of unusable [in a production environment] as things stand.


Just chiming in to agree that 3.5-turbo response times over the API appear to be much slower than they used to be. Also happy to subscribe if that increases the speed, but would love clarification.


Was just running a significant amount of automated testing for various endpoints and models in terms of response time. Using a paid Plus account and a paid for API account, the response times are much slower on the API. Could be caching on both client and mostly server and Ajax streaming as well that skews the results towards what appears to be significantly longer API response times and a lightning fast browser response in comparison. I cant even hijack that as it is shrouded in Cloudflare security. All that being said with maximum optimization on the client and transport code, and using the fastest /completions endpoint, with the fast 3.5 turbo model and minimals prompts I cannot get the 500 token prompt and completion packages back faster than 8250ms or 8.2 seconds. Too long. Fastest internet and located in North America. Looking at demos with Microsoft, Amazon, Google, IBM all lined up and unacceptable response times. Can you say faked cached responses in the demo and pray they stay within cache scope lol. Not going happen. So I need a definitive answer to give my team and company and as well the VCs will be lining up for their 12 million back. What we need is an SLA yesterday. We need a guaranteed level of service and response to outage as Microsoft and all the hyperscalers provide. If OpenAI drops the ball we will take over 36million in total funding and 500 billion TAM somewhere else. Looking at other LLMs at this point and at least a 3 provider redundancy as this is not going to sink our ship. So my question is what is OpenAI doing about back end support and unacceptable response times? Telling all our customers that OpenAI is not ready for prime time is not a solution. Edit: also tried the beta chat endpoint as well and no difference in response times. 8 to 10 seconds. Also have run tests at all intervals of 24 hour clock and it can be much worse during higher usage times. Timeouts are also common. Currently running at 2 percent. Also we are able to run 20 threads with 500ms overlap and not time out. This doesnt help with 8s but you can make 20 calls in 8 seconds for larger background tasks like research or document summarization.

Edit: Was able to run some classification calls to the API, minimum size, 3s fastest response time from the server. You won’t be able to have a running chat at this speed especially if you want to maintain thread context, these are small calls to classify command text.


Ok, so I originally had said slow, and it wbecause I thought entire response was ready much faster (as I was told by the A.i. that the typing was just a cosmetic feature).

However, yes as others have noted, the “stream” was the issue, and looks like that’s exactly how it’s working in their chat.

1 Like

I am just using the web client and it is as slow as a tortoise. I don’t know if the processing speed is slow or not, but it prints text to the screen when it provides and answer at such a slow speed that it feels like I am watching the opening up-scrolling credits of a Stars Wars movie.

This seems crazy to me.

When I asked ChatGPT 4.0 (Plus account) why it takes so long for it to print responses to the screen, it blamed my system. I fed it my system specs and internet speeds and asked it the following

given my hardware and software config and my internet speeds, what percentile of all users do you estimate my setup is in terms of how fast I should be able to receive text from you and display it on the screen?

ChaGPT 4.0:
It is difficult to provide an exact percentile ranking for your setup because there are many factors that contribute to the overall performance of a system, including hardware, software, and network configurations. Additionally, the user base varies widely in terms of device specifications and internet speeds.

However, based on your hardware specifications, macOS version, and internet speeds, your setup is likely in the top 10-20% of users in terms of performance and internet speed. Your computer has a modern quad-core Intel Core i5 processor, a decent amount of memory, and a high-speed internet connection, which together should allow you to receive and display text from this chat very quickly.

Keep in mind that these are rough estimates, and actual performance will depend on various factors, including the workload of your computer, the efficiency of your web browser, and potential network latency.

So I am running fast, but it’s still displaying text responses as if it was 1990 and was booting up Mail Order Monsters on the Commodore64.

I asked it if it was throttling response times, and it assures me it is not…not sure I believe it.

Not cool that I signed up and am paying $20/month for this. Also not cool that I am not able to get into the beta of the wide-open-internet access model.

no official statement?
Can we please have a faster api?

in the api docu i can find that you should use stream to improve latency. but this is not chat api specific.
If I am right you have to use web socket to use stream but if i ask chat gpt it has no example for chat so i think WS don’t work with chat api.

Can someone confirm that?

1 Like