ChatGPT's API is significantly slower than the website with ChatGPT Plus

I see this as a major issue with the API. Some requests through the API are taking me 50 seconds on average. Heck, a 400-token request with a 22-token response took me 15 seconds. The exact same request took me 10 seconds on the ChatGPT website with a ChatGPT Plus subscription, while it took me nearly 50 seconds on the API. Looking into it, it seems like ChatGPT Plus uses text-davinci-002. But that model has a 2,000-token limit rather than a 4,000-token one, so how is this possible? Any tips for increasing the speed when using the API?


Welcome to the community.

text-davinci-002 is older than text-davinci-003; the latter is better and faster. You'll use it with the completions endpoint.

There is another endpoint, chat/completions, which uses gpt-3.5-turbo, the same model that powers ChatGPT and OpenAI's most advanced language model. See the OpenAI API docs.
This is, I expect, what you want.

The API is a little different but not much.
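To make the difference concrete, here is a minimal sketch of what a chat/completions request body looks like, compared to the old completions endpoint's single prompt string. The system/user message contents are just example placeholders; the endpoint URL and JSON shape are from the OpenAI API docs:

```python
import json

# Request body for POST https://api.openai.com/v1/chat/completions
# (your API key goes in the "Authorization: Bearer <key>" header).
# The old completions endpoint took {"model": ..., "prompt": "<string>"};
# the chat endpoint instead takes a list of role-tagged messages.
def chat_request_body(user_text, model="gpt-3.5-turbo"):
    """Build the JSON body the chat/completions endpoint expects."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": user_text},
        ],
    }

body = chat_request_body("Say hello in one word.")
print(json.dumps(body, indent=2))
```

The reply comes back under `choices[0].message.content` rather than `choices[0].text`, which is the main code change when migrating.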

But doesn’t text-davinci-002 have a ~2,000-token limit? How is it able to do much more than that on the ChatGPT website?

The other models I mentioned have 4,096-token limits. Use them.

I think people often confuse network delays, data center congestion, and so on with API performance.

For example, I am 12 time zones away from the US and call the OpenAI completion API, and here are the results when I time the call:


Test 1:  Completions.get_reply Time: 1.247792 secs
Test 2:  Completions.get_reply Time: 5.038783 secs
Test 3:  Completions.get_reply Time: 1.289555 secs
Test 4:  Completions.get_reply Time: 2.205132 secs

Kindly keep in mind that I am testing the OpenAI APIs from the opposite side of the world from the US.

Also, if I repeat for other models, the results are similar. It’s mostly network traffic issues, not model issues, from my experience.

Having said that, lately I have noticed that text-davinci-002 is about 0.5 seconds faster than text-davinci-003 (for the same prompt), but I have not tested extensively.


Appendix: Example Test


I see the same behavior: a chat completion API call using gpt-3.5-turbo with 2k request tokens takes 150 s on average, which is longer than the ChatGPT Plus web app. This is not network-bound.

I have a question. I am a Plus subscriber, so it seems GPT-3.5 Turbo should be much faster. On the website it is faster and does a lot of things quicker, but via the API it is very slow; it takes a minute or so.
Does this mean that being a Plus subscriber does not help with faster API responses? I see no difference between the free API and the Plus subscriber's API. Can someone help?


Yes. The API billing system for developers is not related to the monthly ChatGPT consumer subscriptions.



That is strange. So then how do I pay for the API I use? Is that charged separately? I only use the API, and I thought buying Plus would get me better API response times.

Additionally, ChatGPT responds differently than the API. Do you know what the cause would be? I use the same model, gpt-3.5-turbo, in both, but ChatGPT gives better responses than the API for the same input.

Yes. They are unrelated as I said above.


The ChatGPT API in its current state is not usable; it is far too slow. What are you doing, guys? Speed the calls up, get more servers, do something!


The truth is that Plus can greatly improve the speed of the API.
When I used the free credit, responses often took more than ten seconds or even dropped out, even for simple questions.
When I switched to the key of the account with Plus, it took only one second.
I'm using gpt-3.5-turbo.

I am facing the same issue. Reading the docs and the rate limits, I would expect it to handle parallel requests quite well, but unfortunately it does not. In general the response time is very slow (> 120 s).


I am sending requests from Germany with 1,346 prompt tokens and getting 1,610 completion tokens back. I tested this yesterday and today with the same prompt, and I always get around 60 to 90 seconds, which is unacceptable.

When I use my ChatGPT Plus subscription, I get the response in under 20 seconds. Can we please get improvements to the API response time?


Can we please get an official OpenAI statement/clarification on this?

User patrickkzhao says that he has noticed a significant speed-up of API responses when subscribed to Plus (Chat GPT's API is significantly slower than the website with GPT Plus - #12 by patrickkzhao); however, user ruby_coder said that the billing systems are unrelated and Plus would not improve API responses (Chat GPT's API is significantly slower than the website with GPT Plus - #8 by ruby_coder).

Which is the truth?

I find myself in the same boat as other users here: I have a service/app nearing readiness for launch, but I have noticed a huge slowdown in the API (using the gpt-3.5-turbo model), to the point where it's totally unusable and I'm unwilling to risk a live demo of my system.

If PLUS improves API response time, I’m more than happy to pay for it - just give us the proper documentation/information that we need.

If Plus doesn't help the slow API calls, would the Azure-hosted ChatGPT version help instead? It would be a pain to change a bunch of code to use Azure, but again, the OpenAI API endpoint is kind of unusable [in a production environment] as things stand.


Just chiming in to agree that gpt-3.5-turbo response times over the API appear to be much slower than they used to be. I'm also happy to subscribe if that increases the speed, but would love clarification.