GPT 4 API is Very Slow Still

dcooper7 · May 11, 2023, 5:56pm

Hi,
Please can I get some advice.
I am using the GPT 4 API for a new project, but after prompting I have to wait for over a minute for a response. The 3.5 Turbo was a lot quicker but obviously less accurate. Are there any fixes for this, or is anything being done to rectify it? I noticed a couple of other forum posts about this issue.
Thank you.

curt.kennedy · May 11, 2023, 6:15pm

I am averaging about 5 tokens of generated output per second with GPT-4. It varies from 2-7, but the eyeball average is about 5 TPS (output).
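To put that rate next to the wait described in the original post, here is a minimal sketch that turns an output rate into a rough generation time. The function name and the 300-token reply length are illustrative assumptions, not figures from this thread, and network or queueing delay is ignored.

```python
def estimated_wait_seconds(output_tokens: int, tokens_per_second: float) -> float:
    """Rough generation time for a reply, ignoring network and queueing delay."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return output_tokens / tokens_per_second


# 2, 5 and 7 TPS are the low, average and high rates quoted above;
# the 300-token reply length is an assumed example value.
for tps in (2, 5, 7):
    seconds = estimated_wait_seconds(300, tps)
    print(f"{tps} TPS -> about {seconds:.0f} s for a 300-token reply")
```

At the 5 TPS average, a reply of roughly 300 tokens already takes about a minute, so a wait of "over a minute" is consistent with these rates for anything longer.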

jwatte · May 11, 2023, 8:02pm

GPT-4 is a lot slower than GPT-3.5, but also, we’ve seen significantly slower generation even with 3.5 in the last few days.
(We’re paid API users.)
Generating 1700 tokens from a prompt of 1300 took over a minute this morning (on gpt-3.5-turbo)

"model": "gpt-3.5-turbo-0301", "usage": {"prompt_tokens": 1329, "completion_tokens": 1692, "total_tokens": 3021}

So, maybe they’re trying to increase throughput by increasing latency, or this is just a new outcome of buckling under too much success …

(We also get “model is overloaded” errors with some frequency.)
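For anyone who wants to compare numbers, the output rate can be worked out from the `usage` block of a response and the measured wall-clock time. This is only a sketch: the helper name is made up, and the 60-second figure is an assumption standing in for "over a minute", so the result is an upper bound on the real rate.

```python
def output_tokens_per_second(usage: dict, elapsed_seconds: float) -> float:
    """Generated (completion) tokens divided by wall-clock time for the request."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return usage["completion_tokens"] / elapsed_seconds


# Usage figures copied from the response quoted above.
usage = {"prompt_tokens": 1329, "completion_tokens": 1692, "total_tokens": 3021}

# Assumed elapsed time: the request took "over a minute", so 60 s is a lower
# bound on the time and the computed rate is an upper bound.
rate = output_tokens_per_second(usage, 60.0)
print(f"at most about {rate:.1f} output tokens/sec")
```

In practice, record the elapsed time around the API call itself (for example with `time.perf_counter()`) rather than assuming it.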

curt.kennedy · May 12, 2023, 5:57pm

There is a certain lag time in implementing a new model. I just started using GPT-4 regularly through the API a few days ago, even though I’ve had access for a while.

So there could be a large wave of implementers starting to roll out products based on either GPT-4 or GPT-3.5 many weeks after the initial release, which could partially explain the slowdowns.

bichengC · May 13, 2023, 1:38pm

I have not had the chance to test GPT-4 API response times since yesterday’s new ChatGPT release, but I do not believe the release has any impact on the GPT-4 API model. In our earlier testing, it took approximately 60-80 seconds to complete a full 8k prompt and response using the GPT-4 API.

liger · May 16, 2023, 4:48pm

Same here, paying API user and having very slow response times on both gpt-3 and gpt-4. Tested using Postman with stream on/off… OpenAI Playground has the same behaviour in terms of response time. But haven’t experienced any “model is overloaded” errors.
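When comparing stream on and off, it can help to record time to first chunk separately from total time, since streaming mainly changes when the first output appears rather than how long the full response takes. The sketch below is an assumption-heavy illustration: `time_stream` and `fake_stream` are hypothetical helpers, and the fake generator stands in for a real streamed API response so the example runs without network access.

```python
import time
from typing import Iterable, Iterator, Optional, Tuple


def time_stream(chunks: Iterable[str]) -> Tuple[Optional[float], float, int]:
    """Return (seconds to first chunk, total seconds, number of chunks)."""
    start = time.perf_counter()
    first_chunk_at: Optional[float] = None
    count = 0
    for _ in chunks:
        if first_chunk_at is None:
            first_chunk_at = time.perf_counter() - start
        count += 1
    total = time.perf_counter() - start
    return first_chunk_at, total, count


def fake_stream(n: int = 5, delay: float = 0.01) -> Iterator[str]:
    """Hypothetical stand-in for a streamed response: n chunks, `delay` seconds apart."""
    for i in range(n):
        time.sleep(delay)
        yield f"chunk{i} "


first, total, count = time_stream(fake_stream())
print(f"first chunk after {first:.3f} s, {count} chunks in {total:.3f} s")
```

To use it against a real endpoint, pass whatever iterable of chunks your client returns in streaming mode in place of `fake_stream()`.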

pianpian · August 9, 2023, 12:51am

Does anyone still experience errors with GPT 4?

curt.kennedy · August 9, 2023, 1:07am

Since my last post above, I’m seeing a 2x speedup in GPT-4, from 5 tokens/sec to now 10 tokens/sec. Also, GPT-4-32k is still the fastest of the GPT-4 models at 20 tokens/sec.

nicolasemerson1980 · August 9, 2023, 1:21pm

That’s really interesting.

We just got access to GPT-4 8k today.

Great responses to our calls, but the speed is a bit of a bummer.

Wondering if the increase in speed for models is usually just an incremental thing.

You say it’s doubled in speed since May, so do you expect the same increase again by November?

curt.kennedy · August 9, 2023, 4:56pm

The speed of GPT-4 is one of the major complaints against it (you see this all over this forum). But the other one is quality. My main concern/fear is that OpenAI decides to increase the speed at the expense of reducing the model quality.

It’s a balance, and hopefully each and every speed increase does not come at the expense of quality. I’d rather have slow + quality than fast + poor quality.

The challenge here is to be patient and hope that hardware improvements (like ASIC/FPGA-level designs) and/or massive GPUs arrive, and that somehow, through all the AI mania, they get produced and make it into reality. Another angle is algorithm improvements, but those are tricky and take time to iron out.

It took me 2+ years of waiting to finally get a PS5. That’s what we are up against.

Having said that, I have no crystal ball saying what speedups might happen in November, especially for a product under massive demand and usage. I hope for you it’s shorter than waiting for a PS5.

alden · August 11, 2023, 1:35pm

Agree.

slow + quality : GPT-4

fast + poor quality : GPT-3.5

(That seems to work great right now)

ady · August 12, 2023, 4:06pm

true very true
Yes, the lag time is pathetic.

dschnell · September 14, 2023, 3:39pm

What about the Azure GPT-4 API? Is there a difference in speed compared to the OpenAI API?

alden · September 26, 2023, 11:18pm

Highly doubt it. I don’t see the reason why it would be much different.

ashkanani · September 27, 2023, 12:15am

It is so slow and out of focus. It keeps giving me wrong answers all the time!
