Gpt-4-0125-preview INCREDIBLY slower than 3.5 turbo

younus23 · February 19, 2024, 2:07pm

Using the new model, I’m finding that response times have made my application impossible to use. The average token generation is around 4000. It wasn’t the fastest before either (response times between 1 and 1.5 minutes), but now it’s taking almost 5-6 minutes, even if I go back down to gpt-4-1106-preview.

Downgrading to 3.5 turbo is significantly faster but response quality is way worse.

Is anyone else experiencing this issue? Any ideas on speeding up the responses?

younus23 · February 19, 2024, 2:17pm

Just look at these! The first one is gpt-4-0125, and the third one is gpt-4-1106. The other two are 3.5 turbo.

[Image: Screenshot 2024-02-19 at 8.15.48 AM]

logankilpatrick · February 19, 2024, 2:24pm

Hey, this is somewhat to be expected. The GPT-4 series models will always be slower than the 3.5T series models. Which model were you using before? If I am understanding you right, you’re saying the token generation time went from 1 min to 5 min?

younus23 · February 19, 2024, 2:30pm

Yup, my average generation time on gpt-4-1106 before was 50 seconds to 1.5 minutes. I understand 4 might be slower, but the two were much closer before. Now it’s unbearable. That 4.7 mins in the screenshot is the fastest I’ve seen all day.

jr.2509 · February 19, 2024, 2:38pm

How many tokens did you use as input and/or what’s your area of application? I’ve been using GPT-4 models a few times today and completion time was normal.

younus23 · February 19, 2024, 2:51pm

{ prompt_tokens: 2635, completion_tokens: 614, total_tokens: 3249 } (this one took 4.9 minutes on gpt-4-1106-preview)

Usually it’s somewhere between this and 4k total, with the completion tokens a bit higher. It takes in some documents and rewrites certain aspects of them.
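
For scale, those figures can be turned into a rough output rate. This is a minimal sketch, assuming the full 4.9 minutes was spent on the single request; the function name is only illustrative.

```python
def tokens_per_second(completion_tokens: int, elapsed_seconds: float) -> float:
    """Average output rate over the whole request.

    The elapsed time includes queueing and prompt processing, so this
    understates the pure generation speed.
    """
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return completion_tokens / elapsed_seconds


# Figures from the usage object above: 614 completion tokens in 4.9 minutes.
rate = tokens_per_second(614, 4.9 * 60)
print(f"{rate:.1f} tokens/s")
```

That works out to roughly 2 tokens per second for this request.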

jr.2509 · February 19, 2024, 3:05pm

This seems odd. For similar token levels, my completion times are usually within 30-60 seconds per API call for GPT-4 turbo models. Just to rule this out, are there any steps prior to ingestion by the model that could be causing this?

younus23 · February 19, 2024, 3:09pm

Nope, the front end hits the route directly. It’s just one prompt on that route, with nothing else happening besides the OpenAI API call. And internet is not an issue either; I’m getting about 700 Mbps down and 100 Mbps up.
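
One way to rule out the steps around the call is to time each stage of the route separately. The following is a minimal sketch, assuming a Python backend (the thread does not say what the route is written in). `timed`, `handle_request`, `call_model`, and `fake_model` are hypothetical names, and the fake model stands in for the real OpenAI call so the example runs offline.

```python
import time
from contextlib import contextmanager

# Seconds spent in each labelled stage of the most recent request.
timings: dict[str, float] = {}


@contextmanager
def timed(label: str):
    """Record wall-clock seconds spent inside the with-block under `label`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[label] = time.perf_counter() - start


def handle_request(call_model, document: str) -> str:
    """Hypothetical route handler: build the prompt, then call the model."""
    with timed("build_prompt"):
        prompt = "Rewrite the following document:\n" + document.strip()
    with timed("model_call"):
        # call_model is a placeholder for whatever wraps the real API request.
        reply = call_model(prompt)
    return reply


def fake_model(prompt: str) -> str:
    """Offline stand-in for the API call so the sketch runs without a key."""
    time.sleep(0.05)
    return prompt.upper()


handle_request(fake_model, "some text")
for label, seconds in timings.items():
    print(f"{label}: {seconds:.3f}s")
```

If nearly all of the time lands under `model_call`, the delay is on the API side rather than in the route.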

_j · February 19, 2024, 3:15pm

I can confirm slowness. (ChatGPT’s vision, which is based on 1106, was also very slow at token production when tested earlier.)

There is a march toward a consistently low token production rate showing, although the 1106 model mentioned above is not evaluated:

My own single requests:
—gpt-4-1106-preview—
[32 tokens in 3.3s. 9.6 tps]
[600 tokens in 32.0s. 18.8 tps]
—gpt-4-turbo-preview—
[32 tokens in 3.0s. 10.6 tps]
[600 tokens in 45.2s. 13.3 tps]
—gpt-3.5-turbo—
[32 tokens in 0.9s. 33.7 tps]
[600 tokens in 11.7s. 51.4 tps]

This is with a small input context with a writing request.
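
For anyone who wants to take similar measurements, here is a minimal sketch of timing a streamed response. It is not the script that produced the numbers above. It assumes each streamed chunk counts as one token, which is only an approximation, and `measure_stream` and `fake_stream` are hypothetical names. The fake stream lets it run offline.

```python
import time
from typing import Iterable, Iterator


def measure_stream(chunks: Iterable[str]) -> dict:
    """Consume a stream of text chunks and report latency and chunk rate."""
    start = time.perf_counter()
    first = None
    count = 0
    for chunk in chunks:
        if not chunk:
            continue  # skip empty deltas
        if first is None:
            first = time.perf_counter() - start  # time to first chunk
        count += 1
    total = time.perf_counter() - start
    return {
        "chunks": count,
        "first_chunk_s": first,
        "total_s": total,
        "chunks_per_s": count / total if total > 0 else 0.0,
    }


def fake_stream(n: int, delay: float) -> Iterator[str]:
    """Offline stand-in for a streamed completion: n chunks, `delay` seconds apart."""
    for _ in range(n):
        time.sleep(delay)
        yield "tok"


# With the official Python client (openai>=1.0), the chunks could instead come
# from a request made with stream=True, yielding chunk.choices[0].delta.content
# for each chunk that has choices. That wiring is an assumption, not run here.
stats = measure_stream(fake_stream(20, 0.01))
print(f"[{stats['chunks']} chunks in {stats['total_s']:.1f}s. "
      f"{stats['chunks_per_s']:.1f} per second]")
```

Reporting the time to the first chunk separately helps distinguish a slow start (queueing, prompt processing) from a slow generation rate.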

Also note that the token production rate has, in the past, been slower for those at tier 1 of API payment history (as those who noticed the big switch on their account when the slowdown was implemented will recognize).

younus23 · February 19, 2024, 3:25pm

That might be it, I’m on tier 1. But I still don’t understand the sudden decrease in speed on my current tier. It’s putting a big wrench in my application.

_j · February 19, 2024, 3:44pm

By geography, it is possible that you may be routed to different datacenters, considering those on Azure with commercial Microsoft OpenAI services can pick from many deployment locations. You thus could get different performance than others. There’s no clarity about where OpenAI API requests are serviced from.

It would be nice to think that one is no longer discriminated against because of how much they prepaid.

The rate limits and tier documentation has had this prior text eradicated: “As your usage tier increases, we may also move your account onto lower latency models behind the scenes.”

The alternative is that service availability is still prioritized directly by payment trust tier. OpenAI may not be willing to go on the record about their service management policies.

younus23 · February 20, 2024, 11:04pm

Update: It might just have been the timing of my API use (which still worries me, in case it suddenly goes up when I go to production), but now my response times are ~2 mins. Still not the best, but better. Hoping to jump up in tiers and get this down even more.

pc1 · July 22, 2024, 10:34pm

Sorry, wrong thread. Please remove this comment.
