GPT-4 Turbo V2: is it slower?

ozkanyazgan61 · May 1, 2024, 3:46pm

Hi All,
I am using the GPT-4 API with Power Automate. Yesterday it was working fine. Now, with GPT-4 Turbo, it does not return a response within 2 minutes and my Microsoft Copilot times out. Did they slow it down? I didn't change anything. Even when I use v1 now, it can't return in time. The Playground works fine.
Any ideas? After 100 "Do until" iterations it is still in progress.

_j · May 1, 2024, 7:49pm

The Assistants API "version" should have little effect on speed. Retrieval search does add waiting, though: if it is invoked, the AI first has to write a second command with a rewritten text, and then embeddings has to generate a vector for the semantic search. There are also other ways a particular request could get stuck in a loop.

Model speed on 4-17:

For 5 trials of gpt-4-turbo @ 2024-04-17 08:49PM:

Stat	Minimum	Maximum	Average
stream rate	Min: 23.3	Max: 42.8	Avg: 29.160
latency (s)	Min: 0.6019	Max: 0.9499	Avg: 0.742
total response (s)	Min: 6.5735	Max: 11.7735	Avg: 9.899
total rate	Min: 21.744	Max: 38.944	Avg: 26.994
response tokens	Min: 256	Max: 256	Avg: 256.000

Right now:

For 3 trials of gpt-4-turbo @ 2024-05-01 12:36PM:

Stat	Minimum	Maximum	Average
stream rate	Min: 13.9	Max: 25.6	Avg: 18.967
latency (s)	Min: 0.7176	Max: 2.5229	Avg: 1.360
total response (s)	Min: 12.498	Max: 19.1759	Avg: 15.693
total rate	Min: 13.35	Max: 20.483	Avg: 16.816
response tokens	Min: 256	Max: 256	Avg: 256.000
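
For anyone who wants to reproduce numbers like these, here is a minimal sketch of how the per-trial stats could be derived from token arrival times. The definitions are my assumptions, not necessarily what the benchmark script does: latency is time to first token, total rate is tokens divided by total response time (which matches the tables, e.g. 256 / 6.5735 ≈ 38.944), and stream rate is the rate after the first token arrives. The names `TrialStats`, `trial_stats`, and `summarize` are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass
class TrialStats:
    latency_s: float    # time from request start to first token
    total_s: float      # time from request start to last token
    stream_rate: float  # tokens/s once streaming has begun (assumed definition)
    total_rate: float   # tokens/s over the whole response
    tokens: int


def trial_stats(start: float, token_times: Sequence[float]) -> TrialStats:
    """Compute stats for one trial from the request start time and the
    arrival time of each streamed token (all in seconds)."""
    n = len(token_times)
    if n < 2:
        raise ValueError("need at least two token timestamps")
    latency = token_times[0] - start
    total = token_times[-1] - start
    # Assumption: stream rate ignores the wait for the first token.
    stream_rate = (n - 1) / (total - latency)
    total_rate = n / total
    return TrialStats(latency, total, stream_rate, total_rate, n)


def summarize(values: Sequence[float]) -> Tuple[float, float, float]:
    """Return (min, max, average), the three columns in the tables."""
    return min(values), max(values), sum(values) / len(values)


# Synthetic example: first token after 1.0 s, then one token every 0.1 s.
demo = trial_stats(0.0, [1.0 + 0.1 * i for i in range(11)])
```

In real use you would record `time.monotonic()` before the request and again as each streamed chunk arrives, then pass those timestamps in.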

29 → 19 = slower

As low as 850 tokens/minute
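
The arithmetic behind that comparison, as a quick sketch using the average and minimum stream rates from the two gpt-4-turbo tables (assuming the stream rate is in tokens per second; `percent_change` is just a helper name I made up):

```python
def percent_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


avg_apr_17 = 29.160  # average stream rate, 2024-04-17 trials
avg_may_01 = 18.967  # average stream rate, 2024-05-01 trials
min_may_01 = 13.9    # slowest stream rate, 2024-05-01 trials

drop_pct = percent_change(avg_apr_17, avg_may_01)  # roughly -35%
slowest_per_minute = min_may_01 * 60               # roughly 834 tokens/minute

print(f"average stream rate change: {drop_pct:.1f}%")
print(f"slowest trial: {slowest_per_minute:.0f} tokens/minute")
```

So the average is down by about a third, and the slowest trial lands in the same ballpark as the per-minute figure above.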

Assistants adds a second layer of doing other stuff before you get a response.

And under high load, it may be low-tier users that get affected by some priority system, rather than by a straight-up rate limit like the one that was turned on like a switch when they first introduced the tier system.

Other gpt-4-turbo models:

For 3 trials of gpt-4-1106-preview @ 2024-05-01 12:51PM:

Stat	Minimum	Maximum	Average
stream rate	Min: 17.0	Max: 23.3	Avg: 21.000

For 3 trials of gpt-4-0125-preview @ 2024-05-01 12:53PM:

Stat	Minimum	Maximum	Average
stream rate	Min: 10.6	Max: 26.6	Avg: 21.233
