GPT-4 Turbo V2: is it slower?

Hi all,
I am using the GPT-4 API with Power Automate. Yesterday it was working fine. Now, with GPT-4 Turbo, it does not return a response within 2 minutes, and my Microsoft Copilot times out. Did they slow it down? I didn't change anything. Now even when I use v1, it can't return in time. The Playground works fine.
Any ideas? After 100 iterations of a Do until loop, the status is still in progress.

The Assistants API "version" should have little effect on speed. Retrieval search does add waiting, though: the AI has to write a second command with a rewritten query, and then embeddings have to generate a vector for the semantic search, if it is invoked. There are also other ways a particular request could get stuck in a loop.

Model

4-17:

For 5 trials of gpt-4-turbo @ 2024-04-17 08:49PM:

Stat                    Minimum   Maximum   Average
stream rate (tokens/s)  23.3      42.8      29.160
latency (s)             0.6019    0.9499    0.742
total response (s)      6.5735    11.7735   9.899
total rate (tokens/s)   21.744    38.944    26.994
response tokens         256       256       256.000

Right now:

For 3 trials of gpt-4-turbo @ 2024-05-01 12:36PM:

Stat                    Minimum   Maximum   Average
stream rate (tokens/s)  13.9      25.6      18.967
latency (s)             0.7176    2.5229    1.360
total response (s)      12.498    19.1759   15.693
total rate (tokens/s)   13.35     20.483    16.816
response tokens         256       256       256.000
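For anyone who wants to reproduce numbers like these, here is a minimal sketch of how the per-trial statistics could be derived from token arrival times. The definitions are assumptions, not the exact benchmark script: latency is taken as the time to the first token, stream rate as the remaining tokens divided by the time after the first token, and total rate as all tokens divided by the total time. The names `TrialStats` and `trial_stats` are made up for illustration.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class TrialStats:
    latency_s: float    # time from sending the request to the first token
    total_s: float      # time from sending the request to the last token
    stream_rate: float  # tokens/s once streaming has started
    total_rate: float   # tokens/s over the whole request
    tokens: int


def trial_stats(arrivals: Sequence[float]) -> TrialStats:
    """arrivals[i] is the time in seconds, measured from when the
    request was sent, at which token i arrived."""
    if len(arrivals) < 2:
        raise ValueError("need at least two token arrival times")
    latency = arrivals[0]
    total = arrivals[-1]
    if total <= latency:
        raise ValueError("arrival times must increase")
    tokens = len(arrivals)
    # Assumption: the first token is attributed to latency, so the
    # stream rate counts the remaining tokens over the remaining time.
    stream_rate = (tokens - 1) / (total - latency)
    total_rate = tokens / total
    return TrialStats(latency, total, stream_rate, total_rate, tokens)


if __name__ == "__main__":
    # Synthetic example: first token after 0.5 s, then one token every 50 ms.
    fake_arrivals = [0.5 + 0.05 * i for i in range(256)]
    print(trial_stats(fake_arrivals))
```

In a real test, the arrival times would come from timestamping each chunk of a streamed response. One check supports the total-rate definition: 256 tokens / 6.5735 s = 38.944, which matches the best trial in the 4-17 table.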

Average stream rate: 29 → 19 tokens/s = slower
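To put a number on the change, here is a quick sketch of the arithmetic using the averages from the two gpt-4-turbo runs above. The helper name `percent_change` is just for illustration, and the two runs have different trial counts (5 vs 3), so treat the result as a rough indication.

```python
def percent_change(before: float, after: float) -> float:
    """Relative change from before to after, in percent."""
    return (after - before) / before * 100.0


# Averages copied from the 2024-04-17 and 2024-05-01 gpt-4-turbo runs.
stream_before, stream_after = 29.160, 18.967
total_before, total_after = 9.899, 15.693

print(f"stream rate:         {percent_change(stream_before, stream_after):+.1f}%")
print(f"total response time: {percent_change(total_before, total_after):+.1f}%")
```

That works out to a stream rate about 35% lower and a total response time about 59% longer for the same 256 tokens.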

As low as 850 tokens/minute

Assistants adds a second layer that does other work before you get a response.
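Because an Assistants run has to be polled until it finishes, a fixed count of loop iterations can run out while the run is still in progress. Below is a minimal sketch of polling against a deadline with backoff instead. It assumes a hypothetical `get_status` callable that returns the current run status as a string. The terminal status names are my assumption of what the API returns, so check them against the documentation, and the 110-second default is only chosen to stay under the two-minute limit mentioned in the question.

```python
import time
from typing import Callable

# Assumed terminal run statuses; verify against the API reference.
TERMINAL = {"completed", "failed", "cancelled", "expired"}


def wait_for_run(
    get_status: Callable[[], str],   # hypothetical: fetches the run status
    timeout_s: float = 110.0,
    first_delay_s: float = 0.5,
    max_delay_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll until the run reaches a terminal status or the deadline passes."""
    deadline = clock() + timeout_s
    delay = first_delay_s
    while True:
        status = get_status()
        if status in TERMINAL:
            return status
        remaining = deadline - clock()
        if remaining <= 0:
            raise TimeoutError(f"run still {status!r} after {timeout_s} s")
        # Never sleep past the deadline; double the delay up to the cap.
        sleep(min(delay, remaining))
        delay = min(delay * 2, max_delay_s)


if __name__ == "__main__":
    # Simulated run that completes on the fourth status check.
    fake = iter(["queued", "in_progress", "in_progress", "completed"])
    print(wait_for_run(lambda: next(fake), sleep=lambda s: None))
```

The same idea carries over to a Do until loop: bound it by elapsed time rather than by iteration count, and handle the still-running case explicitly instead of letting the caller time out.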

Also, under high load, it may be lower-tier users who get affected by some priority system, instead of a straight-up rate limit like the one that was turned on like a switch when they first introduced the tier system.

Other gpt-4-turbo models:

For 3 trials of gpt-4-1106-preview @ 2024-05-01 12:51PM:

Stat                    Minimum   Maximum   Average
stream rate (tokens/s)  17.0      23.3      21.000

For 3 trials of gpt-4-0125-preview @ 2024-05-01 12:53PM:

Stat                    Minimum   Maximum   Average
stream rate (tokens/s)  10.6      26.6      21.233