Recently, I’ve been comparing GPT-4 and the new turbo preview model and, in a small-scale test, I’ve found that the turbo model is noticeably slower than GPT-4-0613 (~12 tokens per second for 0613 vs. ~9 for turbo). I am assuming this has to do with server load? Besides one dead thread here, I am not finding much information on the issue.
Perhaps a relevant detail is that I am querying in JSON mode.
Here’s a quick speed test: measure latency (the 1-token response time), then the 128-token and 512-token response times, and the token rate over each total time.
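For reference, here’s a minimal sketch of the kind of timing harness I mean. The names and structure are illustrative, not my exact script; it assumes the `openai` Python package (>= 1.0) and an `OPENAI_API_KEY` in the environment, and it counts streamed content chunks as a rough proxy for tokens:

```python
import time


def fmt_result(tokens: int, elapsed: float) -> str:
    """Format a result line like the ones below: [128 tokens in 9.0s. 14.2 tps]."""
    return f"[{tokens} tokens in {elapsed:.1f}s. {tokens / elapsed:.1f} tps]"


def timed_completion(model: str, prompt: str, max_tokens: int) -> str:
    """Request up to max_tokens from `model`, timing the full streamed response.

    Hypothetical harness: chunk count is only an approximation of token count.
    """
    from openai import OpenAI  # imported here so fmt_result works offline

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    start = time.monotonic()
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        max_tokens=max_tokens,
        stream=True,
    )
    n_tokens = 0
    for chunk in stream:
        if chunk.choices and chunk.choices[0].delta.content:
            n_tokens += 1  # one content chunk ~ one token, roughly
    return fmt_result(n_tokens, time.monotonic() - start)
```

Run it once each with `max_tokens` of 1, 128, and 512 per model to get numbers in the format below.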
My existing speed-test prompt, a document-creation task, is now refused. Thanks, OpenAI.
—gpt-4-0613—
Sorry
[1 tokens in 0.5s. 1.9 tps]
Sorry, but I can’t assist with that.
[10 tokens in 1.1s. 8.8 tps]
I’m sorry, but I can’t assist with that.
[12 tokens in 1.3s. 9.4 tps]
So more tokens wasted on prompting obedience…
—gpt-4-0613—
Title [1 tokens in 1.3s. 0.8 tps]
Title: Digital Transformation: A Comprehensive Guide
Introduction
Digital tran [128 tokens in 9.0s. 14.2 tps]
Title: Digital Transformation: A Comprehensive Exploration
Introduction
Digita [512 tokens in 59.7s. 8.6 tps]
—gpt-4-turbo-preview—
# [1 tokens in 2.8s. 0.4 tps]
# The Comprehensive Guide to Digital Transformation: Navigating the Future of Bu [128 tokens in 9.9s. 13.0 tps]
# The Comprehensive Guide to Digital Transformation: Navigating the Future of Bu [512 tokens in 44.4s. 11.5 tps]
So the two are somewhat comparable. Speed also depends on how OpenAI balances the number of model instances against the number of users calling them (catch a beta on release day and you see what the model can really do). It would take a lot of testing to find where production throughput maxes out, as a real measure of the model’s production rate on the best machines it is deployed on.
(My scripting to run more extensive tests is inside a PC killed by power surges).
We were also seeing potential problems with streaming on 0125 versus 1106. I thought it might be because 0125 is a recent preview release, so the servers for that model could be overloaded.
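For anyone else debugging streaming, one illustrative way to tell “uniformly slow” apart from “server stalls” is to record each chunk’s arrival time (e.g. `time.monotonic()` inside the stream loop) and look at the gaps. This is a hypothetical helper of my own, not anything from the API:

```python
def gap_stats(arrival_times: list[float]) -> tuple[float, float]:
    """Given monotonic arrival timestamps of streamed chunks, return
    (mean inter-chunk gap, max inter-chunk gap) in seconds.

    A max gap that dwarfs the mean suggests server-side stalls (bursty
    delivery under load) rather than a uniformly slow model.
    """
    gaps = [b - a for a, b in zip(arrival_times, arrival_times[1:])]
    if not gaps:
        return (0.0, 0.0)
    return (sum(gaps) / len(gaps), max(gaps))
```

If 0125 shows a similar mean gap to 1106 but much larger max gaps, that would point at overloaded preview servers rather than the model itself.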