What is the context window of the new GPT 3.5 Turbo model (gpt-3.5-turbo-0125)?

The recent announcement mentioned a new GPT 3.5 Turbo model (gpt-3.5-turbo-0125). Qs:

  • Is it a replacement for gpt-3.5-turbo-1106?
  • Does it have 16k context window?

It is supposed to be a replacement that addresses the flaws encountered with the model below: problems writing functions, multilingual encoding issues, poor instruction following, refusals, and a general inability to fulfill tasks:

| Model | Description | Context window | Training data |
|---|---|---|---|
| gpt-3.5-turbo-1106 | The latest GPT-3.5 Turbo model with improved instruction following, JSON mode, reproducible outputs, parallel function calling, and more. Returns a maximum of 4,096 output tokens. | 16,385 tokens | Up to Sep 2021 |

So the specifications are expected to be the same.


You sure about that? With the gpt-3.5-turbo alias I am getting: "Error occurred (getChatCompletionOpenAI): This model's maximum context length is 4097 tokens. However, your messages resulted in 4140 tokens. Please reduce the length of the messages."

The gpt-3.5-turbo alias is still pointed at gpt-3.5-turbo-0613, the stable model; it has never been moved away from it.

The schedule for re-pointing to -0125 is "two weeks after release" according to the blog.

The API return will tell you what model you are getting the response from.
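For example (a minimal sketch; it assumes an OPENAI_API_KEY environment variable is set), the model field of the response shows which snapshot actually answered:

from openai import OpenAI
client = OpenAI()

completion = client.chat.completions.create(
    model="gpt-3.5-turbo",  # alias; the response reports the real snapshot served
    messages=[{"role": "user", "content": "Hello!"}],
    max_tokens=10,
)
print(completion.model)  # e.g. "gpt-3.5-turbo-0613" until the alias is re-pointed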

The error will tell you what input context length you exceeded.

openai.BadRequestError: Error code: 400 - {'error': {'message': "This model's maximum context length is 16385 tokens. However, your messages resulted in 18565 tokens (18478 in the messages, 87 in the functions). Please reduce the length of the messages or functions.", 'type': 'invalid_request_error', 'param': 'messages', 'code': 'context_length_exceeded'}}

or what output limitations have been placed making it not a -16k model replacement:

openai.BadRequestError: Error code: 400 - {'error': {'message': 'max_tokens is too large: 4500. This model supports at most 4096 completion tokens, whereas you provided 4500.', 'type': 'invalid_request_error', 'param': 'max_tokens', 'code': None}}


I explicitly set the model to 'gpt-3.5-turbo-0125', and on the page you show (https://platform.openai.com/docs/models/gpt-3-5-turbo) the docs quote a 16k window, but it will only accept 4k tokens.
:slightly_frowning_face:

Solution: set max_tokens to 2000. Not a big number.

The API parameter max_tokens sets the maximum response length. It is the parameter that I set to 4500 tokens to get the second error shown. OpenAI doesn't let the newest AI models write more than 4,096 completion tokens.

max_tokens reserves part of the model's context length solely for the output. It does not limit how much you can send as input; the input can use whatever remains of the context window.
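As a rough illustration (a sketch only; it assumes the cl100k_base encoding that gpt-3.5-turbo uses and ignores the few tokens of per-message overhead), the input budget is simply the context length minus whatever you reserve with max_tokens:

import tiktoken

CONTEXT_LENGTH = 16385                       # gpt-3.5-turbo-0125 total context window
max_tokens = 4096                            # reserved for the model's output
input_budget = CONTEXT_LENGTH - max_tokens   # 12,289 tokens left for the messages

enc = tiktoken.get_encoding("cl100k_base")
prompt_text = "Hello! " * 3000               # stand-in for your real messages
prompt_tokens = len(enc.encode(prompt_text))

print(f"input budget: {input_budget}, prompt: {prompt_tokens}")
if prompt_tokens > input_budget:
    print("This request would fail with context_length_exceeded.")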

So unlike the gpt-3.5-turbo-16k model, I have to set max_tokens in my API call - to something less than what the model already knows is the max tokens? I have a large input and don't generate much output in my use case. I've never explicitly set max_tokens when using the -16k models - thus my confusion, I think.

I'll make it really clear, with code you can run in Python 3.8-3.11 with the openai library:

from openai import OpenAI
client = OpenAI()

params = {
    "model": "gpt-3.5-turbo-0125",
    "max_tokens": 1,
    "messages": [{"role": "system", "content": "Say hello back."}],
}

for _ in range(80):  # add assistant + user 80 times
    params['messages'].extend([
        {"role": "assistant", "content": "@!" * 45},
        {"role": "user", "content": "Hello!"},
        ])  # 100 tokens total

completion = client.chat.completions.create(**params)

print(completion.choices[0].message.content)
print(completion.usage.model_dump())

Response:

Hello
{'completion_tokens': 1, 'prompt_tokens': 8011, 'total_tokens': 8012}

Analysis

  • I sent 8000 tokens besides the system message and its overhead.

  • I got a report back showing the prompt tokens of my input.

  • I got a report back showing the response was truncated at just 1 token, the output limit I set with max_tokens.

  • Conclusion: I can send the big input without error to gpt-3.5-turbo-0125.

(Actually I had to work around OpenAI being a jerk about testing)

openai.BadRequestError: Error code: 400 - {'error': {'message': "Sorry! We've encountered an issue with repetitive patterns in your prompt. Please try again with a different prompt.", 'type': 'invalid_request_error', 'param': 'prompt', 'code': None}}
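One way to pad a test prompt without tripping that filter (my assumption about what it considers repetitive; the exact heuristic isn't documented) is to use varied filler instead of one repeated string:

import random
import string

def varied_filler(n_words: int) -> str:
    # hypothetical helper: n_words of random lowercase "words", so the padding
    # is long but not one endlessly repeated pattern
    return " ".join(
        "".join(random.choices(string.ascii_lowercase, k=random.randint(3, 8)))
        for _ in range(n_words)
    )

padding = varied_filler(400)  # roughly a few hundred tokens of non-repetitive text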

The AI model context length is 16,385 tokens.

If I set the response to max_tokens: 1000, that would be plenty of room for the AI to write the rest of its "how can I help you today?" without it being cut off at the first token.

I would then only have 15,385 tokens remaining to send as input, because max_tokens also "reserves" that amount from the context length.
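Putting that together for the large-input, small-output case described above (a sketch; the placeholder document text stands in for your real input):

from openai import OpenAI
client = OpenAI()

messages = [
    {"role": "system", "content": "Summarize the user's document in two sentences."},
    {"role": "user", "content": "(large document text goes here)"},
]

# Reserve only what the output actually needs; the rest of the context window
# stays available for input (16,385 - 1,000 = 15,385 tokens of input room).
completion = client.chat.completions.create(
    model="gpt-3.5-turbo-0125",
    messages=messages,
    max_tokens=1000,
)
print(completion.usage.model_dump())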
