What is the context window of the new GPT 3.5 Turbo model (gpt-3.5-turbo-0125)?

The recent announcement mentioned a new GPT 3.5 Turbo model (gpt-3.5-turbo-0125). Qs:

  • Is it a replacement for gpt-3.5-turbo-1106?
  • Does it have 16k context window?

It is supposed to be a replacement that addresses the flaws encountered with the model below: problems writing functions, multilingual encoding issues, poor instruction following, refusals, and a general inability to fulfill tasks:

| Model | Description | Context window | Training data |
|---|---|---|---|
| gpt-3.5-turbo-1106 | The latest GPT-3.5 Turbo model with improved instruction following, JSON mode, reproducible outputs, parallel function calling, and more. Returns a maximum of 4,096 output tokens. | 16,385 tokens | Up to Sep 2021 |

So the specifications are expected to be the same.


You sure about that? With the gpt-3.5-turbo alias I am getting: "Error occurred (getChatCompletionOpenAI): This model's maximum context length is 4097 tokens. However, your messages resulted in 4140 tokens. Please reduce the length of the messages."

The gpt-3.5-turbo alias is still pointed at gpt-3.5-turbo-0613, the stable model; it has never been moved away from it.

The schedule for re-pointing to -0125 is "two weeks after release" according to the blog.

The API return will tell you what model you are getting the response from.
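For example (a minimal sketch; it assumes an OPENAI_API_KEY environment variable is set), the model field of the response shows which snapshot actually answered:

from openai import OpenAI
client = OpenAI()

completion = client.chat.completions.create(
    model="gpt-3.5-turbo",  # alias; the response reports the real snapshot served
    messages=[{"role": "user", "content": "Hello!"}],
    max_tokens=10,
)
print(completion.model)  # e.g. "gpt-3.5-turbo-0613" until the alias is re-pointed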

The error will tell you what input context length you exceeded.

openai.BadRequestError: Error code: 400 - {'error': {'message': "This model's maximum context length is 16385 tokens. However, your messages resulted in 18565 tokens (18478 in the messages, 87 in the functions). Please reduce the length of the messages or functions.", 'type': 'invalid_request_error', 'param': 'messages', 'code': 'context_length_exceeded'}}

or what output limitations have been placed making it not a -16k model replacement:

openai.BadRequestError: Error code: 400 - {'error': {'message': 'max_tokens is too large: 4500. This model supports at most 4096 completion tokens, whereas you provided 4500.', 'type': 'invalid_request_error', 'param': 'max_tokens', 'code': None}}


I explicitly set the model to 'gpt-3.5-turbo-0125', and on the page you show (https://platform.openai.com/docs/models/gpt-3-5-turbo) the docs quote a 16k window, but it will only accept 4k tokens.
:slightly_frowning_face:

Solution: set max_tokens to 2000. Not a big number.

The API parameter max_tokens sets the maximum response length. It is the parameter that I set to 4500 tokens to get the second error shown. OpenAI doesn't let the newest AI models write more than 4,096 completion tokens.

max_tokens reserves part of the model's context length solely for the output. It does not limit how much you can send as input; the input can use whatever remains of the context window.
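As a rough illustration (a sketch only; it assumes the cl100k_base encoding that gpt-3.5-turbo uses and ignores the few tokens of per-message overhead), the input budget is simply the context length minus whatever you reserve with max_tokens:

import tiktoken

CONTEXT_LENGTH = 16385                       # gpt-3.5-turbo-0125 total context window
max_tokens = 4096                            # reserved for the model's output
input_budget = CONTEXT_LENGTH - max_tokens   # 12,289 tokens left for the messages

enc = tiktoken.get_encoding("cl100k_base")
prompt_text = "Hello! " * 3000               # stand-in for your real messages
prompt_tokens = len(enc.encode(prompt_text))

print(f"input budget: {input_budget}, prompt: {prompt_tokens}")
if prompt_tokens > input_budget:
    print("This request would fail with context_length_exceeded.")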

So unlike the gpt-3.5-turbo-16k model, I have to set max_tokens in my API call - to something less than what the model already knows is the max tokens? I have a large input and don't generate much output in my use case. I've never explicitly set max_tokens when using the -16k models - thus my confusion, I think.

I'll make it really clear, with code you can run in Python 3.8-3.11 with the openai library:

from openai import OpenAI
client = OpenAI()

params = {
    "model": "gpt-3.5-turbo-0125",
    "max_tokens": 1,
    "messages": [{"role": "system", "content": "Say hello back."}],
}

for _ in range(80):  # add assistant + user 80 times
    params['messages'].extend([
        {"role": "assistant", "content": "@!" * 45},
        {"role": "user", "content": "Hello!"},
        ])  # 100 tokens total

completion = client.chat.completions.create(**params)

print(completion.choices[0].message.content)
print(completion.usage.model_dump())

Response:

Hello
{'completion_tokens': 1, 'prompt_tokens': 8011, 'total_tokens': 8012}

Analysis

  • I sent 8000 tokens besides the system message and its overhead.

  • I got a report back showing the prompt tokens of my input.

  • I got a report back showing the response was truncated at just 1 token, the output limit I set with max_tokens.

  • Conclusion: I can send the big input without error to gpt-3.5-turbo-0125.

(Actually I had to work around OpenAI being a jerk about testing)

openai.BadRequestError: Error code: 400 - {'error': {'message': "Sorry! We've encountered an issue with repetitive patterns in your prompt. Please try again with a different prompt.", 'type': 'invalid_request_error', 'param': 'prompt', 'code': None}}
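One way to pad a test prompt without tripping that filter (my assumption about what it considers repetitive; the exact heuristic isn't documented) is to use varied filler instead of one repeated string:

import random
import string

def varied_filler(n_words: int) -> str:
    # hypothetical helper: n_words of random lowercase "words", so the padding
    # is long but not one endlessly repeated pattern
    return " ".join(
        "".join(random.choices(string.ascii_lowercase, k=random.randint(3, 8)))
        for _ in range(n_words)
    )

padding = varied_filler(400)  # roughly a few hundred tokens of non-repetitive text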

The AI model context length is 16,385 tokens.

If I set the response to max_tokens: 1000, that would be plenty of room for the AI to write the rest of its "how can I help you today?" without it being cut off at the first token.

I would then only have 15,385 tokens remaining to send as input, because max_tokens also "reserves" that amount from the context length.
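Putting that together for the large-input, small-output case described above (a sketch; the placeholder document text stands in for your real input):

from openai import OpenAI
client = OpenAI()

messages = [
    {"role": "system", "content": "Summarize the user's document in two sentences."},
    {"role": "user", "content": "(large document text goes here)"},
]

# Reserve only what the output actually needs; the rest of the context window
# stays available for input (16,385 - 1,000 = 15,385 tokens of input room).
completion = client.chat.completions.create(
    model="gpt-3.5-turbo-0125",
    messages=messages,
    max_tokens=1000,
)
print(completion.usage.model_dump())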
