The recent announcement mentioned a new GPT 3.5 Turbo model (gpt-3.5-turbo-0125). Qs:
- Is it a replacement for gpt-3.5-turbo-1106?
- Does it have 16k context window?
It is supposed to be a replacement that addresses the flaws encountered in this model: faulty function writing, multilingual text-encoding issues, poor instruction following, denials, and general inability to fulfill tasks:
Model | Description | Context window | Training data
---|---|---|---
gpt-3.5-turbo-1106 | The latest GPT-3.5 Turbo model with improved instruction following, JSON mode, reproducible outputs, parallel function calling, and more. Returns a maximum of 4,096 output tokens. | 16,385 tokens | Up to Sep 2021
So the specifications are expected to be the same.
You sure about that? With the gpt-3.5-turbo alias I am getting: "Error occurred (getChatCompletionOpenAI): This model's maximum context length is 4097 tokens. However, your messages resulted in 4140 tokens. Please reduce the length of the messages."
gpt-3.5-turbo as an alias has never stopped pointing to gpt-3.5-turbo-0613 as the stable model.
The schedule for re-pointing to -0125 is "two weeks after release" according to the blog.
The API return will tell you what model you are getting the response from.
The error will tell you what input context length you exceeded.
openai.BadRequestError: Error code: 400 - {'error': {'message': "This model's maximum context length is 16385 tokens. However, your messages resulted in 18565 tokens (18478 in the messages, 87 in the functions). Please reduce the length of the messages or functions.", 'type': 'invalid_request_error', 'param': 'messages', 'code': 'context_length_exceeded'}}
or what output limitations have been placed making it not a -16k model replacement:
openai.BadRequestError: Error code: 400 - {'error': {'message': 'max_tokens is too large: 4500. This model supports at most 4096 completion tokens, whereas you provided 4500.', 'type': 'invalid_request_error', 'param': 'max_tokens', 'code': None}}
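These 400 bodies are plain JSON, so the two numbers can be pulled out programmatically if you want to adapt your input automatically. A minimal sketch, assuming an error body like the one above (the `parse_limit` helper and its regex are my own, not part of the openai library):

```python
import re

# Error body copied from the 400 response shown above.
error_body = {
    "error": {
        "message": ("This model's maximum context length is 16385 tokens. "
                    "However, your messages resulted in 18565 tokens "
                    "(18478 in the messages, 87 in the functions). "
                    "Please reduce the length of the messages or functions."),
        "type": "invalid_request_error",
        "param": "messages",
        "code": "context_length_exceeded",
    }
}

def parse_limit(message: str):
    """Extract (context limit, tokens sent) from a context-length error message."""
    m = re.search(r"maximum context length is (\d+) tokens.*?resulted in (\d+) tokens",
                  message)
    return (int(m.group(1)), int(m.group(2))) if m else None

limit, sent = parse_limit(error_body["error"]["message"])
print(limit, sent)  # 16385 18565
```

From there you can trim messages until `sent` fits under `limit`.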
I explicitly set the model to "gpt-3.5-turbo-0125" and on the page you show (https://platform.openai.com/docs/models/gpt-3-5-turbo), the docs quote a 16k window, but it will only accept 4k tokens.
Solution: set max_tokens to 2000. Not a big number.
The API parameter max_tokens sets the maximum response length. It is the parameter that I set to 4500 tokens to get the second error shown. OpenAI doesn't let the newest AI models write more than 4,096 tokens in one response.
max_tokens reserves part of the model context length solely for the output (input budget = context length minus max_tokens). It does not itself cap the total you can send; your input just has to fit in the remaining context window length.
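To make that "reserves" arithmetic concrete, here is a sketch. The 16,385 and 4,096 figures come from the error messages above; the function itself is just illustrative, not an openai library API:

```python
# Context-budget arithmetic for gpt-3.5-turbo-0125 (figures taken from
# the model docs and the 400 errors quoted above).
CONTEXT_WINDOW = 16385   # total context length in tokens
MAX_COMPLETION = 4096    # hard cap on completion tokens for this model

def max_input_tokens(max_tokens: int) -> int:
    """Input budget left after max_tokens is reserved for the output."""
    if max_tokens > MAX_COMPLETION:
        raise ValueError(
            f"max_tokens {max_tokens} exceeds the {MAX_COMPLETION}-token output cap")
    return CONTEXT_WINDOW - max_tokens

print(max_input_tokens(1000))  # 15385
print(max_input_tokens(1))     # 16384
```

Asking for max_tokens of 4500, as in the second error, fails the cap check before any budget is computed, which matches the API's behavior.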
So unlike the gpt-3.5-turbo-16k model, I have to set max_tokens in my api call - to something less than what the model already knows is the max tokens? I have a large input, and don't generate much for the output in my use case. I've never explicitly set max_tokens when using the -16k models - thus my confusion I think.
I'll make it really clear, with code you can run in Python 3.8-3.11 with the openai library:
```python
from openai import OpenAI

client = OpenAI()
params = {
    "model": "gpt-3.5-turbo-0125", "max_tokens": 1,
    "messages": [{"role": "system", "content": "Say hello back."}],
}
for _ in range(80):  # add assistant + user 80 times
    params["messages"].extend([
        {"role": "assistant", "content": "@!" * 45},
        {"role": "user", "content": "Hello!"},
    ])  # 100 tokens total per pair
completion = client.chat.completions.create(**params)
print(completion.choices[0].message.content)
print(completion.usage.model_dump())
```
Hello
{'completion_tokens': 1, 'prompt_tokens': 8011, 'total_tokens': 8012}
I sent 8,000 tokens besides the system message and its overhead. I got a report back showing the prompt tokens of my input. I got a report back showing the response was truncated at just 1 token, the output limit I set with max_tokens.
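The 8,011 prompt tokens are consistent with simple bookkeeping. These per-message figures are back-figured from the usage report itself, not an official token-counting method:

```python
# Back-of-envelope accounting for the usage report above.
# Per-pair and overhead counts are assumptions inferred from the report.
pairs = 80
tokens_per_pair = 100      # '@!' * 45 assistant turn + 'Hello!' user turn, incl. framing
system_and_overhead = 11   # 'Say hello back.' system message plus chat-format overhead

prompt_tokens = pairs * tokens_per_pair + system_and_overhead
print(prompt_tokens)  # 8011
```

For exact counts before sending, a tokenizer such as tiktoken is the right tool; this is only to show the report adds up.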
Conclusion: I can send the big input without error to gpt-3.5-turbo-0125.
(A side note: input that is too repetitive can instead be rejected outright:)

openai.BadRequestError: Error code: 400 - {'error': {'message': "Sorry! We've encountered an issue with repetitive patterns in your prompt. Please try again with a different prompt.", 'type': 'invalid_request_error', 'param': 'prompt', 'code': None}}
The AI model context length is 16,385 tokens.
If I set the response to max_tokens: 1000, that would be plenty of room for the AI to write the rest of its "how can I help you today?" without it being cut off at the first token.
I would then only have 15,385 tokens remaining to send input, because max_tokens also "reserves" that amount from the context length.