I was trying to see how my app would behave when I exceeded the context limit. I tried this:
const initialMessageContent = `Some text "${longText}".
Also please consider this "${longText}".
Also please consider this "${longText}".
Also please consider this "${longText}".
Also please consider this "${longText}".
Also please consider this "${longText}".
Also please consider this "${longText}".
Also please consider this "${longText}".
Also please consider this "${longText}".
Thank you.`;
console.log('Initial message length', initialMessageContent.length);
const response = await openai.chat.completions.create({
  model: 'gpt-4o',
  stream: true,
  messages: [
    {
      role: 'user',
      content: initialMessageContent
    },
  ]
});
I got a console log: "Initial message length 308153"
But there was no error; it just seemed to process the request. I would have expected to get an error.
It can become a bit costly to experiment with this, so I don't want to dive deeper. Does anyone have thoughts on why I am not getting an error?
You can use a Python library like tiktoken to calculate the number of tokens yourself, trust that OpenAI's documented context window sizes are accurate, and enforce the limit on your end. Or do what I did: take a word count and multiply it by a reasonable average number of tokens per word (or just use character length), and enforce the limit without being totally precise. There's no real point in trying to force OpenAI to throw an error when you can stay within the limits yourself, imo.
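A minimal sketch of that approach in JavaScript, assuming gpt-4o's documented 128k-token context window and the rough rule of thumb of ~4 characters per token (both are approximations; use a tokenizer such as tiktoken for exact counts):

// Rough pre-flight check before sending the request (approximate, not exact).
const APPROX_CHARS_PER_TOKEN = 4;      // rule-of-thumb average for English text
const CONTEXT_WINDOW_TOKENS = 128_000; // gpt-4o's documented context window

function estimateTokens(text) {
  return Math.ceil(text.length / APPROX_CHARS_PER_TOKEN);
}

if (estimateTokens(initialMessageContent) > CONTEXT_WINDOW_TOKENS) {
  throw new Error('Prompt likely exceeds the model context window');
}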
Your understanding is correct. The max_tokens parameter only applies to output tokens. It works as a hard cut-off: the output is truncated at the specified token limit, regardless of whether the response is complete at that point.
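For reference, a minimal sketch of passing max_tokens on a Chat Completions call (the cap value here is arbitrary); when the cap is hit, the output is truncated and finish_reason comes back as 'length':

const capped = await openai.chat.completions.create({
  model: 'gpt-4o',
  max_tokens: 100, // caps generated output only; input size is not limited by this
  messages: [{ role: 'user', content: 'Write a long story about a lighthouse.' }],
});
console.log(capped.choices[0].finish_reason); // 'length' if the cap truncated the output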
As for input tokens, there is no equivalent parameter, and I'd echo @wclayf's recommended approach.
It's interesting, though, that it doesn't throw an error in your case. I have not experienced this with other models before.
Hi!
When the context window is exceeded, the first tokens are dropped; only the last tokens that fit into the window remain in context.
You can test this by asking for specific information from the beginning of the context and then increasing the number of input tokens until the model can't see or derive what those initial tokens were.
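A sketch of that probe (the marker string and filler text are made up for illustration):

// Put a unique marker at the very start of the prompt, pad with filler text,
// and ask the model to repeat the marker. Once the prompt exceeds the context
// window, the marker is silently dropped and the model can no longer return it.
const marker = 'SECRET-CODE-4821';
const repeatCount = 1_000; // increase this between runs until the prompt exceeds the window
const filler = 'lorem ipsum dolor sit amet. '.repeat(repeatCount);
const probe = await openai.chat.completions.create({
  model: 'gpt-4o',
  messages: [
    { role: 'user', content: `The secret code is ${marker}.\n${filler}\nWhat is the secret code?` },
  ],
});
console.log(probe.choices[0].message.content); // stops containing the marker once truncated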
As @wclayf correctly pointed out, there is not really a use case here that can't be handled in a better way.
Or is there something else you are looking for besides a warning from the API?
Thanks for the clarification. I understand this can be handled by the developer/user of the AI.
But maybe a warning would be useful to indicate that the request prompt was in fact cut off / effectively "modified" before a response was given; otherwise this could lead to unexpected behaviour.
I thought about this argument as well: why not just give the developer a warning that the context length was exceeded?
However, the behavior that follows is rather clearly defined: "drop the first tokens, process whatever is inside the context window," and it is not seemingly random in the sense in which "undefined behavior" is often used.
Handling this case is a very basic requirement for developers working with current-generation LLMs. In my opinion, it is a must to be aware of when building with these costly tools.
I see others have jumped in and gotten you up to speed. Token counting can be tricky, as has been noted, and you may want to create a custom solution, but max_tokens can be a stopgap at the very least.
I guess whether a warning would be useful is a different debate. I did write "unexpected behaviour", but again, that's a separate discussion. Thanks for your reply and for answering my question.