Seemingly unable to reach the context limit in my API request

I was trying to see how my app would behave when I exceeded the context limit. I tried this:

    import OpenAI from 'openai';

    const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

    // longText is a long string defined elsewhere.
    const initialMessageContent = `Some text "${longText}".
      Also please consider this "${longText}".
      Also please consider this "${longText}".
      Also please consider this "${longText}".
      Also please consider this "${longText}".
      Also please consider this "${longText}".
      Also please consider this "${longText}".
      Also please consider this "${longText}".
      Also please consider this "${longText}".
      Thank you.`;

    console.log('Initial message length', initialMessageContent.length);

    const response = await openai.chat.completions.create({
      model: 'gpt-4o',
      stream: true,
      messages: [
        {
          role: 'user',
          content: initialMessageContent,
        },
      ],
    });

I got the console log “Initial message length 308153”, but no error; the request just seemed to be processed normally. I would have expected to get an error.

It can become a bit costly to experiment with this :sweat_smile:, so I don’t want to dive deeper. Does anyone have any thoughts on why I am not getting an error?

I can’t speak to why you didn’t get an error, but when I need to restrict tokens I have leveraged the max_tokens parameter for this purpose: https://platform.openai.com/docs/api-reference/chat/create#chat-create-max_tokens
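For example, something along these lines (just a sketch with the Node SDK; the prompt is only a placeholder):

    // Cap the model's output at 50 tokens; the prompt itself is not limited by this.
    const capped = await openai.chat.completions.create({
      model: 'gpt-4o',
      max_tokens: 50,
      messages: [{ role: 'user', content: 'Write a long story about a lighthouse.' }],
    });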

Thanks. Just to confirm my understanding:

max_tokens only limits the output, right?

If I set max_tokens: 10 and my request prompt has 1,000 tokens, the API will still process the full prompt, right?

You can use a Python library like tiktoken to count the tokens yourself, trust that OpenAI’s documented context window size is accurate, and enforce the limit on your end. Or do what I did: take a word count, multiply it by a reasonable average number of tokens per word, and enforce the limit approximately by counting words or characters rather than being totally precise. There’s no real point in trying to force OpenAI to throw an error when you know how to stay within the limits, imo.
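A rough sketch of that kind of client-side guard (the ~4 characters per token figure is only a rule-of-thumb assumption, and 128k is gpt-4o’s documented context window; swap in a real tokenizer such as tiktoken if you need precision):

    // Rough estimate: ~4 characters per token on average (approximation only).
    const MODEL_CONTEXT_LIMIT = 128000; // gpt-4o context window, in tokens

    function estimateTokens(text) {
      return Math.ceil(text.length / 4);
    }

    function assertWithinContext(prompt, maxOutputTokens) {
      const estimated = estimateTokens(prompt) + maxOutputTokens;
      if (estimated > MODEL_CONTEXT_LIMIT) {
        throw new Error(
          `Prompt too large: ~${estimated} tokens estimated, limit is ${MODEL_CONTEXT_LIMIT}`
        );
      }
    }

    // Enforce the limit yourself, before ever calling the API.
    assertWithinContext(initialMessageContent, 1000);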

Your understanding is correct. The max_tokens parameter only applies to output tokens. It works as a hard cut-off: the output is truncated at the specified number of tokens, regardless of whether the response is complete at that point.
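For example (a sketch): with max_tokens: 10 the reply will usually be cut off, and the Chat Completions API reports this by setting finish_reason to 'length':

    const response = await openai.chat.completions.create({
      model: 'gpt-4o',
      max_tokens: 10,
      messages: [{ role: 'user', content: 'Explain tokenization in detail.' }],
    });

    console.log(response.choices[0].finish_reason);    // 'length' when truncated
    console.log(response.choices[0].message.content);  // only the first ~10 tokens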

As for input tokens, there is no equivalent parameter, and I’d echo @wclayf’s recommended approach.

It’s interesting, though, that it doesn’t throw an error in your case. I have not seen this behaviour with other models before.

Hi!
When the context window is exceeded, the first tokens are dropped; only the last tokens that fit into the window remain in context.

You can test this by asking for specific information from the beginning of the context and then increasing the number of input tokens until the model can no longer see or derive what those initial tokens were.
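For instance, something like this (a sketch; the marker and filler are just illustrative, and you would grow the filler step by step):

    // Put a marker at the very start, pad with filler, then ask for the marker back.
    // If the beginning of the prompt no longer fits in the context window,
    // the model will not be able to return the correct code.
    const marker = 'ZEBRA-7431';
    const filler = 'lorem ipsum '.repeat(100000); // increase gradually per test run

    const probe = await openai.chat.completions.create({
      model: 'gpt-4o',
      messages: [
        {
          role: 'user',
          content: `The secret code is ${marker}. ${filler} What was the secret code?`,
        },
      ],
    });

    console.log(probe.choices[0].message.content);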

As @wclayf correctly pointed out, there is not really a use for this that can’t be handled in a better way.
Or is there something else you are looking for besides a warning from the API?

Thanks for the clarification. I understand this can be handled by the developer/user of the AI.

But maybe a warning would be useful to indicate that the request prompt was in fact cut off / effectively “modified” before a response was generated; otherwise this could lead to unexpected behaviour.

I thought about this argument: why not just give the developer a warning that the context length has been exceeded?

However, the behavior that follows is clearly defined (‘drop the first tokens, process whatever is inside the context window’) and not seemingly random in the sense in which ‘undefined behavior’ is often used.

Handling this is a very basic requirement for developers working with current-generation LLMs. In my opinion, it is something you must be aware of when building with these costly tools.

Sorry for the late reply @rene-ka

I see others have jumped in and gotten you up to speed. Token counting can be tricky, as has been noted, and you may want to build a custom solution, but max_tokens can at least serve as a stopgap.

I guess whether a warning would be useful is a different debate. I wrote “unexpected behaviour”… but that is indeed a separate discussion. Thanks for your reply and for answering my question.
