We have a web application which calls gpt-4-1106-preview with stream: true in the backend. For instance,
import OpenAI from "openai";

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

const input = {
  model: "gpt-4-1106-preview",
  messages: [{ role: "user", content: "I have a long text to show you..." }],
  stream: true,
};
const stream = await openai.chat.completions.create(input);
We often receive error messages like: "400 This model's maximum context length is 4097 tokens. However, your messages resulted in 16727 tokens. Please reduce the length of the messages."
But shouldn't gpt-4-1106-preview accept a 128k context length? Does anyone know how we can increase the maximum context length and avoid the 400 error?
If you are indeed specifying the correct AI model, this may be a case of a misleading error message triggered by passing exactly the wrong value for a parameter.
I would look first at the max_tokens value you are using. That parameter is the response length reservation, in tokens. It does NOT tell the AI its own context window.
A good maximum is about 1500 tokens, which is roughly the most you will get out of the model unless you are doing specific data-processing tasks.
The maximum output this AI model can be set to is 4k (4096 tokens).
You can send VERY large (and correspondingly expensive) inputs to the model with no problem.
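As a rough illustration of that budget (the figures below are my own assumptions based on the published 128k context window and the 4096-token output cap, not values returned by the API):

// Rough token budget for gpt-4-1106-preview (illustrative figures only)
const contextWindow = 128000; // total tokens the model can attend to in one request
const outputCap = 4096;       // the most completion tokens this model can produce
const maxTokens = 1500;       // what we choose to reserve for the response

// The reservation comes out of the space left for input; it never raises the context window.
const reservation = Math.min(maxTokens, outputCap);
const availableForInput = contextWindow - reservation; // 126,500 tokens of room for messages
console.log(`Room left for input messages: ${availableForInput} tokens`);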
The issue here was a misunderstanding of two points:
the max_tokens setting only limits the size of the response; it does not correspond to the total context length of the model you want to use, and it relates to what you send only in that it is subtracted from the available space (a way to measure what you actually send is sketched after this list);
the gpt-4-turbo models have an artificial limit of 4k maximum output, despite a large context window that might make you think they could produce longer answers.
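If you want to check how much you are sending before the API rejects it, you can count tokens locally. Below is a minimal sketch; the tiktoken npm package and the gpt-4 encoding it selects are my assumptions, and the count ignores the few formatting tokens the chat format adds per message.

import { encoding_for_model } from "tiktoken"; // assumed tokenizer package

// Approximate the prompt size of a messages array before calling the API.
function countMessageTokens(messages) {
  const enc = encoding_for_model("gpt-4"); // same tokenizer family as gpt-4-1106-preview
  let total = 0;
  for (const m of messages) {
    total += enc.encode(m.content).length;
  }
  enc.free(); // release the WASM-backed encoder
  return total;
}

const messages = [{ role: "user", content: "I have a long text to show you..." }];
console.log(`Approximate prompt tokens: ${countMessageTokens(messages)}`);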
Solution:
Ask for a reasonable max_tokens such as 2000 - that limits billing if the model produces runaway output.
Send up to about 126,000 tokens of input - if you are willing to pay for it - and hope the AI can pay attention to all of it at once (a sketch of such a request follows).
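Put together, a minimal sketch of such a request, assuming the same openai Node SDK the question uses (the 2000-token reservation is just the reasonable cap suggested above):

import OpenAI from "openai";

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

const stream = await openai.chat.completions.create({
  model: "gpt-4-1106-preview",
  messages: [{ role: "user", content: "I have a long text to show you..." }],
  max_tokens: 2000, // reserve only what the reply needs; the rest of the 128k is for input
  stream: true,
});

// Print the streamed reply as the chunks arrive.
for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
}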