gpt-4-1106-preview: 400 This model's maximum context length is 4097 tokens


We have a web application which calls gpt-4-1106-preview with stream: true in the backend. For instance,

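    const input = {
        model: "gpt-4-1106-preview",
        messages: [{ role: 'user', content: "I have a long text to show you..." }],
        stream: true,
    };

    // The call itself was cut off in the post. Assuming the official OpenAI
    // Node SDK (v4) with a client instance named `openai`, it would look like:
    const stream = await openai.chat.completions.create(input);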

We often receive error messages like: "400 This model's maximum context length is 4097 tokens. However, your messages resulted in 16727 tokens. Please reduce the length of the messages."

But shouldn't gpt-4-1106-preview accept a 128k context length? Does anyone know how we can raise the maximum context length and avoid the 400 error?

Thank you

If you are indeed specifying the correct AI model, this may be a misleading error message triggered by an invalid value for one of the parameters.

I would look first at the max_tokens value you are using. That is the response length reservation in tokens. It does NOT tell the AI the size of its own context window.

A good maximum is about 1500 tokens, which is roughly the most you will get out of the model unless you are doing specific data processing tasks.

The maximum output this AI model can be set to is 4k tokens.
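As a minimal sketch of the point above, the request can carry an explicit max_tokens that is clamped to the model's output cap. The helper name buildRequest and the constant MAX_OUTPUT_TOKENS are made up for illustration, and 4096 is my reading of the "4k" cap. The returned object is what you would hand to whatever chat-completions call your backend already uses.

```javascript
// Hypothetical helper: builds chat-completion parameters with a bounded
// response reservation. Nothing here talks to the API.
const MAX_OUTPUT_TOKENS = 4096; // assumed output cap ("4k") for gpt-4-1106-preview

function buildRequest(userText, maxTokens = 1500) {
    // max_tokens only reserves room for the reply; it does not change
    // the model's context window.
    const reserved = Math.min(Math.max(1, Math.floor(maxTokens)), MAX_OUTPUT_TOKENS);
    return {
        model: "gpt-4-1106-preview",
        messages: [{ role: "user", content: userText }],
        stream: true,
        max_tokens: reserved,
    };
}

const params = buildRequest("I have a long text to show you...", 8192);
console.log(params.max_tokens); // 4096: the oversized value is clamped
```

Clamping silently is one design choice; a backend could just as well reject an oversized max_tokens with its own error.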

I just added some code to the OP. I usually don’t specify max_tokens.

I would like my backend to be able to accept long requests.

Do you know the maximum context length that gpt-4-1106-preview can accept?

gpt-4-1106-preview has a context length of 128000 tokens, technically 125k (128000 / 1024 = 125).

Filling that context costs about $1.25 for the input, plus $0.09 or so for some output.
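For anyone wondering where those dollar figures come from, here is a rough sketch of the arithmetic. The per-1,000-token prices ($0.01 input, $0.03 output) are an assumption based on the prices this model launched with and may have changed. The 3000 output tokens are simply an illustrative amount that works out to $0.09, and estimateCostUsd is a made-up name.

```javascript
// Assumed prices in USD per 1,000 tokens; check the current price list.
const INPUT_PRICE_PER_1K = 0.01;
const OUTPUT_PRICE_PER_1K = 0.03;

// Hypothetical helper: estimates the cost of one request.
function estimateCostUsd(inputTokens, outputTokens) {
    const input = (inputTokens / 1000) * INPUT_PRICE_PER_1K;
    const output = (outputTokens / 1000) * OUTPUT_PRICE_PER_1K;
    return { input, output, total: input + output };
}

const cost = estimateCostUsd(125000, 3000);
// prints: 1.25 0.09 1.34
console.log(cost.input.toFixed(2), cost.output.toFixed(2), cost.total.toFixed(2));
```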



So what should I do to avoid such 400 errors? If I set, for instance, max_tokens: 8192, will we get fewer 400 errors than before?

Thank you

As described earlier, you cannot set max_tokens above 4k (4096 tokens), the maximum output this AI model allows, regardless of its input context length.

See above where it says “a good maximum”


Could I ask whether you found a solution for handling very long prompts with this model?

You can send VERY large inputs, $1 worth, to the model with no problem.

The issue that was faced was not understanding that:

  • the max_tokens setting is only for the size of the response; it doesn’t correspond to the total context length of the model that you want to use or relate to what you send (except for subtracting from the available space);
  • the gpt-4-turbo models have an artificial limitation of 4k maximum output despite their large context that would make one think they could produce longer answers.


What to do instead:

  • Ask for a reasonable max_tokens, like 2000; that prevents billing overages if the model goes crazy.
  • Send up to 126000 tokens of input, if you want to pay for it, and hope the AI can pay attention to all of it at once.
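To make the budget concrete, here is a rough pre-flight check a backend could run before calling the API. It assumes you already have a token count for the input from a real tokenizer (not shown). The names checkBudget, CONTEXT_WINDOW and OUTPUT_CAP are hypothetical, and 4096 is my reading of the "4k" output limit. Real requests also add a few tokens of message overhead, so leave some margin.

```javascript
// Hypothetical pre-flight check; inputTokens must come from a real tokenizer.
const CONTEXT_WINDOW = 128000; // context length quoted in this thread
const OUTPUT_CAP = 4096;       // assumed "4k" output limit

function checkBudget(inputTokens, maxTokens = 2000) {
    if (maxTokens > OUTPUT_CAP) {
        return { ok: false, reason: `max_tokens ${maxTokens} exceeds output cap ${OUTPUT_CAP}` };
    }
    // The reply reservation is subtracted from the space available for input.
    const inputLimit = CONTEXT_WINDOW - maxTokens;
    if (inputTokens > inputLimit) {
        return { ok: false, reason: `input ${inputTokens} exceeds ${inputLimit} available tokens` };
    }
    return { ok: true, inputLimit };
}

console.log(checkBudget(16727));       // fits with plenty of room
console.log(checkBudget(127000));      // too long once 2000 tokens are reserved
console.log(checkBudget(16727, 8192)); // rejected: reply reservation too large
```

With the default reservation of 2000 tokens, the input limit works out to the 126000 figure mentioned above.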

Is this the exact code resulting in the error apart from the message placeholder?

Are you specifying max_tokens in the requests that result in this error?