Gpt-4-1106-preview: 400 This model's maximum context length is 4097 tokens

Hello,

We have a web application which calls gpt-4-1106-preview with stream: true in the backend. For instance,

    import OpenAI from "openai";

    const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

    const input = {
        model: "gpt-4-1106-preview",
        messages: [{ role: 'user', content: "I have a long text to show you..." }],
        stream: true,
    };

    const stream = await openai.chat.completions.create(input);

We often receive error messages like: "400 This model's maximum context length is 4097 tokens. However, your messages resulted in 16727 tokens. Please reduce the length of the messages."
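
Here is roughly how the failure surfaces on our side (a minimal sketch, assuming the v4 openai Node SDK and its APIError class; the stream-consuming loop is only illustrative):

    import OpenAI from "openai";

    const openai = new OpenAI();

    try {
        const stream = await openai.chat.completions.create({
            model: "gpt-4-1106-preview",
            messages: [{ role: 'user', content: "I have a long text to show you..." }],
            stream: true,
        });
        for await (const chunk of stream) {
            process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
        }
    } catch (err) {
        // A context-length overflow comes back from the API as an HTTP 400
        if (err instanceof OpenAI.APIError && err.status === 400) {
            console.error("Request rejected:", err.message);
        } else {
            throw err;
        }
    }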

But shouldn't gpt-4-1106-preview accept a 128k context length? Does anyone know how we can increase the maximum context length and avoid the 400 error?

Thank you

If you are indeed specifying the correct AI model, this may be a case of an incorrect error message triggered by passing exactly the wrong value for a parameter.

I would look first at the max_tokens value you are using. That is the response length reservation, in tokens. It does NOT tell the AI its own context window.

A good maximum is about 1500 tokens, which is about the most you will get out of the model unless you are doing specific data-processing tasks.

The maximum output this AI model can be set to is 4k.
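
As a rough sketch (assuming the openai Node SDK), this is the kind of call I mean; max_tokens only reserves room for the reply and does not change the context window:

    import OpenAI from "openai";

    const openai = new OpenAI();

    const stream = await openai.chat.completions.create({
        model: "gpt-4-1106-preview",
        messages: [{ role: 'user', content: "I have a long text to show you..." }],
        stream: true,
        // Reserves up to 1500 tokens for the reply; the input itself still
        // has to fit inside the model's context window.
        max_tokens: 1500,
    });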

I just added some code to the OP. Usually I don't specify max_tokens.

I would like my backend to be able to accept long requests.

Do you know the maximum context length that gpt-4-1106-preview can accept?

gpt-4-1106-preview has a context length of 128,000 tokens (technically 125k, i.e. 125 × 1024).

Filling that context costs about $1.25 in input tokens per call, plus around $0.09 for some output.

calculator: https://tiktokenizer.vercel.app/
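
If you would rather check this in code than in the browser, here is a minimal sketch using the js-tiktoken package (an assumption about your setup; any cl100k_base tokenizer will do):

    import { getEncoding } from "js-tiktoken";

    // gpt-4-1106-preview uses the cl100k_base encoding
    const enc = getEncoding("cl100k_base");

    const longUserText = "I have a long text to show you..."; // placeholder for the real prompt
    const promptTokens = enc.encode(longUserText).length;
    console.log(`prompt tokens: ${promptTokens}`);

    // Leave room for the reply (max_tokens) inside the 128k window
    if (promptTokens + 1500 > 128000) {
        throw new Error("Prompt too long for gpt-4-1106-preview");
    }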

So what should I do to avoid these 400 errors? If I set, for instance, max_tokens: 8192, will we get fewer 400 errors than before?

Thank you

As described earlier, you cannot set max_tokens above 4,000, the maximum output this AI model allows, regardless of its input context length.

See above, where it says “a good maximum”.
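
If that value comes from somewhere else in your app, a minimal sketch of clamping it (requestedMaxTokens is a hypothetical name; 4096 is the output cap usually quoted for this model):

    const OUTPUT_CAP = 4096; // output limit, separate from the 128k input context

    const requestedMaxTokens = 8192;                             // e.g. the value asked about above
    const maxTokens = Math.min(requestedMaxTokens, OUTPUT_CAP);  // what actually gets sent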
