Error: This model's maximum context length is X tokens

Hey guys,

So I’ve recently started having this error:

“Error: This model’s maximum context length is 4097 tokens. However, your messages resulted in 5355 tokens. Please reduce the length of the messages.”

The prompt is very small. I'm using LlamaIndex with OpenAIEmbeddings and chunk_size=1000, and a PDF which gets indexed without an issue. The problem is that when I ask a random question — "Describe to me what Ruby on Rails is" — I get this error.

This seems odd given how short the prompt is. Is it because the completion is too big?

Maximum context length is the sum of the inputs you send to the API (including samples and the prompt) plus the output that GPT generates and returns. Most likely in your case, it's a completion that is too big that is causing the problem.
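To make that concrete, here's a minimal sketch of the budget arithmetic. The ~4-characters-per-token estimate is a rough assumption for illustration; use a real tokenizer like tiktoken for exact counts:

```python
# The 4097-token context window must hold BOTH the prompt and the
# completion. Token counts below use a crude ~4-chars-per-token
# heuristic (an assumption; use tiktoken for exact numbers).

CONTEXT_WINDOW = 4097  # gpt-3.5-turbo's limit discussed in this thread

def estimate_tokens(text: str) -> int:
    """Very rough estimate, not the real tokenizer."""
    return max(1, len(text) // 4)

def fits(prompt: str, completion_budget: int) -> bool:
    """True if prompt tokens + requested completion fit in the window."""
    return estimate_tokens(prompt) + completion_budget <= CONTEXT_WINDOW

small_prompt = "Describe to me what Ruby on Rails is"
print(fits(small_prompt, 500))    # True: plenty of room
print(fits(small_prompt, 4090))   # False: the completion budget alone blows the window
```

Even a tiny prompt overflows the window if the completion side of the budget is too large.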

Hi @tavy88

AFAIK chunk_size only caps each individual chunk at 1000 tokens. Depending on your implementation, multiple chunks could be retrieved into a single prompt.
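Quick back-of-the-envelope check of how several 1000-token chunks add up. The top_k and overhead numbers here are illustrative assumptions, not values pulled from your config:

```python
# chunk_size=1000 caps each chunk, but the retriever may stuff several
# chunks into one prompt. The question/template overheads below are
# made-up illustrative numbers.

CHUNK_SIZE = 1000

def prompt_tokens(top_k: int, question_tokens: int = 20,
                  template_overhead: int = 100) -> int:
    """Rough total prompt size when top_k chunks are retrieved."""
    return top_k * CHUNK_SIZE + question_tokens + template_overhead

print(prompt_tokens(top_k=2))  # 2120: fits in a 4097-token window
print(prompt_tokens(top_k=5))  # 5120: already past 4097 before any output
```

That's how a "very small" question can still produce a 5000+ token request.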

This error could also occur if the conversation has previous messages as they will also consume the context length.
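If prior turns are the culprit, one common pattern is to trim the oldest messages until the history fits a budget. This is a minimal sketch, assuming OpenAI-style message dicts and the same rough ~4-chars-per-token estimate (not the real tokenizer):

```python
# Trim older conversation turns so the running chat history stays
# under a token budget, always keeping the system message.
# Token counts use a crude ~4-chars-per-token heuristic (an assumption).

def estimate_tokens(text: str) -> int:
    return max(1, len(text) // 4)

def trim_history(messages: list[dict], budget: int) -> list[dict]:
    """Keep the system message plus the most recent turns that fit."""
    system, rest = messages[0], messages[1:]
    kept, used = [], estimate_tokens(system["content"])
    for msg in reversed(rest):          # walk newest -> oldest
        cost = estimate_tokens(msg["content"])
        if used + cost > budget:
            break
        kept.append(msg)
        used += cost
    return [system] + list(reversed(kept))

history = [
    {"role": "system", "content": "You answer questions about the PDF."},
    {"role": "user", "content": "First question " * 50},
    {"role": "assistant", "content": "First answer " * 50},
    {"role": "user", "content": "Describe to me what Ruby on Rails is"},
]
trimmed = trim_history(history, budget=100)
# Only the system message and the newest question survive the 100-token budget.
```

Frameworks often have their own memory/truncation helpers for this; the sketch just shows the idea.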


Yeah, that was my suspicion as well. I guess gpt-3.5-turbo-16k would be better in this case?

If your prompt is actually small and a majority of the tokens are from the output generation, then yes.
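A tiny sketch of that decision, using the context limits discussed in this thread (verify them against the current model docs; the helper name and structure are my own):

```python
# Pick the smallest model whose context window fits the whole request.
# Limits are as discussed in this thread; check current docs before relying on them.

LIMITS = {
    "gpt-3.5-turbo": 4097,
    "gpt-3.5-turbo-16k": 16385,
}

def pick_model(prompt_tokens: int, completion_budget: int) -> str:
    need = prompt_tokens + completion_budget
    for model, limit in LIMITS.items():
        if need <= limit:
            return model
    raise ValueError("Even the 16k window is too small; shrink the request.")

# The 5355-token request from the error message overflows 4097 but fits 16k:
print(pick_model(355, 5000))   # gpt-3.5-turbo-16k
print(pick_model(100, 500))    # gpt-3.5-turbo
```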

The problem with having larger inputs for an LLM is that it has a tendency to hallucinate or wrongly use information from the middle of a huge prompt (I'm talking about something close to a 700-1000 token input prompt, from experience). If that is the case, it would be better to try and reduce the input prompt size.


To add to @udm17's comment, large prompts are fine if that prompt is full of contextual data, i.e., information to base the instruction upon. What it's not good for is hundreds and hundreds of instructions one after the other.

So, data is fine, but keep the requests per prompt to a few at most (3 or 4 tops); at 5 or more you will start to see significant degradation in performance.