“Error: This model’s maximum context length is 4097 tokens. However, your messages resulted in 5355 tokens. Please reduce the length of the messages.”
The prompt is very small. I’m using LlamaIndex with OpenAIEmbeddings and chunk_size=1000. The PDF gets indexed without an issue. The problem is that when I ask a simple question, “Describe to me what Ruby on Rails is”, I get this error.
This seems odd, since the prompt itself is tiny. Is it because the completion is too big?
The maximum context length is the sum of the inputs you send to the API (including any examples and the prompt)
as well as the output that GPT generates and returns. Most likely in your case, it’s the completion being too big that is causing the problem.
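The arithmetic behind the error can be sketched in a few lines. This is a minimal illustration using the numbers from the error message above, not actual API code; the point is that the prompt tokens and the completion budget share one context window.

```python
# Context-window budget, using the figures from the error message.
# The request fails whenever prompt tokens plus the completion budget
# exceed the model's maximum context length.

CONTEXT_LIMIT = 4097  # the limit quoted in the error message


def completion_budget(prompt_tokens: int, context_limit: int = CONTEXT_LIMIT) -> int:
    """Tokens left over for the model's output after the input is counted."""
    return context_limit - prompt_tokens


# With 5355 tokens of input (e.g. several retrieved 1000-token chunks plus
# the question), the budget is negative before generation even starts:
print(completion_budget(5355))  # -1258, hence the API error
```

In practice you would count the prompt tokens with a real tokenizer (such as the tiktoken library) rather than assume a figure.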
If your prompt is actually small and the majority of the tokens come from the output generation, then yes.
The problem with larger inputs to an LLM is that it has a tendency to hallucinate or wrongly use information from the middle of a huge prompt (I’m talking about something close to a 700 to 1000 token input prompt, from experience). If that is the case, it would be better to try to reduce the input prompt size.
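One hedged sketch of "reduce the input prompt size" in a retrieval setup: trim the retrieved chunks to a budget before they reach the model. Real code should count tokens with a proper tokenizer (e.g. tiktoken); here whitespace-separated words stand in as a rough proxy, and the function name is made up for illustration.

```python
# Keep retrieved chunks, in order, until a rough word budget is spent,
# so the assembled prompt stays well under the context limit.

def trim_context(chunks: list[str], max_words: int) -> list[str]:
    """Return the leading chunks whose combined word count fits the budget."""
    kept, used = [], 0
    for chunk in chunks:
        n = len(chunk.split())
        if used + n > max_words:
            break  # this chunk would blow the budget; stop here
        kept.append(chunk)
        used += n
    return kept


chunks = ["alpha beta gamma", "delta epsilon", "zeta eta theta iota"]
print(trim_context(chunks, 5))  # ['alpha beta gamma', 'delta epsilon']
```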
To add to @udm17’s comment, large prompts are fine if the prompt is full of contextual data, i.e., information to base the instruction on. What they’re not good for is hundreds and hundreds of instructions one after the other.
So, data is fine, but keep the requests per prompt to a few at most, say 3 or 4 tops; at 5 or more you will start to see significant degradation in performance.
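The advice above can be sketched as a simple batching step: instead of packing every instruction into one request, split the list into prompts of a few instructions each. The batch size of 4 follows the "3 or 4 tops" guidance; the function name is illustrative.

```python
# Split a long instruction list into batches small enough for one prompt
# each, per the guidance of keeping requests per prompt to 3 or 4.

def batch_instructions(instructions: list[str], per_prompt: int = 4) -> list[list[str]]:
    """Group instructions into consecutive batches of at most per_prompt items."""
    return [instructions[i:i + per_prompt]
            for i in range(0, len(instructions), per_prompt)]


tasks = [f"step {i}" for i in range(10)]
print([len(batch) for batch in batch_instructions(tasks)])  # [4, 4, 2]
```

Each batch would then be sent as its own request, alongside whatever contextual data it needs.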