Maximum content length exceeded despite prompt being very simple

Hey guys,
I have been getting this error: Error code: 400 - {'error': {'message': "This model's maximum context length is 16385 tokens. However, your messages resulted in 44366 tokens. Please reduce the length of the messages.", 'type': 'invalid_request_error', 'param': 'messages', 'code': 'context_length_exceeded'}}.

This happens even if the prompt is just "hello" or "answer in less than 30 words". What is also weird is that I have already set max_tokens=4000, yet the error says the maximum context length is 16385 tokens. For some context, the model is gpt-3.5-turbo, and I'm using LangChain with Pydantic 2.9.2, since the current version was causing me some errors. This is my first time using the OpenAI API, so I'm not sure what could be causing this; feel free to ask for more information.
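
For reference, this is roughly the kind of setup involved (a simplified sketch, not my exact code; only the model name and the max_tokens value are the real settings, and the import assumes the langchain_openai package):

```python
from langchain_openai import ChatOpenAI

# max_tokens caps only the completion that comes back;
# it does not limit how large the prompt sent to the API is.
llm = ChatOpenAI(
    model="gpt-3.5-turbo",  # 16,385-token context window
    max_tokens=4000,        # output limit only
)
```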

1 Like

Hi,

Welcome to the forum.

While gpt-4o-mini may return the same error, it may not, since it has a much larger context window. It will also cost less while testing and tends to produce results that fail less often overall.

Also, more than one message is usually sent per request: a System message and a User message, maybe more.

The simple ‘Hello’ prompt, does it have any instructions? A ‘System’ or ‘Developer’ message?

https://platform.openai.com/docs/guides/text-generation#developer-messages

I can't help specifically with LangChain or Pydantic, but hopefully this helps you find a bit more direction :slight_smile:
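
If you want to rule LangChain out entirely, you could also try a bare request first. A minimal sketch using the official openai Python client:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

resp = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "hello"}],
    max_tokens=30,
)

# A bare "hello" should report only a handful of prompt tokens here.
print(resp.usage.prompt_tokens, resp.choices[0].message.content)
```

If that shows a tiny prompt while the LangChain call still fails, the extra 44k tokens are being added somewhere on the LangChain side.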

1 Like

The hello prompt did not have any instructions. I have also tried "how many customers are in the customer table?", which also seemed fairly simple, and the result was the same; the role is not specified in the code. As for the other model, I could try that, but it would not solve the problem of the prompt taking up too many tokens for what it is, right?

2 Likes

Trying the other model would rule out the context error even if you are sending 44k tokens, because the input window of gpt-4o-mini is 128k. It would also fix your next problem :slight_smile:

2 Likes

Are you using some langchain RAG thing? It’s likely that langchain is inserting a bunch of stuff into your context.

Does langchain have a debug mechanism that lets you inspect the prompt before it gets sent to OpenAI?
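
If it helps, recent LangChain versions have global debug/verbose switches that print every prompt before it is sent; a quick sketch (the exact import path depends on your LangChain version):

```python
# Recent LangChain versions expose global switches in langchain.globals.
from langchain.globals import set_debug, set_verbose

set_debug(True)    # very chatty: dumps inputs/outputs of every chain step
set_verbose(True)  # lighter: prints prompts and completions

# Older versions used a module-level flag instead:
# import langchain
# langchain.debug = True
```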

There’s another bug where you can get a runaway generation, but I don’t think this is what’s happening here.

1 Like

I'm using LangChain with OpenAI to make a chatbot that answers certain questions from a local database. I'm not sure if this answers the question, but LangChain does have an option on one function that lets me see the prompt and the answer:

Entering new SQLDatabaseChain chain…

Given a question from the user:

  1. Create a valid SQL query.
  2. Check the results.
  3. Return the answer in Spanish.
  4. If you need additional clarifications, include them in the result.

Question: give me the names of the tables

SQLQuery:

The numbered list is the full prompt, "Question" is the user query, and SQLQuery should show the query GPT generates, but it comes back blank.

1 Like

Yeah, you may need to ask around in the LangChain communities for help if you can't debug it yourself: you need to figure out a way to print the exact prompt sent to the OpenAI API. My guess is that it's overpopulated. If it's not, we can take the prompt, try to reproduce the issue in the Playground, and go from there.

If you just need a quick fix to make this part more likely to work, you might consider picking a longer-context model like gpt-4o (unless you're actually using gpt-3.5-turbo-instruct rather than gpt-3.5-turbo).
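
Once you can see the rendered prompt, you can also count its tokens locally to confirm where the 44k is coming from. A quick sketch using tiktoken (the prompt string here is a placeholder for whatever the debug output shows):

```python
import tiktoken

enc = tiktoken.encoding_for_model("gpt-3.5-turbo")

prompt = "...the full prompt text captured from the debug output..."
print(len(enc.encode(prompt)), "prompt tokens")
```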

2 Likes

You should debug to see the actual request that is created.
With Pydantic / LangChain, you might have much more content in the object than you need, and the automatic 'dump' might put far more into the prompt than you expected.
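
If it is the SQLDatabaseChain that is inflating the request, one common culprit is the table info (schemas plus sample rows) injected for every table in the database. A rough sketch of how to restrict that, assuming the SQLDatabase and SQLDatabaseChain classes from langchain_community / langchain_experimental (the connection string and table name are just placeholders):

```python
from langchain_community.utilities import SQLDatabase
from langchain_experimental.sql import SQLDatabaseChain
from langchain_openai import ChatOpenAI

# Only expose the tables the chatbot actually needs, and skip sample rows,
# so the schema text that ends up in the prompt stays small.
db = SQLDatabase.from_uri(
    "sqlite:///local.db",          # placeholder connection string
    include_tables=["customer"],   # illustrative table name
    sample_rows_in_table_info=0,
)

llm = ChatOpenAI(model="gpt-3.5-turbo")
chain = SQLDatabaseChain.from_llm(llm, db, verbose=True)
```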

Note that max_tokens only sets a limit on the output, not the input.

2 Likes