Hi all,
I’m facing a persistent issue with OpenAI’s 8192-token context limit (using gpt-4o-mini), and I want to proactively handle the token-limit error before it happens, not just catch it after it throws. Here’s the specific error:
BadRequestError: 400 This model’s maximum context length is 8192 tokens, however you requested 8342 tokens (8342 in your prompt; 0 for the completion).
My Use Case
I’m building a custom AI QA feature using LangChain and MongoDB vector search. The call flow looks like this:
createQACompletion → _invokeAgent → _callDefaultAgent
The issue arises specifically in _callDefaultAgent, inside the similaritySearch() method:
const similaritySearch = await vectorStore.similaritySearch(queryObj.text, 5, {
preFilter: {
businessId: {"$eq": self.businessId},
type: {"$in": ["businessQuestions", "plan"]}
}
});
After similarity search, I join the results + some context:
const context = [
await self._getBusinessDetailContext(vectorStore, queryObj.text),
...similaritySearch.map(doc => doc.pageContent)
].filter(Boolean).join("\n\n");
Then I build the RunnableSequence chain and call chain.invoke(queryObj.text).
Problem
- Sometimes the similarity-search results plus the added context exceed the 8192-token limit.
- The error kills the flow right at the similaritySearch or chain.invoke() stage.
- I already tried counting the tokens of queryObj.text, but that’s insufficient: I don’t know the final token count until after the context is built, by which point it’s too late.
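To size the assembled prompt before any API call, a dependency-free estimate can work as a first pass. A minimal sketch, assuming OpenAI’s rough rule of thumb of ~4 characters per token for English text (exact counting would need a real tokenizer such as the js-tiktoken npm package; estimatePromptTokens and the 200-token template allowance are illustrative, not part of the code above):

```javascript
// Rough token estimator. Assumption: ~4 characters per token on average for
// English text (OpenAI's published rule of thumb). Exact counting would need
// a real tokenizer (e.g. the js-tiktoken npm package).
function estimateTokens(text) {
  return Math.ceil(text.length / 4);
}

// Estimate the full payload before the model is called: retrieved context,
// the user question, and a fixed allowance for the prompt template's own
// wording (200 tokens here is an illustrative guess, not a measured value).
function estimatePromptTokens(context, question, templateOverhead = 200) {
  return estimateTokens(context) + estimateTokens(question) + templateOverhead;
}
```

Because the estimate is approximate, comparing it against a threshold somewhat below 8192 (say 7500) leaves headroom for tokenizer variance.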
What I Want
I don’t want to “ignore” or “check for specific strings” in the error message. I want a solid, proactive way to:
- Calculate the total token count (context + prompt + question) before calling chain.invoke().
- Avoid the usual workarounds like truncating the context; I just want to detect and handle the condition before the request is sent.
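One shape this could take is a pre-flight guard that throws its own typed error before the request is ever sent, so callers catch something meaningful instead of string-matching a BadRequestError. A minimal sketch: the 8192 limit matches the error above, while the ~4-characters-per-token estimate is an assumption (a real tokenizer such as js-tiktoken would be more precise):

```javascript
// Pre-flight guard (sketch): fail fast with a descriptive, typed error
// *before* chain.invoke() ever reaches the API. The ~4 chars/token estimate
// is a rough heuristic, not an exact count.
const MODEL_CONTEXT_LIMIT = 8192;

class TokenLimitError extends Error {
  constructor(estimated, limit) {
    super(`Estimated prompt size ${estimated} tokens exceeds the ${limit}-token limit`);
    this.name = "TokenLimitError";
    this.estimated = estimated;
    this.limit = limit;
  }
}

function assertWithinTokenLimit(promptText, limit = MODEL_CONTEXT_LIMIT) {
  const estimated = Math.ceil(promptText.length / 4); // rough heuristic
  if (estimated > limit) {
    throw new TokenLimitError(estimated, limit);
  }
  return estimated;
}
```

In _callDefaultAgent this would run right after the context is joined, e.g. `assertWithinTokenLimit(context + "\n\n" + queryObj.text)` just before `chain.invoke(queryObj.text)`, with a `catch` for TokenLimitError upstream.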
What I’ve Tried
- Counting tokens of queryObj.text
- Logging token usage in handleLLMEnd()
- Using LangChain callbacks

But the error still occurs before those callbacks get hit, during similarity search.
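For reference, the callback approach looked roughly like this. It’s a sketch assuming the llmOutput.tokenUsage shape that OpenAI chat models report through LangChain JS (other providers may populate different fields), and it illustrates the core problem: handleLLMEnd only fires after a call has already succeeded, so it can’t prevent the 400:

```javascript
// Sketch of a token-usage logger passed via LangChain's `callbacks` option.
// Assumption: output.llmOutput.tokenUsage is the shape OpenAI models report
// (promptTokens / completionTokens / totalTokens). This runs only AFTER a
// successful call, which is why it can't catch the over-limit request.
const tokenLogger = {
  lastUsage: null,
  handleLLMEnd(output) {
    this.lastUsage = output.llmOutput?.tokenUsage ?? null;
    if (this.lastUsage) {
      console.log(
        `prompt=${this.lastUsage.promptTokens} ` +
        `completion=${this.lastUsage.completionTokens} ` +
        `total=${this.lastUsage.totalTokens}`
      );
    }
  },
};
```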
Relevant Code Summary
const vectorStore = new MongoDBAtlasVectorSearch(...);
const similaritySearch = await vectorStore.similaritySearch(...); // 🔥 Error likely starts here
const context = [
await self._getBusinessDetailContext(...),
...similaritySearch.map(doc => doc.pageContent)
].filter(Boolean).join("\n\n");
const chain = RunnableSequence.from([
{ context: () => context, question: new RunnablePassthrough() },
promptTemplate,
chatModel
]);
const answer = await chain.invoke(queryObj.text); // 🔥 Throws 8192+ token error
Would really appreciate guidance or code suggestions. Thank you!