How can the OpenAI model's max token length error be resolved?

I’m currently working on an AI project, but I’m relatively new to the field and need some guidance. My goal is to utilize AI models, specifically OpenAI models, to generate HTML templates based on some data I have.

I explored using Langchain and a vector database. I inserted my data into the vector database as separate documents. Then, to generate templates based on a specific query, I performed a similarity search in the vector database. The retrieved documents were used as context for an OpenAI model to generate templates accordingly. However, I encountered a max token error when passing lengthy code as context to the AI model.

The context passed to the AI model contains extensive code, and I’m unsure how to handle this error. Is there a way to increase the token length or overcome this limitation?


Hey odb23, welcome to the forum. This thread may be helpful for you: "Ways to automate breaking a large piece of input into chunks that fit in the 4096 token constraint?"


@odb23 Can you post the request and the exact error message so I can help you figure it out? Normally, the max_tokens parameter you pass to the API sets the upper limit on the number of tokens in the response, and the following must hold:

max_tokens(max tokens in response) + tokens in request <= the max context size of the model you use
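
To make the rule above concrete, here is a minimal sketch of the same check in JavaScript. The helper name `checkTokenBudget` and the numbers are hypothetical, chosen only for illustration; they are not part of any OpenAI or LangChain API.

```javascript
// Hypothetical helper (not a library function): applies the rule
//   max_tokens + tokens in request <= max context size of the model
function checkTokenBudget(promptTokens, maxTokens, contextSize) {
  const requested = promptTokens + maxTokens;
  return {
    requested,                                      // total tokens the request asks for
    fits: requested <= contextSize,                 // true when the request is within the limit
    overBy: Math.max(0, requested - contextSize),   // how many tokens must be cut, if any
  };
}

// Illustrative numbers only: a 4096-token context window and max_tokens = 256.
console.log(checkTokenBudget(3000, 256, 4096)); // within the limit
console.log(checkTokenBudget(4000, 256, 4096)); // over the limit
```

If `fits` is false, either the prompt has to shrink by at least `overBy` tokens, `max_tokens` has to be lowered, or a model with a larger context window is needed.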


@kjordan, thanks for your response. I am using the LangChain JS package and the Pinecone vector DB for this:

  // Prompt text; {templateQuery} is filled in from the user's query.
  const template =
    "You are an expert web developer, you use HTML code to create beautiful web pages. The templates provided to you are Tailwind CSS HTML templates. Using these templates as context, you should make it better by refactoring and removing errors. Write, as best as possible at least three HTML codes for {templateQuery}";
  const queryPrompt = new PromptTemplate({
    template,
    inputVariables: ["templateQuery"],
  });
  const formattedQuery = await queryPrompt.format({ templateQuery: query });

  // Retrieve the two most similar documents from the existing Pinecone index.
  const pineconeStore = await PineconeStore.fromExistingIndex(
    new OpenAIEmbeddings(),
    { pineconeIndex }
  );
  const docResults = await pineconeStore.similaritySearch(query, 2);
  console.log({ docResults });

  const llm = new OpenAI({
    openAIApiKey: OPENAI_KEY,
    temperature: 0,
  });

  // "stuff" puts every retrieved document into a single prompt.
  const chain = loadQAChain(llm, {
    type: "stuff",
  });
  const llmResult = await chain.call({
    input_documents: docResults,
    question: formattedQuery,
  });

  console.log({ llmResult });

This is the error I’m getting:

    data: {
      error: {
        message: "This model's maximum context length is 4097 tokens, however, you requested 5395 tokens (5139 in your prompt; 256 for the completion). Please reduce your prompt; or completion length.",
        type: 'invalid_request_error',
        param: null,
        code: null
      }
    }

@lachie1, Thanks, I will check the thread out.


As the error message says, you have too many tokens in the prompt. Either reduce your prompt or switch to a 16k model: gpt-3.5-turbo-16k
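
For the "reduce your prompt" route, one option is to drop retrieved documents that do not fit before calling the chain. The sketch below is only an illustration under stated assumptions: `estimateTokens` uses the rough rule of thumb of about 4 characters per token (code and markup can tokenize less efficiently, so a real tokenizer is needed for exact counts), `selectDocsWithinBudget` and the 200-token allowance for the question are hypothetical, and the documents are assumed to expose their text as `pageContent`.

```javascript
// Rough estimate (assumption): about 4 characters per token.
// This is a heuristic, not the model's real tokenizer.
function estimateTokens(text) {
  return Math.ceil(text.length / 4);
}

// Hypothetical helper: keep documents in ranked order until the
// estimated token budget would be exceeded, then stop.
function selectDocsWithinBudget(docs, budgetTokens) {
  const selected = [];
  let used = 0;
  for (const doc of docs) {
    const cost = estimateTokens(doc.pageContent);
    if (used + cost > budgetTokens) break;
    selected.push(doc);
    used += cost;
  }
  return { selected, used };
}

const contextSize = 4097;      // from the error message
const maxTokens = 256;         // completion allowance from the error message
const questionTokens = 200;    // hypothetical allowance for the question and prompt wrapper
const budget = contextSize - maxTokens - questionTokens;

// Two stand-in documents of 8000 characters each (about 2000 estimated tokens apiece).
const sampleDocs = [
  { pageContent: "a".repeat(8000) },
  { pageContent: "b".repeat(8000) },
];
const { selected, used } = selectDocsWithinBudget(sampleDocs, budget);
console.log(selected.length, used);
```

The `selected` array could then be passed as `input_documents` in place of the full search results. Because the estimate is approximate, leaving extra headroom in the budget is safer than aiming for the exact limit.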


@kjordan, using gpt-3.5-turbo-16k worked for me. Thanks.

@lachie1, Thanks for your response. I really appreciate it!
