Your input exceeds the context window of this model

Hi,
I want to extract details from a PDF into JSON format. First I read the PDF content as a Base64-encoded string:

// const data = fs.readFileSync("invoice-1.pdf");   // ~352KB
const data = fs.readFileSync("invoice-2.pdf");   // ~12MB
const s = data.toString("base64");

then I defined these instructions:

You are a data parser. Parse the following details from the document accurately and return it in JSON format.

- Document date

and used them in this Responses API call:

const response = await client.responses.create({
  model: "gpt-4.1-2025-04-14",
  instructions,
  input: [
    {
      role: "user",
      content: [
        {
          type: "input_file",
          filename: "statement.pdf",
          file_data: `data:application/pdf;base64,${s}`,
        },
      ],
    },
  ],
});

The program works fine with invoice-1.pdf but fails with invoice-2.pdf (~12MB):

BadRequestError: 400 Your input exceeds the context window of this model. Please adjust your input and try again.
    at APIError.generate (file:///home/user/temp/openai/node_modules/.pnpm/openai@5.9.0_zod@3.25.76/node_modules/openai/core/error.mjs:41:20)
    at OpenAI.makeStatusError (file:///home/user/temp/openai/node_modules/.pnpm/openai@5.9.0_zod@3.25.76/node_modules/openai/client.mjs:156:32)
    at OpenAI.makeRequest (file:///home/user/temp/openai/node_modules/.pnpm/openai@5.9.0_zod@3.25.76/node_modules/openai/client.mjs:301:30)
    at process.processTicksAndRejections (node:internal/process/task_queues:105:5)
    at async file:///home/user/temp/openai/server.js:54:18 {
  status: 400,
  headers: Headers {},
  requestID: 'req_7121698917b295d2e9b3e565fcd278be',
  error: {
    message: 'Your input exceeds the context window of this model. Please adjust your input and try again.',
    type: 'invalid_request_error',
    param: 'input',
    code: 'context_length_exceeded'
  },
  code: 'context_length_exceeded',
  param: 'input',
  type: 'invalid_request_error'
}

How can I fix this and avoid the limitation?

PDF limits: 100 pages max and 32MB of total content per request.

It sounds like the document text extraction went haywire on you, since gpt-4.1 can accept a million tokens of input.

As an alternative diagnostic step, upload the PDF to the Files API first, then reference it in the user message by file ID instead. That removes any concern about improper base64 encoding that we can't see.
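A minimal sketch of that approach, reusing the model and instructions from your post (the `purpose: "user_data"` value is the one the Files API expects for files that will be referenced in model input):

```javascript
import fs from "node:fs";
import OpenAI from "openai";

const client = new OpenAI();

const instructions =
  "You are a data parser. Parse the following details from the document " +
  "accurately and return it in JSON format.\n\n- Document date";

// Upload the PDF once through the Files API instead of inlining base64 data.
const file = await client.files.create({
  file: fs.createReadStream("invoice-2.pdf"),
  purpose: "user_data",
});

// Reference the stored file by ID in the user message.
const response = await client.responses.create({
  model: "gpt-4.1-2025-04-14",
  instructions,
  input: [
    {
      role: "user",
      content: [{ type: "input_file", file_id: file.id }],
    },
  ],
});

console.log(response.output_text);
```

If the request still fails with `context_length_exceeded` via a file ID, the problem is the extracted text volume itself, not your encoding.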

Beyond that, the PDF may be broken, password-protected, or contain different searchable text than what you see rendered. I would use a Python PDF library to extract the document text yourself, replicating what the API might do, and check whether you get useful text out of it.
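For example, with the pypdf library (one common choice; the filename is your file, and this only inspects the text layer, not any OCR the API may or may not run):

```python
from pypdf import PdfReader

reader = PdfReader("invoice-2.pdf")  # raises if the file is structurally broken
print("encrypted:", reader.is_encrypted)  # password-protected PDFs need decrypt()
print("pages:", len(reader.pages))

# Extract the searchable text layer page by page; if this comes back empty
# or garbled, the API's own extraction will likely struggle too.
for i, page in enumerate(reader.pages[:3]):
    text = page.extract_text() or ""
    print(f"--- page {i + 1} ({len(text)} chars) ---")
    print(text[:500])
```

A scanned 12MB invoice with no text layer would print zero characters here, while a corrupt text layer often shows up as a huge run of junk characters, which would explain the token blowup.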