GPT-4.1 extracts table details from PDF

Hi,
I’m trying to extract details from a PDF to JSON.

I defined the expected output format using Zod:

import { z } from "zod";

const TransactionSchema = z.object({
  date: z.string(),
  reference: z.string(),
  description: z.string(),
  debit: z.number().nullable(),
  credit: z.number().nullable(),
  balance: z.number(),
});

const StatementSchema = z.object({
  company_name: z.string(),
  registration_number: z.string(),
  customer_name: z.string(),
  account_code: z.string(),
  date: z.string(),
  transactions: z.array(TransactionSchema),
});

and my system prompt looks like this:

const instructions = "You are an OCR-powered document parser.";

then I send a request to GPT-4.1 using the Responses API:

import { readFile } from "node:fs/promises";
import OpenAI from "openai";
import { zodTextFormat } from "openai/helpers/zod";

const data = await readFile("statement.pdf", { encoding: "base64" });

const client = new OpenAI();

const response = await client.responses.create({
  model: "gpt-4.1-2025-04-14",
  instructions,
  input: [
    {
      role: "user",
      content: [
        {
          type: "input_file",
          filename: "statement.pdf",
          file_data: `data:application/pdf;base64,${data}`,
        },
      ],
    },
  ],
  temperature: 0,
  text: {
    format: zodTextFormat(StatementSchema, "Statement"),
  },
});

Note that the PDF contains a transaction table and some rows look like the following (with wrapping lines):

[Image: bank transaction statement rows. Two payment entries dated 28/3/2023 and 30/3/2023, with credits of 35,843.00 and 1,485.00 and resulting negative balances of (25,435.94) and (26,920.94) respectively.]

and the GPT-4.1 parsed the details incorrectly:

  • "OR-05020" instead of "OR-05020 GIRO"
  • "Payment For Account GIRO" instead of "Payment For Account"
  • debit amount is 0 instead of blank
{
  "transactions": [
    {
      "date": "28/3/2023",
      "reference": "OR-05020", // correct: "OR-05020 GIRO"
      "description": "Payment For Account GIRO", // correct: "Payment For Account"
      "debit": 0, // correct: null
      "credit": 35843.0,
      "balance": -25435.94
    },
    {
      "date": "30/3/2023",
      "reference": "OR-05015", // correct: "OR-05015 GIRO"
      "description": "Payment For Account GIRO", // correct: "Payment For Account"
      "debit": 0, // correct: null
      "credit": 1485.0,
      "balance": -26920.94
    }
  ]
}
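As a post-processing workaround for the "debit is 0 instead of blank" problem (it patches the output rather than fixing the model), one option is to coerce a zero back to null whenever the other amount column carries the value. This is a sketch with a made-up helper name, not anything from the OpenAI SDK:

```javascript
// Workaround sketch (hypothetical helper, not part of the OpenAI SDK):
// the model tends to emit 0 for a visually blank cell, so coerce a 0
// debit/credit back to null when the opposite column carries the amount.
function normalizeTransaction(tx) {
  const debit = tx.debit === 0 && tx.credit ? null : tx.debit;
  const credit = tx.credit === 0 && tx.debit ? null : tx.credit;
  return { ...tx, debit, credit };
}
```

It deliberately leaves a row alone when both columns are 0, since that could be a genuine zero-value entry.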

and also this transaction,

[Image: another transaction table row, dated 11/7/2023]

is also parsed incorrectly (the credit and balance amounts are swapped):

{
  "date": "11/7/2023",
  "reference": "OR-05596", // correct: "OR-05596 GIRO"
  "description": "Payment For SB217788, SB218071, SB218362 GIRO", // correct: "Payment For SB217788, SB218071, SB218362"
  "debit": 0,
  "credit": 6651.0, // correct: 7436.0
  "balance": 7436.0 // correct: 6651.0
},
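Swapped amounts like this can often be caught mechanically rather than by prompting alone. In the sample rows, each balance equals the previous balance plus the debit minus the credit, so a running-balance check over consecutive rows flags inconsistencies. This is a sketch under that sign-convention assumption (flip the signs if your statements use the opposite convention):

```javascript
// Sanity-check sketch, assuming balance[i] = balance[i-1] + debit - credit
// (the convention implied by the sample rows). Returns the indices of rows
// whose balance doesn't follow from the previous row. Note: a row whose
// credit and balance are swapped still satisfies its own check, but it
// breaks the NEXT row's check, so the error surfaces one row later.
function findInconsistentRows(transactions, tolerance = 0.01) {
  const bad = [];
  for (let i = 1; i < transactions.length; i++) {
    const prev = transactions[i - 1];
    const cur = transactions[i];
    const expected = prev.balance + (cur.debit ?? 0) - (cur.credit ?? 0);
    if (Math.abs(expected - cur.balance) > tolerance) bad.push(i);
  }
  return bad;
}
```

Any flagged index can then be re-extracted or reviewed manually instead of trusting the first parse.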

How can I tell GPT-4.1 to parse the values correctly?


If you know what is in there, you can add more guidance to the prompt:

Extract entities for a single dated line item:
  "date": is a single line, the date as DD/MM/YYYY,
  "reference": is two lines that shall be combined to one
  "description": is a single or multiple line description
  "debit": is a currency value with two decimal places (could be blank)
  "credit": is a currency value with two decimal places
  "balance": is a currency value with two decimal places

Then, remember that the input image will be resized so that its shorter dimension is at most 768 pixels. An A4 page scan thus becomes 1086 × 768 (billed as 6 tiles).

Ensure the text is still legible at that size.


Hi _j,

In this case, I'm submitting the PDF as a base64-encoded string (not an image). I fell back to GPT-4.1 after seeing that GPT-4o couldn't recognize the text in the header correctly. GPT-4.1 does parse the header correctly, but now it has problems interpreting the transactions…

The reference and description can be anything entered by the users. They can span one, two, or more lines.

If you are using the input_file parameter, the API provides that PDF as both per-page images and extracted text. In that case you have to accept the vision quality they provide, and you can't observe the resolution they use except through the not-so-transparent cost in "tiles", plus whatever text might be available programmatically.

Since that internal extraction by OpenAI is done with standard PDF-handling libraries, you could also use Python to see what kind of text the text-extraction pass produces. It is likely not tabular. Alternatively, render the pages to images yourself and send those alone as vision input, and see whether that performs better with your own code than the generic built-in handling.

It looks like the following instruction works for me (with gpt-4.1-2025-04-14):

You are an expert at document data extraction. You will be given a **statement** as a PDF and must convert it into the given structure.

# Instructions

## Transaction

Below is a sample of how transactions typically appear in the document:

| Date       | Reference           | Transaction Description        | Debit     | Credit   | Balance   |
| ---------- | ------------------- | ------------------------------ | --------- | -------- | --------- |
| 28/3/2023  | OR-05020 GIRO       | Payment For SB217788           |           | 7,436.00 | 6,651.00  |
| 25/4/2023  | SB217788            | Sales                          | 86,979.00 |          | 65,839.06 |
| 21/8/2023  | OR-05988 GIRO       | Partial payment For SB218535   |           | 4,134.00 | 10,800.00 |
| 9/1/2023   | OR-04625 GIRO       | Payment For SB233248           |           | 265.00   | 0.00      |
| 12/12/2023 | OR-06456 HLL 000007 | Payment For SB214480, SB211749 |           | 1749.20  | 0.00      |

You must always keep the entire content of the `Reference` column as-is - do not drop or separate codes like "GIRO", "HLL", or others. They are part of the reference string and must be preserved exactly.

This is tested with four PDF files. I had to add similar transaction rows to the sample table so that the model parses the correct values. I'm afraid we will have to keep adding sample transaction rows as we process a huge number of PDFs.
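If the sample table does need to grow, one way to keep it maintainable is to generate the few-shot markdown table from a small list of known-good rows instead of hand-editing the prompt. A sketch (the helper name and row data layout are made up; adapt to your own pipeline):

```javascript
// Sketch: build the few-shot markdown table from known-good rows, so
// adding an example is a one-line data change rather than a table edit.
// (Helper name is made up; row values are taken from the prompt above.)
const sampleRows = [
  ["28/3/2023", "OR-05020 GIRO", "Payment For SB217788", "", "7,436.00", "6,651.00"],
  ["25/4/2023", "SB217788", "Sales", "86,979.00", "", "65,839.06"],
];

function buildFewShotTable(rows) {
  const header = ["Date", "Reference", "Transaction Description", "Debit", "Credit", "Balance"];
  const lines = [
    `| ${header.join(" | ")} |`,
    `| ${header.map(() => "---").join(" | ")} |`,
    ...rows.map((r) => `| ${r.join(" | ")} |`),
  ];
  return lines.join("\n");
}
```

The generated table can then be interpolated into the instructions string, and new problem rows discovered in production become one-line additions to `sampleRows`.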
