Hi,
I’m trying to extract details from a PDF to JSON.
I defined the expected output JSON format using Zod:
import { z } from "zod";

const TransactionSchema = z.object({
  date: z.string(),
  reference: z.string(),
  description: z.string(),
  debit: z.number().nullable(),
  credit: z.number().nullable(),
  balance: z.number(),
});

const StatementSchema = z.object({
  company_name: z.string(),
  registration_number: z.string(),
  customer_name: z.string(),
  account_code: z.string(),
  date: z.string(),
  transactions: z.array(TransactionSchema),
});
My system prompt looks like this:
const instructions = "You are an OCR-powered document parser.";
Then I send the request to GPT-4.1 using the Responses API:
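import { readFile } from "node:fs/promises";
import OpenAI from "openai";
import { zodTextFormat } from "openai/helpers/zod";

// Read the PDF and encode it as base64 so it can be sent inline.
const data = await readFile("statement.pdf", { encoding: "base64" });
const client = new OpenAI();
const response = await client.responses.create({
  model: "gpt-4.1-2025-04-14",
  instructions,
  input: [
    {
      role: "user",
      content: [
        {
          type: "input_file",
          filename: "statement.pdf",
          file_data: `data:application/pdf;base64,${data}`,
        },
      ],
    },
  ],
  temperature: 0,
  // Constrain the output to the Zod schema defined above.
  text: {
    format: zodTextFormat(StatementSchema, "Statement"),
  },
});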
Note that the PDF contains a transaction table, and some rows have text that wraps onto a second line. GPT-4.1 parsed the details of those rows incorrectly:
- reference is "OR-05020" instead of "OR-05020 GIRO"
- description is "Payment For Account GIRO" instead of "Payment For Account"
- debit amount is 0 instead of blank
{
  transactions: [
    {
      "date": "28/3/2023",
      "reference": "OR-05020", // correct: "OR-05020 GIRO"
      "description": "Payment For Account GIRO", // correct: "Payment For Account"
      "debit": 0, // correct: null
      "credit": 35843.0,
      "balance": -25435.94
    },
    {
      "date": "30/3/2023",
      "reference": "OR-05015", // correct: "OR-05015 GIRO"
      "description": "Payment For Account GIRO", // correct: "Payment For Account"
      "debit": 0, // correct: null
      "credit": 1485.0,
      "balance": -26920.94
    },
  ]
}
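As a stopgap for the 0-instead-of-blank problem, I could clean the parsed rows after the API call. Below is a minimal sketch, assuming a statement never prints a literal 0 in one amount column when the other column holds the real amount. `Amounts` and `blankZeroAmounts` are hypothetical names of my own, not part of Zod or the OpenAI SDK.

```typescript
// Hypothetical helper type: only the two amount columns matter here.
interface Amounts {
  debit: number | null;
  credit: number | null;
}

// Assumption: a 0 on one side next to a non-zero amount on the other side
// is a blank cell that the model filled in, so it is turned back into null.
function blankZeroAmounts<T extends Amounts>(row: T): T {
  const debit =
    row.debit === 0 && (row.credit ?? 0) !== 0 ? null : row.debit;
  const credit =
    row.credit === 0 && (row.debit ?? 0) !== 0 ? null : row.credit;
  return { ...row, debit, credit };
}
```

This only hides the symptom. It does not fix the wrapped "GIRO" text landing in the wrong column.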
This transaction was also parsed incorrectly (the credit and balance amounts are swapped):
{
  "date": "11/7/2023",
  "reference": "OR-05596", // correct: "OR-05596 GIRO"
  "description": "Payment For SB217788, SB218071, SB218362 GIRO", // correct: "Payment For SB217788, SB218071, SB218362"
  "debit": 0,
  "credit": 6651.0, // correct: 7436.0
  "balance": 7436.0 // correct: 6651.0
},
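To at least detect this kind of error, the running balance can be checked after parsing. In the two March rows above, the balance equals the previous balance plus debit minus credit (-25435.94 - 1485.00 = -26920.94), so the sketch below assumes that rule holds for the whole statement. `Transaction` mirrors my Zod schema, and `findInconsistentRows` is a hypothetical helper name.

```typescript
// Mirrors TransactionSchema from the question.
interface Transaction {
  date: string;
  reference: string;
  description: string;
  debit: number | null;
  credit: number | null;
  balance: number;
}

// Returns the indexes of rows whose balance differs from
// (previous balance + debit - credit) by more than the tolerance.
// Assumption: rows are in statement order and follow that balance rule.
function findInconsistentRows(
  rows: Transaction[],
  tolerance = 0.005,
): number[] {
  const bad: number[] = [];
  for (let i = 1; i < rows.length; i++) {
    const expected =
      rows[i - 1].balance + (rows[i].debit ?? 0) - (rows[i].credit ?? 0);
    if (Math.abs(expected - rows[i].balance) > tolerance) {
      bad.push(i);
    }
  }
  return bad;
}
```

One caveat: a credit/balance swap inside a single row still satisfies the rule for that row, because previous balance = credit + balance is symmetric in the two values. The mismatch shows up on the following row instead, so a flagged index means "this row or the one before it needs a second look".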
How can I tell GPT-4.1 to parse the values correctly?