Error reading pdf file with api (files && completions)

dcriador · June 13, 2025, 7:34am

Hello,

I’m using OpenAI’s API (and ChatGPT web) to extract data from PDF invoices, requesting only a JSON output with specific keys, such as invoice number, CIF, total, dates, VAT, etc.

Problem:
When I upload a real PDF invoice and ask ChatGPT to extract the data as JSON, the response contains data that has absolutely nothing to do with the actual document. For example, it returns an invented invoice number, CIF, amounts, and supplier. This happens even when I upload the same file multiple times.

Technical details:

Model used: gpt-4.1
I set the prompt as follows:

text

CopiarEditar

You are an invoice data extractor. You will receive a PDF invoice as input. Return ONLY a JSON with these keys:
  - numero (the invoice number)
  - cifs (CIFs found, separated by commas)
  - nombre
  - proveedor_nombre
  - total
  - divisa
  - importe (equal to total)
  - ivas: array of objects { base, cuota, tipo }
  - fecha (YYYY-MM-DD)
  - fecha_vencimiento (YYYY-MM-DD or empty string)
  - irpf (withholding tax, if any, or 0/null)
Nothing else, just clean JSON.

Comment:
These values do not match the real data in the PDF at all (not the supplier, not the amounts, nothing). I have tested this with several invoices and always get unrelated/fake data, even when uploading the same file multiple times.

Questions:

Has anyone else experienced this issue?
Is this a known limitation or bug?
Could OpenAI staff look into this, or is there a workaround to get the real data from PDFs?

Thanks in advance for your help!

_j · June 13, 2025, 10:01am

The PDF file attachment feature on Chat Completions is simply unreliable and unusable, and this has continued without improvement.

attach via base64 - only the last file will be read
attach via file id - 50% of trials, nothing from the PDF is included

I would recommend that you apply your own PDF text extraction and image render technology, because OpenAI continues to supply failure that they will not address.

Topic		Replies	Views
Files API issue when sending pdf files to extract out info Bugs	2	193	June 4, 2025
Problema al subir PDFs escaneados a la API de OpenAI Bugs assistants-files	0	30	July 3, 2025
Assistant API system files should not be exposed to the user + PDF file parsing is intermittently buggy Feedback api	6	575	March 25, 2024
Unstable performance with PDF files in Responses API / Docs are also unclear on capabilities Bugs api , pdf	2	417	June 5, 2025
Assistant api retriever sometimes cannot read pdf API gpt-4 , api	5	2008	November 29, 2023

Error reading pdf file with api (files && completions)

Related topics