How to Process PDF Files with OpenAI's Tools and APIs for Invoice Automation?

Hello everyone,

We are exploring how to automate invoice processing using OpenAI’s tools and APIs. Our goal is to extract data, such as Line items, Quantity, Total Amount from PDF invoices.

However, we’ve noticed that the Chat Completions API and the OpenAI Chat Playground do not support direct PDF uploads. We are unsure of the best way to proceed and would like to hear suggestions from the community.

Some possible options we are aware of but haven’t fully explored include:

  1. Preprocessing the PDF using OCR tools to extract text and then passing it to the OpenAI API for analysis.
  2. Converting the PDF into text or structured formats (e.g., JSON) and integrating it into a pipeline with OpenAI’s API.

We’d love to know:

  1. How are others handling PDF files in their workflows with OpenAI APIs?
  2. Are there any best practices or tools you recommend for extracting and processing data from invoices?
  3. Does OpenAI have plans to support direct file processing in its APIs, or is there a workaround we might be missing?

We’re open to ideas and would appreciate any insights, workflows, or examples you can share.

Thank you!

1 Like

From experience you need to have a much more steerable solution. But, I imagine this won’t ring until you try it for yourself.

How are others handling PDF files in their workflows with OpenAI APIs?

Extract text first, then convert to image and process WITH the text as additional data

Are there any best practices or tools you recommend for extracting and processing data from invoices?

  • Don’t trust line items. Use programming to validate the numbers
  • Classify the invoice first by orientation, coloring, quality, & company, then have different models & instructions per classification.
  • Process the invoice with above information so that the text is very noticeable and easy to pick up. You can pull tricks like segmenting the invoice.
  • You will need a HITL (Human-In-The-Loop) part, simply as an “approval” checkpoint. Edge cases are inherent with AI. There is no such thing as 100% accuracy when it comes to handling noisy data.

Does OpenAI have plans to support direct file processing in its APIs, or is there a workaround we might be missing?

Probably. Other leading proprietary LLMs offer this solution in their API. But, no, PDFs are not supported and there haven’t been any definite answer besides that eventually they’d like to.


One thing to keep in mind that I recently had to consider: Over time people will take notice to companies automating invoice processing. I have no doubt that fraudulent invoices will be more common. For this reason: it’s absolutely necessary to have a HITL and a trace of where the invoice came from.

Simply put: You need a system, not a model.

4 Likes