OpenAI API to read commercial invoices

I converted a scanned PDF to images so that gpt-4o can read it. I want it to read the products from a table inside the picture, but I keep getting weird results, no matter what instructions I give. I don't want to extract the text first, since the point is that the model can read images. Any tips on fixing this?


Hi @marieclaire.degroot and welcome to the community!

So PDF files by default are treated as text-only - there is a PDF parser employed under the hood, which is OK for paragraphs, but for tables the structure looks wonky and a lot of the context for the various columns and rows is lost.

My recommendation in this case would be to convert the PDF pages to images, using a library such as this, then encode the pages/images as base64 and send them to the Vision API as detailed here.
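Since you are working in C#, here is a rough sketch of the encoding step. The PDF-to-PNG rendering itself depends on whichever library you pick, so only the base64/data-URL part is shown; the file name is just a placeholder:

```csharp
// Minimal sketch: encode an already-rendered page image as a base64 data URL
// that the Vision API accepts in an image_url content part.
using System;
using System.IO;

static string ImageToDataUrl(string imagePath)
{
    byte[] pngBytes = File.ReadAllBytes(imagePath);   // one rendered PDF page as PNG
    string base64 = Convert.ToBase64String(pngBytes); // base64-encode the PNG bytes
    return $"data:image/png;base64,{base64}";         // data URL for the image_url field
}
```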

When sending to the Vision API, you would include a system prompt that specifies how to extract and represent the data in tables. I would recommend specifying either Markdown output or some other clean structured format (like YAML).

For example, if you would like tables to be represented as YAML, you would have something like this in the prompt:

**Table Formatting Instructions**
Format tables in YAML as per the following structure:
* Represent tables as an inline YAML code block with root node `table:`
* Include `description`, `column_names`, `row_names` and `data`
* Format each row as `row_name: {col1: value, col2: value, ...}`
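Putting it together, the request could look roughly like this in C# with plain HttpClient, reusing the `ImageToDataUrl` helper from the snippet above. The model name, prompt wording and file name are placeholders; it assumes your API key is in the `OPENAI_API_KEY` environment variable:

```csharp
// Sketch of a Chat Completions request that pairs the table-formatting
// system prompt with one page image.
using System;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text;
using System.Text.Json;

var payload = new
{
    model = "gpt-4o",
    messages = new object[]
    {
        new { role = "system", content = "You read commercial invoices. Format tables in YAML as per the structure: ..." },
        new
        {
            role = "user",
            content = new object[]
            {
                new { type = "text", text = "Extract the product table from this invoice page." },
                new { type = "image_url", image_url = new { url = ImageToDataUrl("page1.png"), detail = "high" } }
            }
        }
    }
};

using var http = new HttpClient();
http.DefaultRequestHeaders.Authorization =
    new AuthenticationHeaderValue("Bearer", Environment.GetEnvironmentVariable("OPENAI_API_KEY"));

var response = await http.PostAsync(
    "https://api.openai.com/v1/chat/completions",
    new StringContent(JsonSerializer.Serialize(payload), Encoding.UTF8, "application/json"));

Console.WriteLine(await response.Content.ReadAsStringAsync());
```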

From here you have a couple of options. You can also include in your prompt exactly what you want to extract, so the model will (once it has "in its mind's eye" done this table representation) do the extraction. Alternatively, if this doesn't perform well, you can take the YAML table representation as output and make a second call to the Chat Completions API to extract the information, supplying the YAML in your user prompt. In the latter case, it may even be possible to use a smaller (mini) model.
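A rough sketch of that second, text-only call (model name and prompt wording are placeholders; `yamlTable` is whatever the first call returned):

```csharp
// Optional second pass: hand the YAML table back as plain text and let a
// smaller model pull out just the fields you need.
string yamlTable = "..."; // YAML produced by the Vision call above

var extractionPayload = new
{
    model = "gpt-4o-mini",
    messages = new object[]
    {
        new { role = "system", content = "You extract invoice line items from YAML tables provided by the user." },
        new { role = "user", content = $"List every product name, quantity and unit price from this table:\n{yamlTable}" }
    }
};
// POST extractionPayload to https://api.openai.com/v1/chat/completions exactly as in the previous snippet.
```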

Hope this helps!

I am making this in C#.

I already converted the PDF to images using Tesseract. If I use this image in ChatGPT, I get a completely correct result, with the products extracted from the table. However, when I use the API or try the assistant in the Playground, the result doesn't come anywhere near right: it makes up the prices and doesn't read all the products.