Accuracy issue in extracting financial data from table which is in pdf or image

amandeepbareth · September 29, 2024, 6:14am

I’m using gpt-4o-mini to extract financial data from tables in PDFs or images, but I’m having accuracy issues. For example, with the same PDF and the same prompt, the parsed data is sometimes correct and sometimes wrong. How can I solve this, and what would be a better approach? I simply want JSON data after parsing a table from a PDF or image.

anon10827405 · September 29, 2024, 6:13pm

If you’re using ChatGPT, you really don’t have many options.

You would need to move towards a more in-house solution, using a pipeline of OCR and potentially other AI models (like gpt-4o-mini) in the API.

But you will always run into small character-level errors when performing OCR (in any benchmark, try long strings of digits like 688987899). It may be worth segmenting the document first, running each segment in parallel, and then synthesizing the results. Being able to scale the segments helps a lot with tricky numbers.
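A minimal sketch of the segment-then-merge idea, assuming the document has already been cut into segments (for example, cropped images as bytes). The `extract_segment` callable is hypothetical: it stands in for whatever OCR or model call you use and is assumed to return a dict of row label to value.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Sequence


def extract_in_parallel(
    segments: Sequence[bytes],
    extract_segment: Callable[[bytes], Dict[str, str]],  # hypothetical extractor
    max_workers: int = 4,
) -> List[Dict[str, str]]:
    """Run the per-segment extractor concurrently.

    pool.map preserves input order, so results line up with segments.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(extract_segment, segments))


def merge_rows(results: List[Dict[str, str]]) -> Dict[str, str]:
    """Combine per-segment results into one table keyed by row label.

    Raises if two segments disagree on the same row, so conflicts
    are surfaced instead of silently overwritten.
    """
    merged: Dict[str, str] = {}
    for part in results:
        for label, value in part.items():
            if label in merged and merged[label] != value:
                raise ValueError(f"conflicting values for row {label!r}")
            merged[label] = value
    return merged
```

Threads are used here on the assumption that each extraction is a network-bound API call; a CPU-bound local OCR step might be better served by a process pool.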

You should also consider the fact that you have a PDF. If you can highlight and copy the text, you can sometimes extract the text from the file itself and use it as a comparison against the OCR or model output.
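One way to do that comparison, sketched under the assumption that you already have both texts as plain strings (how you pull the text layer out of the PDF is up to your tooling). The function names are illustrative only. It flags any number in the model output that does not appear in the PDF's own text.

```python
import re

# Digits with optional thousands separators and an optional decimal part.
NUMBER = re.compile(r"\d[\d,]*(?:\.\d+)?")


def numeric_tokens(text):
    """Return the numbers found in text, with commas stripped."""
    return [m.group().replace(",", "") for m in NUMBER.finditer(text)]


def unconfirmed_numbers(model_text, pdf_text):
    """List numbers from the model output that the PDF text layer lacks."""
    trusted = set(numeric_tokens(pdf_text))
    return [n for n in numeric_tokens(model_text) if n not in trusted]
```

Anything this returns is a candidate misread worth re-checking, for instance by re-running that segment at a larger scale.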

The vision models in the API do not accept PDFs directly, so you have to convert the pages to images before sending them over, which destroys those benefits (such as the selectable text layer).

Lastly, you can define consistency rules for your financial documents. For example:

sum(ROW A, ROW B) == ROW G

You can then run these rules against the extracted data to validate it.
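A rule like that could be checked roughly as follows. This is only a sketch: the row labels A, B and G come from the example above, the function name is made up, and the extracted values are assumed to have been parsed into `Decimal` already (which avoids floating-point rounding on currency amounts).

```python
from decimal import Decimal


def check_sum_rule(rows, parts, total, tolerance=Decimal("0")):
    """Return True if the rows named in `parts` add up to the `total` row.

    rows: dict mapping row label -> Decimal value
    tolerance: allowed absolute difference, e.g. for rounded statements
    """
    expected = sum((rows[p] for p in parts), Decimal("0"))
    return abs(expected - rows[total]) <= tolerance


extracted = {
    "A": Decimal("1200.50"),
    "B": Decimal("300.25"),
    "G": Decimal("1500.75"),
}
rule_ok = check_sum_rule(extracted, ["A", "B"], "G")
```

If a rule fails, you know at least one of the rows involved was misread and can re-run just that part of the document.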
