I'm using gpt-4o-mini to extract financial data from tables in PDFs or images, but I'm having accuracy issues.
For example, with the same PDF and the same prompt, it sometimes returns correct data after parsing and sometimes wrong data. How can I solve this, and what would be a better approach? I simply want JSON data after parsing a table from a PDF or image.
If you're using ChatGPT, you really don't have many options.
You would need to move towards a more in-house solution, using a pipeline of OCR and potentially other AI models (like gpt-4o-mini) through the API.
But you will always run into small character-level errors when performing OCR (in any benchmark, try long strings of digits like 688987899). It may be worth segmenting the document first, running each segment in parallel, and then synthesizing the results. Being able to scale up the segments helps a lot with tricky numbers.
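The segment-then-merge idea can be sketched roughly as follows. This is only a minimal outline under assumptions: `band_boxes` and `extract_in_parallel` are hypothetical helper names, and `extract_segment` stands in for whatever you use to crop, upscale, and read one region (OCR or a gpt-4o-mini call). None of these come from a specific library.

```python
from concurrent.futures import ThreadPoolExecutor


def band_boxes(width, height, band_height, overlap=0):
    """Split a page into horizontal bands as (left, top, right, bottom) boxes."""
    if band_height <= overlap:
        raise ValueError("band_height must be larger than overlap")
    boxes = []
    top = 0
    while top < height:
        bottom = min(top + band_height, height)
        boxes.append((0, top, width, bottom))
        if bottom == height:
            break
        # Step back by `overlap` so a row cut by one band is whole in the next.
        top = bottom - overlap
    return boxes


def extract_in_parallel(boxes, extract_segment, max_workers=4):
    """Run the (hypothetical) per-segment extractor on every box concurrently.

    `extract_segment(box)` is assumed to return a list of rows for that box.
    Results are flattened in page order; overlapping bands may yield duplicate
    rows, which this sketch does not remove.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        per_segment = list(pool.map(extract_segment, boxes))
    return [row for rows in per_segment for row in rows]
```

For a 100 x 250 pixel page, `band_boxes(100, 250, 100, overlap=20)` yields three bands starting at y = 0, 80, and 160. Threads are a reasonable fit here because the per-segment work is mostly waiting on an API or OCR call.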
You should also consider the fact that you have a PDF. If you can highlight the text and copy it, you can sometimes extract the text from the file itself and use it as a comparison.
Vision models in the API do not accept PDFs, so the file has to be converted to images first, which destroys those benefits (such as the copyable text layer) before it is sent over.
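One way to use the PDF's own text as a comparison is to check that every number the model returned also appears in the text layer. The sketch below assumes you have already pulled the text layer into a string with a PDF library of your choice (not shown), and that numbers use commas as thousands separators and a dot as the decimal mark. `unconfirmed_values` is a hypothetical helper name.

```python
import re

# Matches tokens such as 688987899, 688,987,899 or 1,200.50
# (assumes comma thousands separators and a dot decimal mark).
NUMBER = re.compile(r"\d[\d,]*(?:\.\d+)?")


def normalize(token):
    """Drop thousands separators so '1,200.50' and '1200.50' compare equal."""
    return token.replace(",", "")


def unconfirmed_values(extracted_values, text_layer):
    """Return the extracted numbers that never occur in the PDF text layer."""
    known = {normalize(token) for token in NUMBER.findall(text_layer)}
    return [value for value in extracted_values if normalize(value) not in known]


text_layer = "Revenue 688,987,899 Costs 1,200.50"
model_output = ["688987899", "1200.50", "688981899"]
print(unconfirmed_values(model_output, text_layer))  # ['688981899']
```

Anything in the returned list is a candidate misread and can be re-extracted or flagged for review. This only works when the PDF actually has a text layer; scanned documents will not.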
Lastly, you can define rules for your financial documents. For example:
- sum(ROW A,B) == ROW G
Then you can set up these rules to validate your data.
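A rule like the one above could be checked along these lines. This is a sketch, not a definitive implementation: the row names `A`, `B`, `G` follow the example, while `RULES`, `failed_rules`, and the `tolerance` parameter are hypothetical. Values are assumed to be parsed into `Decimal` already, to avoid float rounding on currency amounts.

```python
from decimal import Decimal

# Each rule: (rows to add up, row that should hold the total).
RULES = [(("A", "B"), "G")]


def failed_rules(rows, rules, tolerance=Decimal("0")):
    """Return the rules that the extracted rows do not satisfy."""
    failures = []
    for addends, total in rules:
        # A missing row counts as a failure rather than raising KeyError.
        if total not in rows or any(name not in rows for name in addends):
            failures.append((addends, total, "missing row"))
            continue
        computed = sum((rows[name] for name in addends), Decimal("0"))
        if abs(computed - rows[total]) > tolerance:
            failures.append((addends, total, f"{computed} != {rows[total]}"))
    return failures


rows = {"A": Decimal("1200.50"), "B": Decimal("799.50"), "G": Decimal("2000.00")}
print(failed_rules(rows, RULES))  # []
```

If a rule fails, you know which part of the table to re-run or send for manual review instead of trusting the whole extraction.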