Document Details Extraction

dmisi98 · May 27, 2024, 4:49pm

Hi,
I’m looking for the best available solution for IMG/PDF invoice/receipts data extraction.
So far the I’ve tried these methodes:

PDF > extract all text > OpenAI LLM, response format JSON
IMG > Pass Img URL > OpenAI LLM, response format JSON
Google’s Document AI > Very teribble results but can be trained.
Google Studio Gemini > Similar result as OpenAI but much cheaper

It should handle different invoice formats, so can’t predefine the format/fields.
The goal to store all Invoices, its items to db so our own model can further analyse the spendings.

I’m new and probably don’t know all possible solutions so any suggestion more than welcome

zafarr · October 13, 2024, 3:59pm

Azure document intelligence (1 cent per page) seems to be a great option. I haven’t tried it myself yet, but saw a video in which it was being used for extracting info from complex pdfs.

Topic		Replies	Views
I wanted to extract information from invoice using GPT-4o, which can be image or PDF API gpt4o	4	1146	September 18, 2024
How to Process PDF Files with OpenAI's Tools and APIs for Invoice Automation? API api , gpt-4-vision , ocr	1	1002	January 15, 2025
How to Extract Data from Images Using OpenAI API? API gpt-4	1	2322	October 18, 2024
Finance Agent -> reading Pdf scans (failing) API	0	67	March 12, 2025
Best approach for extracting data from diverse invoice PDFs using OpenAI - Seeking guidance on model selection and training strategy API	6	2135	November 4, 2024

Document Details Extraction

Related topics