Document Details Extraction

Hi,
I’m looking for the best available solution for IMG/PDF invoice/receipts data extraction.
So far the I’ve tried these methodes:

  • PDF > extract all text > OpenAI LLM, response format JSON

  • IMG > Pass Img URL > OpenAI LLM, response format JSON

  • Google’s Document AI > Very teribble results but can be trained.

  • Google Studio Gemini > Similar result as OpenAI but much cheaper

It should handle different invoice formats, so can’t predefine the format/fields.
The goal to store all Invoices, its items to db so our own model can further analyse the spendings.

I’m new and probably don’t know all possible solutions so any suggestion more than welcome :slight_smile:

1 Like

Azure document intelligence (1 cent per page) seems to be a great option. I haven’t tried it myself yet, but saw a video in which it was being used for extracting info from complex pdfs.