From experience you need to have a much more steerable solution. But, I imagine this won’t ring until you try it for yourself.
How are others handling PDF files in their workflows with OpenAI APIs?
Extract text first, then convert to image and process WITH the text as additional data
Are there any best practices or tools you recommend for extracting and processing data from invoices?
- Don’t trust line items. Use programming to validate the numbers
- Classify the invoice first by orientation, coloring, quality, & company, then have different models & instructions per classification.
- Process the invoice with above information so that the text is very noticeable and easy to pick up. You can pull tricks like segmenting the invoice.
- You will need a HITL (Human-In-The-Loop) part, simply as an “approval” checkpoint. Edge cases are inherent with AI. There is no such thing as 100% accuracy when it comes to handling noisy data.
Does OpenAI have plans to support direct file processing in its APIs, or is there a workaround we might be missing?
Probably. Other leading proprietary LLMs offer this solution in their API. But, no, PDFs are not supported and there haven’t been any definite answer besides that eventually they’d like to.
One thing to keep in mind that I recently had to consider: Over time people will take notice to companies automating invoice processing. I have no doubt that fraudulent invoices will be more common. For this reason: it’s absolutely necessary to have a HITL and a trace of where the invoice came from.
Simply put: You need a system, not a model.