Finance agent: reading scanned PDF bank statements (failing)

Hi all,

I am developing an AI agent for a finance company. The agent's job is to process bank statements.

The use case is as follows:

The client (the finance company) receives bank statements from its own clients. These statements do not follow a single format or template, since each client has their own. What they have in common is that they are all scanned PDFs. My client has an entire office branch that converts these PDFs to Excel files and then exports them as CSV for use in another application. I am trying to cut costs at this branch by having an agent do part of that work.
I have tried different implementations, including some pretrained models from Hugging Face. The problem is that most of the templates are very poorly laid out, and combined with the fact that the documents are scanned, this makes it really hard for models to detect the transactions on the statements (purchases, transfers, etc.). I always need to extract three columns: Date, Description, Amount.
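To make the target output concrete, here is a minimal sketch of the row format I am after and the kind of check I would run on whatever the model returns. The function names, the DD/MM/YYYY date format, the comma as thousands separator, and the sample values are all assumptions for illustration; real statements vary.

```python
import csv
import io
from datetime import datetime
from decimal import Decimal, InvalidOperation

FIELDS = ["Date", "Description", "Amount"]


def normalize_row(raw, date_format="%d/%m/%Y"):
    """Return a cleaned row dict, or None if the row fails validation.

    Assumes (hypothetically) DD/MM/YYYY dates and "," as thousands separator.
    """
    try:
        date = datetime.strptime(raw["Date"].strip(), date_format).date()
        amount = Decimal(raw["Amount"].replace(",", "").strip())
        # Collapse runs of whitespace that OCR tends to leave in descriptions.
        description = " ".join(raw["Description"].split())
    except (KeyError, AttributeError, TypeError, ValueError, InvalidOperation):
        return None
    if not description:
        return None
    return {
        "Date": date.isoformat(),
        "Description": description,
        "Amount": str(amount),
    }


def rows_to_csv(raw_rows):
    """Write valid rows to CSV text; return (csv_text, rejected_raw_rows)."""
    valid, rejected = [], []
    for raw in raw_rows:
        row = normalize_row(raw)
        if row is None:
            rejected.append(raw)
        else:
            valid.append(row)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(valid)
    return buffer.getvalue(), rejected
```

Rows that fail to parse are returned separately instead of being written, so a person could review only those rather than the whole statement.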

The only partial success I have had is with the OpenAI API using GPT-4o, but even this is not consistent and often mixes up rows, which makes the entire extraction invalid. Besides that, the privacy of my client's clients is also a concern when using the OpenAI API.

To sum up: I have tried different vision models, and I have tried extracting the text first and then mapping it to transactions, and nothing has worked reliably. The OpenAI approach does not succeed more than twice in a row at most (often the same test case fails on the second try). I don't need a 100% success rate on either the test cases or the extracted results, but a solid 70%+ would be good enough to run integration testing for a month or so.
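For reference, this is roughly how I would score the 70%+ target. It is only a sketch under my own assumptions: a test case counts as a pass only if every extracted row matches the hand-made expected rows in the same order (so a single mixed-up row fails the whole case), and `case_passes` / `success_rate` are hypothetical helper names.

```python
def case_passes(extracted, expected):
    """True only if all rows match exactly and in order."""
    return [tuple(row) for row in extracted] == [tuple(row) for row in expected]


def success_rate(cases):
    """cases: list of (extracted_rows, expected_rows) pairs, one per statement."""
    if not cases:
        return 0.0
    passed = sum(1 for extracted, expected in cases if case_passes(extracted, expected))
    return passed / len(cases)


def meets_target(cases, target=0.70):
    """Check the pass rate against the (assumed) 70% threshold."""
    return success_rate(cases) >= target
```

Running the same statement through the pipeline several times and adding each run as its own case would also show the run-to-run inconsistency I described.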

If anyone can help me, I'd greatly appreciate it.

What can I use to get around this?