GPT-4 Vision model misreading text from an invoice (OCR task)

Hi, I’m using the GPT-4 Vision Preview model for text extraction (OCR) from images. About 90% of it works, but some of the text is not recognised and extracted correctly.

For example, when an invoice contains a zip code like WN1236D, the model reads it as WN1326D.

It also cannot distinguish between “S” and “5” in the invoice.

Is there any way to fine-tune the GPT-4 Vision model?

Any suggestions would be appreciated. Thanks in advance!

No. Your only option is to manipulate the image itself so that it’s easier to read.
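As a rough illustration of what manipulating the image can mean, here is a dependency-free Python sketch of two common OCR preprocessing steps: upscaling (so small glyphs like “S” and “5” span more pixels) and binarizing (so faint background shading is dropped). It assumes the image is already grayscale and stored as a list of pixel rows with values 0 to 255; the function names are hypothetical, and whether these steps actually improve GPT-4 Vision’s accuracy on a given invoice is something to test, not a guarantee. On real files an imaging library such as Pillow or OpenCV would do the same work.

```python
from typing import List, Optional

Pixels = List[List[int]]  # grayscale rows, 0 = black, 255 = white (assumed layout)


def upscale(pixels: Pixels, factor: int) -> Pixels:
    """Nearest-neighbour upscale: each pixel becomes a factor x factor block."""
    if factor < 1:
        raise ValueError("factor must be >= 1")
    out: Pixels = []
    for row in pixels:
        wide = [p for p in row for _ in range(factor)]
        # Append independent copies of the widened row, one per output row.
        out.extend(list(wide) for _ in range(factor))
    return out


def binarize(pixels: Pixels, threshold: Optional[float] = None) -> Pixels:
    """Map every pixel to pure black (0) or pure white (255).

    Without an explicit threshold, the mean intensity of the image is used,
    which tends to push light background to white and dark strokes to black.
    """
    if threshold is None:
        flat = [p for row in pixels for p in row]
        threshold = sum(flat) / len(flat)
    return [[0 if p < threshold else 255 for p in row] for row in pixels]


def preprocess(pixels: Pixels, factor: int = 2,
               threshold: Optional[float] = None) -> Pixels:
    """Upscale first, then binarize the enlarged image."""
    return binarize(upscale(pixels, factor), threshold)
```

For instance, `preprocess([[30, 200], [220, 90]])` doubles the 2x2 image to 4x4 and thresholds it at the mean intensity (135), giving solid black and white blocks.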

There are multiple Gen AI OCR models that offer fine-tuning. Google Document AI is the first that springs to mind.

These models also typically include bounding boxes plus confidence scores. So if an “S” gets mistaken for a “5”, you can programmatically send a new request containing JUST that bounding-box region, preferably zoomed in or otherwise clarified.
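The re-query idea can be sketched in Python as follows. It assumes each OCR token comes back with a confidence score and a bounding box in normalized 0 to 1 coordinates; the `Token` class, the 0.9 threshold, and the padding value are hypothetical choices, not any particular service’s response format. The sketch only plans which regions to crop; cropping, enlarging, and re-sending them depends on the imaging library and OCR service you use.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


@dataclass
class Token:
    """Hypothetical OCR token: recognised text, confidence, normalized bbox."""
    text: str
    confidence: float                        # assumed to lie in [0, 1]
    bbox: Tuple[float, float, float, float]  # (x0, y0, x1, y1), each in [0, 1]


def crop_box(token: Token, width: int, height: int, pad: float = 0.25) -> Box:
    """Convert a normalized bbox into a padded pixel box clamped to the image.

    Padding is a fraction of the box's own size, so the crop keeps a little
    context around the token. floor/ceil make the box fully contain the region.
    """
    x0, y0, x1, y1 = token.bbox
    px = (x1 - x0) * pad
    py = (y1 - y0) * pad
    left = max(0, math.floor((x0 - px) * width))
    top = max(0, math.floor((y0 - py) * height))
    right = min(width, math.ceil((x1 + px) * width))
    bottom = min(height, math.ceil((y1 + py) * height))
    return left, top, right, bottom


def plan_requery(tokens: List[Token], width: int, height: int,
                 threshold: float = 0.9) -> List[Tuple[str, Box]]:
    """Return (first-pass text, crop box) for every low-confidence token."""
    return [(t.text, crop_box(t, width, height))
            for t in tokens if t.confidence < threshold]
```

For an 800x400 invoice image where the zip code came back as `Token("WN1326D", 0.62, (0.25, 0.5, 0.5, 0.75))`, `plan_requery` returns `[("WN1326D", (150, 175, 450, 325))]`; that region would then be cropped, enlarged, and sent on its own for a second reading.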

Hey @krroopeshbharatwaj1, any update on this?
Were you able to find any method to mitigate this issue?
I’m also working on an extraction problem: when extracting IDs, GPT-4V seems to jumble a few characters and even misses some repeated characters.

Let me know if you’ve found a solution.

Hi @pathikghugare

Sorry for the late response. I could not improve it; later I switched to Gemini 1.5 Pro, which was surprisingly better than the Vision model.