GPT 4 Vision Model misrepresentation of text from an Invoice (OCR Task)

krroopeshbharatwaj1 · May 2, 2024, 5:38pm

Hi, I’m using the GPT 4 Vision-Preview model for the Text extraction (OCR) from the image, 90% of it works but Some of the text could not be recognised and extracted.

Say for example in an invoice there is a Zip code like WN1236D, it recognises WN1326D.

Also, it could not differentiate between S and 5 in the invoice.

Is there any way to fine-tune the GPT 4 Vision Model?

Any Suggestions, Kindly let me know. Thanks in Advance

RonaldGRuckus · May 2, 2024, 5:40pm

No. Your only option is trying to manipulate the image itself so it’s easier to read.

There are multiple Gen AI OCR models that offer fine-tuning. Google Document AI is the first that springs to mind.

These models also typically including bounding boxes plus confidence. So in the case that an “S” gets mistaken for a “5” you can programmatically send a new document with JUST the bounding box and preferably more zoomed in/clarified.

pathikghugare · May 8, 2024, 11:06am

Hey @krroopeshbharatwaj1 any update on the same?
Were you able to find any metho to mitigate this issue?
I’m also working on an extraction problem where while doing the extraction of IDs, GPT4V seems to be jumbling few characters, even misses few repeated characters

Let me know if you’ve any solution for the same.

krroopeshbharatwaj1 · June 24, 2024, 1:30pm

Hi @pathikghugare

Sorry for the late response, I could not improve it, Later I switched to Gemini 1.5 Pro- surprisingly better than the Vision Model.

pathikghugare · July 31, 2024, 11:05am

What I did is along with image, inserted the OCR text for the image and then it worked pretty well but still need to make an API call for OCR services

Topic		Replies	Views
Can I Finetune GPT 4.0 2024-08-06 with Images? API	1	78	August 21, 2024
Vision API flips numbers on extracting text from image Bugs	3	1031	December 13, 2023
Did anyone try new gpt4 o model for text extraction from an image? API gpt-4 , chatgpt	2	1614	June 10, 2024
How to improve the accurate for gpt-4-vision in detail message? API gpt-4 , api , gpt-4-vision	0	43	October 18, 2024
Image recognition of brands clothes and mistakes API gpt-4-vision	2	126	August 4, 2024

GPT 4 Vision Model misrepresentation of text from an Invoice (OCR Task)

Related topics