Hi, I’m using the GPT-4 Vision-Preview model for text extraction (OCR) from images. About 90% of it works, but some of the text cannot be recognised and extracted correctly.
For example, an invoice zip code like WN1236D gets recognised as WN1326D.
It also cannot differentiate between S and 5 in the invoice.
Is there any way to fine-tune the GPT-4 Vision model?
Any suggestions are welcome. Thanks in advance.
No. Your only option is trying to manipulate the image itself so it’s easier to read.
There are multiple Gen AI OCR models that offer fine-tuning. Google Document AI is the first that springs to mind.
These models also typically include bounding boxes plus confidence scores. So in the case that an “S” gets mistaken for a “5”, you can programmatically send a new request with just the bounding-box region, preferably zoomed in and sharpened.
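A minimal sketch of that crop-and-zoom step, assuming Pillow is installed and that the OCR service has already given you a pixel bounding box for the low-confidence region (the function name and `scale` parameter are illustrative, not from any specific API):

```python
from PIL import Image


def crop_and_upscale(image_path, bbox, scale=3):
    """Crop a low-confidence region out of the source image and upscale it.

    bbox is a (left, top, right, bottom) tuple in pixels, e.g. taken from
    the bounding box an OCR service returned for a low-confidence word.
    The enlarged crop can then be re-submitted to the vision model on its own.
    """
    img = Image.open(image_path)
    region = img.crop(bbox)
    # Upscale with a high-quality resampling filter so character edges
    # (S vs 5, B vs 8) are easier to distinguish.
    w, h = region.size
    return region.resize((w * scale, h * scale), Image.LANCZOS)
```

Sending only the enlarged crop keeps the model focused on the ambiguous characters instead of the whole page.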
Hey @krroopeshbharatwaj1, any update on this?
Were you able to find any method to mitigate this issue?
I’m also working on an extraction problem: while extracting IDs, GPT-4V seems to jumble a few characters and even misses a few repeated characters.
Let me know if you have any solution for this.
Hi @pathikghugare
Sorry for the late response. I could not improve it; I later switched to Gemini 1.5 Pro, which was surprisingly better than the Vision model.
What I did was insert the OCR text alongside the image in the prompt, and then it worked pretty well, though it still requires an extra API call to an OCR service.
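The pairing described above can be sketched as follows. This builds the request payload in the OpenAI chat-completions vision format (a base64 data URL for the image plus a text part); the `ocr_text` is assumed to come from whatever separate OCR service you call first, and the prompt wording is only an illustration:

```python
import base64


def build_vision_messages(image_bytes, ocr_text):
    """Build a chat message that pairs an invoice image with OCR text from
    a separate service, so the vision model can cross-check characters it
    is unsure of (e.g. S vs 5, transposed digits).
    """
    b64 = base64.b64encode(image_bytes).decode("utf-8")
    return [
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": (
                        "Extract the fields from this invoice. A separate "
                        "OCR pass produced the text below; use it to resolve "
                        "characters you are unsure of:\n" + ocr_text
                    ),
                },
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/png;base64,{b64}"},
                },
            ],
        }
    ]
```

The resulting list can be passed as the `messages` argument of a chat-completions call; the model then has both the pixels and a second opinion on the characters.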