Several days ago in the GPT builder, I was able to instruct the GPT to extract the text from uploaded photos, and it worked well even for somewhat blurry or lower-contrast images.
The following day it sometimes worked perfectly but failed most of the time, given the same images and the same instructions. (If you waited an hour and tried again, the results would change.)
Today it fails every time, even with high-contrast images. Based on past OCR threads in this forum and on Reddit, I’ve tried several approaches, including adding each of the following to the instructions (one at a time, of course):
“only use visual input and do not open the code environment.”
“use GPT-4V to convert the content in the photo to text”
“Use the opencv library to convert the content in the photo to text.”
Still no luck. It fails with messages including:
“It seems there is a persistent issue with extracting the text from the image. I will attempt a different approach to analyze and process the visual content.”
“It appears that the optical character recognition process is not completing successfully within the allotted time frame, which is causing it to time out. Given the constraints of the current environment, I won’t be able to process the image using OCR any further.”
Analysis indicates that it keeps defaulting to Tesseract for OCR, which apparently keeps failing.
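For anyone who wants to reproduce the timeout behaviour outside the builder, here is a minimal Python sketch of a Tesseract call wrapped in a time limit. It is not confirmed to be what the GPT actually runs in its code environment. It assumes the `pytesseract` and Pillow packages plus a local Tesseract install, and the helper names `tesseract_ocr` and `ocr_with_timeout` and the 30-second default are made up for illustration.

```python
import concurrent.futures


def tesseract_ocr(image_path):
    """Run Tesseract on one image file (assumes pytesseract and Pillow are installed)."""
    # Imported lazily so the timeout wrapper below can be tried without Tesseract.
    import pytesseract
    from PIL import Image

    with Image.open(image_path) as img:
        return pytesseract.image_to_string(img)


def ocr_with_timeout(image_path, ocr_fn=tesseract_ocr, timeout_s=30.0):
    """Return (text, error). error is None on success, otherwise a short message."""
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(ocr_fn, image_path)
    try:
        # Wait at most timeout_s seconds for the OCR call to finish.
        return future.result(timeout=timeout_s), None
    except concurrent.futures.TimeoutError:
        # The worker thread is not killed here; we only stop waiting for it.
        return None, f"OCR timed out after {timeout_s} s"
    except Exception as exc:
        # Any error raised by the OCR function (missing binary, unreadable image, ...).
        return None, f"OCR failed: {exc}"
    finally:
        # Do not block on a worker that may still be running.
        pool.shutdown(wait=False)
```

Calling `ocr_with_timeout("photo.png")` (a hypothetical file name) on the same test images locally would at least show whether Tesseract itself struggles with them or whether the failure is specific to the GPT's environment.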
(BTW, since the issue appeared I have been testing with high-quality, legible images: very high-contrast black text on a white background in a large font size, so I don’t think this is an input image quality issue.)
Has anyone figured out a way around this other than relying on an external OCR system accessed via actions/API? Thanks