The documentation says input images may not contain text. Specifically, under Image Input Requirements → Other Requirements it says, “No Text”.
I just want to verify: does this means using OpenAI APIs for OCR
is prohibited?
is not supported?
is not possible?
We have used it successfully for OCR in the past. Also in the same document, there is mention of processing text in images, like that the model is not great at non-English and small text.
In my experience:
4o can read text in images jpg/webp/png/screenshots pasted directly in.
A few issues with PDF scanned docs but nothing worse than existing OCR.
I can’t comment on non-English as not tried that
In the OpenAI Images API documentation, the guideline “No text” refers to a recommendation to avoid including textual elements within images submitted for processing. This is because models like GPT-4’s vision capabilities are not optimized for interpreting text embedded in images, which can lead to inaccuracies or misinterpretations. By providing images without text, you ensure that the model focuses on visual content, leading to more accurate and reliable analysis.
I have a good example that backs that up…
That said… This section is ‘API Requirements’ - Input images must meet the following requirements to be used in the API.
To help models understand PDF content, we put into the model’s context both the extracted text and an image of each page. The model can then use both the text and the images to generate a response. This is useful, for example, if diagrams contain key information that isn’t in the text.