Does OpenAI prohibit OCR?

The documentation says input images may not contain text. Specifically, under Image Input Requirements → Other Requirements it says, “No Text”.

I just want to verify: does this means using OpenAI APIs for OCR

  • is prohibited?
  • is not supported?
  • is not possible?

We have used it successfully for OCR in the past. Also in the same document, there is mention of processing text in images, like that the model is not great at non-English and small text.

So what’s the story?

Thanks!

1 Like

Hi,

I do not think that OCR is prohibited. I might be wrong.
Personally speaking, I was able to get OCR with accuracy for my web app that I built.

I would like to listen to the other experts on this too, if there are any limitations.

Cheers.
Akitaishi

In my experience:
4o can read text in images jpg/webp/png/screenshots pasted directly in.
A few issues with PDF scanned docs but nothing worse than existing OCR.
I can’t comment on non-English as not tried that

OK this is what 4o says:

​In the OpenAI Images API documentation, the guideline “No text” refers to a recommendation to avoid including textual elements within images submitted for processing. This is because models like GPT-4’s vision capabilities are not optimized for interpreting text embedded in images, which can lead to inaccuracies or misinterpretations. By providing images without text, you ensure that the model focuses on visual content, leading to more accurate and reliable analysis.

I have a good example that backs that up…

That said… This section is ‘API Requirements’ - Input images must meet the following requirements to be used in the API.

You might find a better solution here:

Welcome to the dev forum @dandyfiner

Images with text aren’t prohibited AFAIK.

In fact, here’s how OpenAI enables PDF content inputs:

How it works

To help models understand PDF content, we put into the model’s context both the extracted text and an image of each page. The model can then use both the text and the images to generate a response. This is useful, for example, if diagrams contain key information that isn’t in the text.

3 Likes

I use GPT-4o for OCR via API in Algebraic Equation GPT4.