GPT-4 omni text recognition via API works worse than on chatgpt.com

Hello all!

I’m investigating poor text recognition via the API with GPT-4 Omni.
The original OpenAI chat on chatgpt.com works like a charm: the text is 100% identical to the PNG, with no fictional words or sentences.
If I use an API call to GPT-4o for a one-page text, I always get only the first paragraph almost correct; the others are fictional.

I tried custom prompts to stop using Tesseract and use internal vision capabilities. But no luck. What should I do?

Hey

This can be a bit counterintuitive, but you actually gave it the image as a file attachment and not as an image input.

There are two different things you can do with images and GPT:
Give it the file, which it can then use when coding (what you did),
or give it the image specifically for vision.

You are doing the wrong one.

This is what it looks like in the Playground (use the right button):
[Image: Screenshot 2024-05-31 at 12.04.41]

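In the API it looks like this:

from openai import OpenAI

# Reads the API key from the OPENAI_API_KEY environment variable.
client = OpenAI()

response = client.chat.completions.create(
  model="gpt-4o",
  messages=[
    {
      "role": "user",
      # The content is a list of parts: one text part and one image part.
      "content": [
        {"type": "text", "text": "What’s in this image?"},
        {
          # "image_url" sends the picture to the vision model as an image,
          # not as a file attachment.
          "type": "image_url",
          "image_url": {
            "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg",
          },
        },
      ],
    }
  ],
  # Caps the length of the reply; raise this for longer outputs.
  max_tokens=300,
)

print(response.choices[0])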
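
If your PNG is a local file rather than a public URL, the same "image_url" part can carry a base64 data URL instead. Here is a minimal sketch of that, not a definitive implementation. The helper names build_vision_messages and transcribe_png, the prompt wording and the max_tokens value are my own assumptions and do not come from your set-up. I also set "detail" to "high" on the assumption that it helps with dense text, and I raised max_tokens because 300 tokens is likely too few for a full page of text.

```python
import base64


def build_vision_messages(image_bytes: bytes, prompt: str) -> list:
    """Build chat messages that pass a PNG to the model as an image input."""
    # Encode the raw PNG bytes so they can travel inside a data URL.
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/png;base64,{b64}",
                        # Assumption: high detail is worth the extra tokens for text.
                        "detail": "high",
                    },
                },
            ],
        }
    ]


def transcribe_png(path: str) -> str:
    """Hypothetical helper: send a local PNG to gpt-4o and return the reply text."""
    # Imported here so build_vision_messages works without the SDK installed.
    from openai import OpenAI

    with open(path, "rb") as f:
        messages = build_vision_messages(
            f.read(), "Transcribe all text in this image exactly."
        )

    client = OpenAI()
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=messages,
        max_tokens=2000,  # assumed value; a full page needs more than 300
    )
    return response.choices[0].message.content
```

You would call it as transcribe_png("your_page.png") and print the result. If the output still stops after the first paragraph, check response.choices[0].finish_reason: a value of "length" means the reply hit the max_tokens limit.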

If you still need help, let me know and please explain your set-up further. :smiley:

Hi!
Yes, I used an image attachment.

Did you ever figure out a fix for this?