GPT 4 Vision With Blank Page Creates Weird Results

For me, the following code produces nonsensical results when the input is an image of a blank page.
Am I doing something wrong, is this just expected token-completion behavior (hallucination), or is there something embedded in the seemingly blank image?

from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
  model="gpt-4o",
  messages=[
    {
      "role": "user",
      "content": [
        {"type": "text", "text": "Please transcribe the image accurately into markdown text with appropriate headings. Also, provide alternative text for images if any."},
        {
          "type": "image_url",
          "image_url": {
            "url": "https://i.imgur.com/3Ohm46U.png",
          },
        },
      ],
    }
  ],
  max_tokens=300,
)

print(response.choices[0])

Hi!

Yes, this is expected behaviour when you supply a blank image, or an image whose resolution is so poor that its contents are not detectable.

You can build a control into your prompt by adding an instruction along the following lines (tailor it as required): "If the image is blank, i.e. contains no visual content, please return 'blank image' as your response."

This should keep the model from returning a hallucinated response.
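
As a rough sketch of that idea, the guard instruction can be kept in one place and appended to whatever task prompt you send. The names `BLANK_SENTINEL`, `GUARD`, and `build_messages` are hypothetical, and the exact guard wording is only an assumption to adapt to your use case:

```python
from typing import Any, Dict, List

# Hypothetical sentinel the model is asked to return for a blank image.
BLANK_SENTINEL = "blank image"

# Guard wording is illustrative; tailor it to your own prompt.
GUARD = (
    "If the image is blank, i.e. contains no visual content, "
    f"reply with exactly '{BLANK_SENTINEL}' and nothing else."
)


def build_messages(task_prompt: str, image_url: str) -> List[Dict[str, Any]]:
    """Build a chat `messages` list with the blank-image guard appended to the task prompt."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": f"{task_prompt} {GUARD}"},
                {"type": "image_url", "image_url": {"url": image_url}},
            ],
        }
    ]
```

The returned list can be passed as `messages=` to `client.chat.completions.create(...)` in place of the inline list from the question.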


Welcome @vikramsnarayan

You can ask the model to transcribe only if the image contains text.

See the example below:

from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
  model="gpt-4o",
  messages=[
    {
      "role": "user",
      "content": [
        {"type": "text", "text": "If the image has text, please transcribe the image accurately into markdown text with appropriate headings, else reply with 'no text detected'. Also, provide alternative text for images if any."},
        {
          "type": "image_url",
          "image_url": {
            "url": "https://i.imgur.com/3Ohm46U.png",
          },
        },
      ],
    }
  ],
  max_tokens=300,
)

print(response.choices[0])
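
If you want to act on the fallback reply in code, here is a minimal sketch, assuming the model answers with the 'no text detected' phrase from the prompt above, possibly with different casing, quotes, or a trailing period. The helper name `extract_transcription` is hypothetical:

```python
from typing import Optional


def extract_transcription(
    reply: Optional[str], sentinel: str = "no text detected"
) -> Optional[str]:
    """Return the transcription, or None when the model reported no text."""
    if reply is None:
        return None
    # Tolerate surrounding whitespace, quotes, backticks, and a trailing period.
    cleaned = reply.strip().strip("'\"`.").strip().lower()
    if cleaned == sentinel:
        return None
    return reply.strip()
```

You would call it on the reply text, e.g. `extract_transcription(response.choices[0].message.content)`, and skip the page when it returns `None`.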

Thanks. That's a feasible solution.
