For me, the following code produces nonsensical results because of the blank page image.
Am I doing something wrong, is this just expected token-completion behavior (i.e. hallucination), or is there something embedded in the seemingly blank image?
from openai import OpenAI

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Please transcribe the image accurately into markdown text with appropriate headings. Also, provide alternative text for images if any."},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://i.imgur.com/3Ohm46U.png",
                    },
                },
            ],
        }
    ],
    max_tokens=300,
)
print(response.choices[0])
Yes, this is expected behaviour when you supply a blank image, or an image whose resolution is so poor that its contents cannot be detected.
You can build a control into your prompt by including an additional instruction along the following lines (tailor further as required): If the image is blank, i.e. contains no visual content, please return 'blank image' as your response.
This reduces the chance of the model returning a hallucinated response.
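For instance, here is a minimal sketch of that adjustment, identical to the original request apart from the added instruction (the exact wording of the fallback text is an assumption you can tailor):

from openai import OpenAI

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                # Added control: tell the model exactly what to return for a blank image.
                {"type": "text", "text": "Please transcribe the image accurately into markdown text with appropriate headings. If the image is blank, i.e. contains no visual content, return 'blank image' as your response."},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://i.imgur.com/3Ohm46U.png"},
                },
            ],
        }
    ],
    max_tokens=300,
)
print(response.choices[0])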
You can also ask the model to transcribe only if the image contains text.
See the example below:
from openai import OpenAI

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "If the image has text, please transcribe the image accurately into markdown text with appropriate headings, else reply with 'no text detected'. Also, provide alternative text for images if any."},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://i.imgur.com/3Ohm46U.png",
                    },
                },
            ],
        }
    ],
    max_tokens=300,
)
print(response.choices[0])
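Alternatively, you can detect a blank image locally before spending tokens on an API call. A minimal sketch, assuming Pillow and requests are installed; the helper name and threshold are illustrative, not part of the original post:

import requests
from io import BytesIO
from PIL import Image

def is_blank_image(url: str, threshold: int = 2) -> bool:
    """Heuristic: treat the image as blank if every pixel is (almost)
    the same shade. The threshold is an assumption you may need to
    tune for noisy scans or JPEG artefacts."""
    img = Image.open(BytesIO(requests.get(url).content)).convert("L")
    lo, hi = img.getextrema()  # min and max grayscale values in the image
    return (hi - lo) <= threshold

if is_blank_image("https://i.imgur.com/3Ohm46U.png"):
    print("blank image - skipping the API call")
else:
    ...  # proceed with client.chat.completions.create as above

A side benefit of this check is that it answers the original question directly: if the pixel range is non-zero, the "blank" PNG contains faint content the eye misses; if it is zero, any transcription the model produces is pure hallucination.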