Inconsistencies in Image Analysis with GPT-4o-mini Using Low Detail

Hello OpenAI community,

I’m currently working on a project using the GPT-4o-mini API to analyze images. I’ve noticed inconsistencies in the model’s ability to analyze certain images, and I’m hoping to get some clarification and advice from the community.

Context

  • Model used: GPT-4o-mini
  • Task: Image analysis from URLs
  • Method: Using the Chat Completions API with the image_url parameter
  • Detail level: Low (as specified in the API call)

Observed Problem

Some images are successfully analyzed, while others generate a response indicating that the model is not able to analyze images.

Here is an example: “I can’t view or analyze images directly, but if you provide details or text from the image, I can help you understand or summarize that information.”

What’s intriguing is that:

  1. The same image that fails to be analyzed on one attempt can be analyzed successfully on another.
  2. The same code works perfectly with a more capable model (GPT-4o), without any analysis issues, also using low detail.

Example API Call

from openai import OpenAI

client = OpenAI()  # API key is read from the OPENAI_API_KEY environment variable

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What are the info on this image?"},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": image_url,  # publicly accessible image URL
                        "detail": "low",
                    },
                },
            ],
        }
    ],
    max_tokens=500,
)

Questions

  1. Are there specific limitations for GPT-4o-mini in terms of image size, format, or complexity when using low detail that I should be aware of?
  2. Are there any best practices for optimizing image analysis with this particular model and detail level?
  3. Are there any known issues or limitations with GPT-4o-mini regarding image analysis at low detail that could explain this inconsistent behavior?

Any information or advice would be greatly appreciated. If additional details are needed, I’d be happy to provide them.

Thank you in advance for your help!

The “o” model has multimodal capabilities that have not been released.

It is also trained to deny those abilities (for example, it won’t output image or speech tokens), and that refusal behavior spills over.

So it tends to refuse, and its understanding is limited. You need a system prompt that tells the AI it has built-in computer vision, that its image examination skill is enabled, and so on, to defeat the refusals and denials (see the sketch further down). Being small, it also preserves less factual training data.

At detail: low, an image is encoded into just a handful of tokens, with no additional tiling after it, so that may simply not be enough to grab the mini model’s attention. The image is also resized to a maximum of 512 pixels at that detail level, so there may be little meaning left to extract from some images.
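If it helps as a diagnostic, here is a rough sketch that sends the same image once at detail “low” and once at “high” and compares the answers. It assumes the current openai Python SDK (>= 1.0); the helper name and example URL are just placeholders:

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def ask_about_image(image_url: str, detail: str) -> str:
    # Same structure as the call in the question, with the detail level as a parameter
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "Describe everything you can read or see in this image."},
                    {"type": "image_url", "image_url": {"url": image_url, "detail": detail}},
                ],
            }
        ],
        max_tokens=500,
    )
    return response.choices[0].message.content

for detail in ("low", "high"):
    print(f"--- detail={detail} ---")
    print(ask_about_image("https://example.com/sample.png", detail))  # placeholder URL

If “high” consistently succeeds on an image where “low” fails, the downscaling is probably the culprit rather than the refusal training.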

When you ask, try not “What are the info on this image?” but rather something like “From the attached image, using your own computer vision skill, extract all the information available: the text or a description of the contents”, or whatever you expect.

Then, as the system prompt of a specialist, how about: “You are Look-o, an AI with image analysis capabilities built-in and enabled.”
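Putting both together, a minimal sketch (again assuming the current openai Python SDK, with image_url defined as in your example):

from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {
            # System prompt asserting that the vision capability is present and enabled
            "role": "system",
            "content": "You are Look-o, an AI with image analysis capabilities built-in and enabled.",
        },
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": (
                        "From the attached image, using your own computer vision skill, "
                        "extract all the information available: the text or a description of the contents."
                    ),
                },
                {"type": "image_url", "image_url": {"url": image_url, "detail": "low"}},
            ],
        },
    ],
    max_tokens=500,
)
print(response.choices[0].message.content)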

If gpt-4o-mini were satisfactory all the time, there would be no reason to upgrade to a more expensive, higher-quality model (like gpt-4o).
