The performance difference between ChatGPT (GPT-4o) and the GPT-4o API using the same prompt for image analysis

When I use the same prompt, shown below, to analyze images with ChatGPT (GPT-4o) and with the GPT-4o API, I get very different results: very good results from ChatGPT, but totally wrong answers from the GPT-4o API. Why is that?

You are an AI image processing expert. Your task is to accurately determine which line segment {A, B, C} has the same length as {S} in each image. Follow the detailed steps below to ensure accurate measurement and comparison:

        1. Load Image: Read each image from the current folder. Each image is named from '1.PNG' to '18.PNG'.
        2. Format Conversion: Convert each .PNG image into a NumPy array for further analysis.
        3. Access Pixels: Use the pixel values to measure the lengths of the vertical line segments.
        4. Measure Line Lengths: For each column in the image, count the number of consecutive white pixels (pixel value 255) to determine the length of each vertical line. Each image contains four white vertical line segments representing {S, A, B, C}, arranged from left to right respectively. The width of each line segment is one pixel.
        5. Compare Lengths: Measure the pixel length of each separate line segment for {S, A, B, C} and compare which segment from {A, B, C} has the same length as {S}.
        6. Provide Answer: Your answer needs to be based on precise pixel-length measurements. Determine which line segment (A, B, or C) has the same length as S, and provide a single answer for each image.

        Final Answer Format: For each image, provide the answer in the format: Image [number]: [A/B/C]
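
For reference, the measurement the prompt asks for can be done deterministically outside the model. A minimal sketch, assuming black-background images with one-pixel-wide white lines and the file names from the prompt:

import numpy as np
from PIL import Image

def segment_lengths(path):
    """Pixel length of each white vertical line, left to right."""
    img = np.array(Image.open(path).convert("L"))
    # Each line is one pixel wide, so every column containing white
    # pixels is exactly one segment.
    return [int(np.count_nonzero(img[:, c] == 255))
            for c in range(img.shape[1])
            if np.any(img[:, c] == 255)]

for n in range(1, 19):
    s, a, b, c = segment_lengths(f"{n}.PNG")
    # Assumes exactly one of A, B, C matches S exactly.
    answer = "ABC"[[a, b, c].index(s)]
    print(f"Image {n}: {answer}")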

Tagging @Foxalabs …

It’s been noted that 4o is not as “good” as GPT-4-Vision, but the teams are working on it, I’m sure.

We’re working on getting more clarification…


Hi,

Physical relationships are not one of its strong suits currently. Try the gpt-4-vision-preview model: exactly the same API call, just change the model name. See how that performs.
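
For example, if the model name lives in a variable, it's a one-line change:

model = "gpt-4-vision-preview"  # previously "gpt-4o"; the rest of the request stays identical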


I have two things I’ll add which might help,

  1. If this is indicative of the actual problem you need to solve, then there are better tools for this job. Writing a Python script which identifies adjacent pixels of the same color and counts the number of pixels in each line segment is fairly trivial.
  2. If this is a minimal working example of a problem you are trying to solve, adding an overlay of horizontal lines to the image will help tremendously in your efforts to “ground” the visual model in reality; a sketch follows below. (This will also help if this is your actual problem.)
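
A minimal sketch of that overlay idea, assuming PIL and placeholder values for the file name and line spacing:

from PIL import Image, ImageDraw

img = Image.open("1.PNG").convert("RGB")
draw = ImageDraw.Draw(img)

# Draw a horizontal reference line every few pixels so the model can
# count gridlines instead of estimating raw pixel distances.
spacing = 10  # placeholder; tune to your image size
for y in range(0, img.height, spacing):
    draw.line([(0, y), (img.width - 1, y)], fill=(255, 0, 0), width=1)

img.save("1_with_grid.PNG")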

I hope that helps!

Don’t fight the models; learn their limitations and meet them where they are!


Thank you for your advice. gpt-4-vision-preview does not work well either.

Sorry for possibly hijacking the thread.

But I think I’m seeing the same thing.
Via the ChatGPT interface, using the gpt-4o model, it correctly answers a question about an image. It’s a yes/no type of question, and it answers yes 10/10 times.

Using the following:

payload = {
    "model": model,
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": prompt
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/jpeg;base64,{base64_image}"
                    }
                }
            ]
        }
    ],
    "max_tokens": 500
}
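
For context, the encoding and request around that payload follow the standard pattern. A minimal sketch, with the prompt text and file name as placeholders and the payload repeated compactly for completeness:

import base64
import os
import requests

model = "gpt-4o"
prompt = "Does the image show X? Answer yes or no."  # placeholder

# Standard base64 encoding for the data URL
with open("photo.jpg", "rb") as f:
    base64_image = base64.b64encode(f.read()).decode("utf-8")

payload = {
    "model": model,
    "messages": [{"role": "user", "content": [
        {"type": "text", "text": prompt},
        {"type": "image_url", "image_url": {
            "url": f"data:image/jpeg;base64,{base64_image}",
            # "detail": "high",  # optional; defaults to "auto"
        }},
    ]}],
    "max_tokens": 500,
}

headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
}
response = requests.post(
    "https://api.openai.com/v1/chat/completions",
    headers=headers,
    json=payload,
)
print(response.json()["choices"][0]["message"]["content"])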

I tried increasing max_tokens, but looking in the Playground, only about 233 tokens are consumed to answer the question correctly. Could it be how I’m uploading/encoding the image?