The performance difference between ChatGPT4o and gpt4o api using the same prompt for image analysis

HuixinZhong · June 15, 2024, 3:11pm

When I use the same prompt (shown below) to analyze images with ChatGPT-4o and with the gpt-4o API, why do I get very different results? ChatGPT-4o gives very good results, while the gpt-4o API gives completely wrong answers. I'm wondering why.

You are an AI image processing expert. Your task is to accurately determine which line segment {A, B, C} has the same length as {S} in each image. Follow the detailed steps below to ensure accurate measurement and comparison:

        Load Image: Read each image from the current folder. Each image is named from '1.PNG' to '18.PNG'.
        Format conversion: Convert .PNG images into numpy array for further analysis.
        Access Pixels: Use the pixel values to measure the lengths of the vertical line segments.
        Measure Line Lengths:
        For each column in the image, count the number of consecutive white pixels (pixel value 255) to determine the length of each vertical line.
        Each image contains four white vertical line segments representing {S, A, B, C}, arranged from left to right respectively.
        The width of each line segment is one pixel.
        Compare Lengths: Measure the pixel length of each separated line segment for {S, A, B, C} and compare which line segment from {A, B, C} has the same length as {S}.
        Provide Answer: Based on the measurements, provide a single answer (A, B, or C) for each image.
        Your answer needs to be based on precise pixel length measurements. Ensure that the following process is used:
        
        Determine which line segment (A, B, or C) has the same length as S and provide the answer.
        Final Answer Format:
        For each image, provide the answer in the format: Image [number]: [A/B/C]
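
For reference, the measurement steps this prompt describes can be run locally without a vision model. The following is only a minimal sketch: the function names are hypothetical, and it assumes each image is grayscale-convertible, that line pixels are exactly 255 (no anti-aliasing), and that exactly four one-pixel-wide lines are present, as the prompt states. Loading files needs Pillow; the measurement itself needs only NumPy.

```python
import numpy as np

WHITE = 255  # pixel value the prompt treats as part of a line


def load_grayscale(path):
    """Read an image file into a 2-D uint8 array (requires Pillow)."""
    from PIL import Image  # imported lazily so the rest works without Pillow
    return np.array(Image.open(path).convert("L"))


def longest_white_run(column):
    """Length of the longest run of consecutive white pixels in one column."""
    best = current = 0
    for value in column:
        current = current + 1 if value == WHITE else 0
        best = max(best, current)
    return best


def segment_lengths(pixels):
    """Run lengths of every column that contains white, left to right."""
    lengths = [longest_white_run(pixels[:, x]) for x in range(pixels.shape[1])]
    return [n for n in lengths if n > 0]


def matching_segment(pixels):
    """Return 'A', 'B' or 'C' for the line whose length equals S, else None."""
    # Assumption: exactly four lines, ordered S, A, B, C (raises otherwise).
    s, a, b, c = segment_lengths(pixels)
    for label, length in zip("ABC", (a, b, c)):
        if length == s:
            return label
    return None


def answers_for_folder(folder=".", count=18):
    """Build 'Image [number]: [A/B/C]' lines for 1.PNG .. count.PNG."""
    lines = []
    for i in range(1, count + 1):
        pixels = load_grayscale(f"{folder}/{i}.PNG")
        lines.append(f"Image {i}: {matching_segment(pixels)}")
    return lines
```

Calling `answers_for_folder()` in the folder that holds the images would return one line per image in the requested answer format. If the PNGs contain anti-aliased edges, the exact `== 255` comparison would need a threshold instead.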

PaulBellow · June 15, 2024, 6:32pm

Tagging, @Foxalabs …

It’s been noted that 4o is not as “good” as GPT-4-Vision, but the teams are working on it, I’m sure.

We’re working on getting more clarification…

Foxalabs · June 15, 2024, 6:35pm

Hi,

Physical relationships are not one of its strong suits currently. Try the gpt-4-vision-preview model: it is exactly the same API call, just with the model name changed. See how that performs.

anon22939549 · June 15, 2024, 6:55pm

I have two things I’ll add which might help,

If this is indicative of the actual problem you need to solve, then there are better tools for this job. Writing a Python script which identifies adjacent pixels of the same color and determines the number of pixels in that line segment is fairly trivial.
If this is a type of minimally working example of a problem you are trying to solve, adding an overlay of horizontal lines to the image will help tremendously in your efforts to “ground” the visual model in reality. (This will also help if this is your actual problem.)
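
The overlay idea from the second point could look roughly like the sketch below. The function name, the 10-pixel spacing, and the gray level are all assumptions chosen for illustration; nothing in this thread says which spacing or color works best with the model. The grid is drawn only on non-white pixels so the line segments themselves stay intact.

```python
import numpy as np


def add_horizontal_grid(pixels, spacing=10, gray=128):
    """Return a copy of a 2-D grayscale array with a horizontal reference grid.

    Every `spacing`-th row is painted `gray`, except where the pixel is
    already white (255), so existing white line segments are preserved.
    """
    out = pixels.copy()
    grid_rows = out[::spacing, :]       # a view onto every `spacing`-th row
    grid_rows[grid_rows != 255] = gray  # writes through the view into `out`
    return out
```

With Pillow installed, `Image.fromarray(add_horizontal_grid(pixels)).save("1_grid.png")` would write the result to a file (the output name here is just an example) that can then be sent to the model in place of the original image.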

I hope that helps!

Don’t fight the models; learn their limitations and meet them where they are!

HuixinZhong · June 17, 2024, 6:35am

Thank you for your advice. gpt-4-vision-preview does not work well either.

frippe75 · July 27, 2024, 6:50am

Sorry for possibly hijacking the thread.

But I think I’m seeing the same thing.
Via the ChatGPT interface, using the gpt-4o model, it correctly answers a question about an image. It’s a yes/no type of question, and it answers yes 10/10 times.

Using the following payload via the API:

payload = {
    "model": model,
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": prompt
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/jpeg;base64,{base64_image}"
                    }
                }
            ]
        }
    ],
    "max_tokens": 500
}

I tried increasing max_tokens, but in the Playground it takes only about 233 tokens to answer the question correctly. Could it be how I’m uploading/encoding the image?
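
One way to rule out the encoding step is to build the payload from small helper functions and inspect the result. This is a sketch only: `encode_image` and `build_payload` are hypothetical helper names, and whether the `detail` setting or the MIME type explains the difference seen here is an assumption, not something confirmed in this thread. The `detail` field of `image_url` accepts "low", "high", or "auto", and the MIME type in the data URL should match the real file format (for example `image/png` for the PNG files in the original question).

```python
import base64


def encode_image(path):
    """Read a file and return its contents as a base64 string."""
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")


def build_payload(model, prompt, base64_image,
                  mime="image/jpeg", detail="high", max_tokens=500):
    """Build a chat completions payload with one text part and one image part."""
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {
                        "type": "image_url",
                        "image_url": {
                            # The MIME type must match the actual image format.
                            "url": f"data:{mime};base64,{base64_image}",
                            # "low", "high", or "auto"
                            "detail": detail,
                        },
                    },
                ],
            }
        ],
        "max_tokens": max_tokens,
    }
```

For example, `build_payload("gpt-4o", prompt, encode_image("1.PNG"), mime="image/png")` would produce a request body for a PNG input. Decoding the base64 string back to bytes and comparing it with the original file is a quick check that the encoding round-trips.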
