GPT4-V: the order of multiple image inputs

fatpanda2 · November 22, 2023, 10:42am

GPT-4V can process multiple image inputs, but can it differentiate the order of the images? Take the following messages as an example.

from openai import OpenAI

client = OpenAI()
response = client.chat.completions.create(
  model="gpt-4-vision-preview",
  messages=[
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "What's in the first image? What's in the second image?",
        },
        {
          "type": "image_url",
          "image_url": {
            "url": "imageA.jpg",
          },
        },
        {
          "type": "image_url",
          "image_url": {
            "url": "imageB.jpg",
          },
        },
      ],
    }
  ],
  max_tokens=300,
)
print(response.choices[0])

Will imageA be considered as the first image because it appears above imageB in the messages?

_j · November 22, 2023, 10:54am

The AI doesn’t really have a reference for which image “came first”.

Others have gone as far as putting text into the image so it can be referred to.

Not tried, but here’s an idea: send the images as multiple user messages when requesting a reply. You could insert synthetic text such as “here’s my first image…” within the messages and see whether the AI is then able to answer based on position.
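A minimal sketch of that idea in Python, untested against the model itself. The helper name build_labeled_messages, the label wording, and the example.com URLs are all assumptions made for illustration; only the message structure follows the request format shown in the first post.

```python
def build_labeled_messages(image_urls, question):
    """Build one user message per image, each starting with a synthetic label.

    Hypothetical helper: the name and the label text are illustrative only.
    """
    messages = []
    for number, url in enumerate(image_urls, start=1):
        messages.append({
            "role": "user",
            "content": [
                # Synthetic text that states the position of the image that follows.
                {"type": "text", "text": f"Here is image {number}:"},
                {"type": "image_url", "image_url": {"url": url}},
            ],
        })
    # The actual question goes last, after every image has been labeled.
    messages.append({
        "role": "user",
        "content": [{"type": "text", "text": question}],
    })
    return messages


messages = build_labeled_messages(
    ["https://example.com/imageA.jpg", "https://example.com/imageB.jpg"],  # placeholder URLs
    "What's in the first image? What's in the second image?",
)
```

The resulting list could then be passed as the messages argument of client.chat.completions.create, as in the original question. Whether the model actually answers by position this way is something to check empirically.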

rsomani95 · January 17, 2024, 3:05pm

I’ve been facing this same problem. I thought maybe we could interleave image inputs with text but the API doesn’t seem to like that.

My content was set up as follows:

PROMPT_MESSAGES = [
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "Here are a few images I have on hand. I'd like you to pick the most appropriate one for a Christmas greeting card I'm sending out on behalf of my family."
            },

            {
                "type": "text",
                "text": "This is image #1"
            },
            {
                "type": "image_url",
                "image_url": image_to_base64(img1)
            },

            {
                "type": "text",
                "text": "This is image #2"
            },
            {
                "type": "image_url",
                "image_url": image_to_base64(img2)
            },
        ],
    },
]

To which I received the “I’m sorry, I cannot assist with these requests.” response that others in the forum have gotten for different reasons.

faizulhaque · May 20, 2024, 4:44pm

Facing the same issue, any luck finding the proper solution?

The documentation says nothing about how image order is handled when multiple images are passed as input:
https://platform.openai.com/docs/guides/vision/multiple-image-inputs

ufku · October 26, 2024, 8:20pm

Sending a text message before each image seems to be working fine:

Main text message:

Analyze the attached images and select the best one for a finance site.

Return the results in JSON format using the following interface:
{
  images: {
    // The id of the analyzed image.
    "id": number;
    // Set this to true if it's the best image.
    "best_image": boolean;
    // One sentence feedback on why you chose the image.
    "feedback": string;
  }[];
}

Here is how I build the rest of the content in PHP:

foreach ($images as $id => $url) {
  $content[] = ['type' => 'text', 'text' => "ID for the next image: $id"];
  $content[] = ['type' => 'image_url', 'image_url' => ['url' => $url]];
}
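For anyone using the Python client from the first post, the same loop might look roughly like this. It is a sketch, not a tested recipe: build_content is a hypothetical helper name, and the IDs and example.com URLs are placeholders.

```python
def build_content(main_text, images):
    """Return a content list: the main prompt, then an ID label before each image.

    Hypothetical helper mirroring the PHP loop above; `images` maps id -> URL.
    """
    content = [{"type": "text", "text": main_text}]
    for image_id, url in images.items():
        # Text part announcing the ID of the image that comes right after it.
        content.append({"type": "text", "text": f"ID for the next image: {image_id}"})
        content.append({"type": "image_url", "image_url": {"url": url}})
    return content


content = build_content(
    "Analyze the attached images and select the best one for a finance site.",
    {1: "https://example.com/a.jpg", 2: "https://example.com/b.jpg"},  # placeholder data
)
messages = [{"role": "user", "content": content}]
```

Dicts keep insertion order in Python 3.7+, so each label stays directly in front of its image, in the order the images were added.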
