GPT-4V: the order of multiple image inputs

GPT-4V can process multiple image inputs, but can it tell which order the images came in? Take the following request as an example.

from openai import OpenAI

client = OpenAI()
response = client.chat.completions.create(
  model="gpt-4-vision-preview",
  messages=[
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "What's in the first image? What's in the second image?",
        },
        {
          "type": "image_url",
          "image_url": {
            "url": "imageA.jpg",
          },
        },
        {
          "type": "image_url",
          "image_url": {
            "url": "imageB.jpg",
          },
        },
      ],
    }
  ],
  max_tokens=300,
)
print(response.choices[0])

Will imageA be treated as the first image because it appears before imageB in the messages?

The AI doesn’t really have a reference for which image “came first”.

Others have gone as far as putting label text into the image itself so that it can be referred to by that label.

I haven’t tried this, but here’s an idea: send the images as separate user messages. You could insert synthetic text such as “here’s my first image…” in each message and see whether the AI can then answer based on position.
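To make that idea concrete, here is a minimal sketch of how the messages could be built. It is untested against the API, and `build_labeled_messages` is a hypothetical helper name, not something from the OpenAI library. It only constructs the message list: one user message per image, each carrying a short text label, followed by the actual question.

```python
from typing import Dict, List


def build_labeled_messages(image_urls: List[str], question: str) -> List[Dict]:
    """Return one user message per image (label + image), then the question.

    Hypothetical helper for illustration; the label wording is an assumption.
    """
    messages = []
    for number, url in enumerate(image_urls, start=1):
        messages.append({
            "role": "user",
            "content": [
                # Synthetic label so the model has text to anchor "image #N" on.
                {"type": "text", "text": f"This is image #{number}."},
                {"type": "image_url", "image_url": {"url": url}},
            ],
        })
    # The question goes last, after every image has been labeled.
    messages.append({
        "role": "user",
        "content": [{"type": "text", "text": question}],
    })
    return messages
```

The returned list would be passed as `messages=` to `client.chat.completions.create(...)` in place of the single message in the original example. Whether the model then answers reliably by position is exactly the open question here.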

I’ve been facing this same problem. I thought maybe we could interleave image inputs with text, but the API doesn’t seem to like that.

My content was set up as follows:

PROMPT_MESSAGES = [
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "Here are a few images I have on hand. I'd like you to pick the most appropriate one for a Christmas greeting card I'm sending out on behalf of my family."
            },

            {
                "type": "text",
                "text": "This is image #1"
            },
            {
                "type": "image_url",
                "image_url": image_to_base64(img1)
            },

            {
                "type": "text",
                "text": "This is image #2"
            },
            {
                "type": "image_url",
                "image_url": image_to_base64(img2)
            },
        ],
    },
]

In response, I received the “I’m sorry, I cannot assist with these requests.” reply that others in the forum have gotten for different reasons.
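One thing that might be worth ruling out is the payload shape. The post doesn't show what `image_to_base64` returns, so this is only a guess, but the vision docs show base64 images being passed as a data URL inside an object with a `url` key. A sketch of the same interleaved layout built that way is below. `to_data_url` and `interleaved_content` are hypothetical names, and the JPEG MIME type is an assumption. This may or may not change the refusal.

```python
import base64
from typing import Dict, List


def to_data_url(image_bytes: bytes, mime_type: str = "image/jpeg") -> str:
    """Encode raw image bytes as a base64 data URL (MIME type is assumed)."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def interleaved_content(intro: str, images: List[bytes]) -> List[Dict]:
    """Build a content list: intro text, then a label and image for each input."""
    content = [{"type": "text", "text": intro}]
    for number, image_bytes in enumerate(images, start=1):
        # Text label placed immediately before the image it describes.
        content.append({"type": "text", "text": f"This is image #{number}"})
        content.append({
            "type": "image_url",
            "image_url": {"url": to_data_url(image_bytes)},
        })
    return content
```

The result would go in the `"content"` field of a single user message, with each image read from disk as bytes first (for example via `open(path, "rb").read()`).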