GPT-4V: the order of multiple image inputs

I’ve been facing this same problem. I thought maybe we could interleave image inputs with text but the API doesn’t seem to like that.

My content was set up as follows:

PROMPT_MESSAGES = [
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "Here are a few images I have on hand. I'd like you to pick the most appropriate one for a Christmas greeting card I'm sending out on behalf of my family."
            },

            {
                "type": "text",
                "text": "This is image #1"
            },
            {
                "type": "image_url",
                "image_url": image_to_base64(img1)
            },

            {
                "type": "text",
                "text": "This is image #2"
            },
            {
                "type": "image_url",
                "image_url": image_to_base64(img2)
            },
        ],
    },
]

In response, I received the “I’m sorry, I cannot assist with these requests.” refusal that others in the forum have gotten for different reasons.
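
For anyone comparing against their own request, here is a minimal sketch of the same interleaved text/image layout built programmatically. It assumes the documented chat-completions shape in which `image_url` is an object with a `url` field holding a base64 data URL, whereas the snippet above passes the return value of `image_to_base64` directly. The helper names `to_data_url` and `build_interleaved_content` are hypothetical, and the JPEG MIME type is an assumption. Whether this changes the refusal is not something I can confirm.

```python
import base64


def to_data_url(image_bytes: bytes, mime: str = "image/jpeg") -> str:
    """Encode raw image bytes as a base64 data URL (MIME type is an assumption)."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime};base64,{encoded}"


def build_interleaved_content(intro: str, images: list[bytes]) -> list[dict]:
    """Build a content list: one intro text part, then a label + image pair per image."""
    content = [{"type": "text", "text": intro}]
    for index, image_bytes in enumerate(images, start=1):
        # Label each image right before it so the order is explicit in the prompt.
        content.append({"type": "text", "text": f"This is image #{index}"})
        content.append(
            {"type": "image_url", "image_url": {"url": to_data_url(image_bytes)}}
        )
    return content
```

The result would then go in the same place as before, for example `PROMPT_MESSAGES = [{"role": "user", "content": build_interleaved_content(intro, [img1_bytes, img2_bytes])}]`, where `img1_bytes` and `img2_bytes` are the raw file contents.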