The AI doesn’t really have a reference for which image “came first”.
Others have gone as far as putting text into the image so it can be referred to.
I haven’t tried this, but here’s an idea: send multiple images as multiple user messages when requesting a reply. You could insert synthetic text like “here’s my first image…” within the messages and see whether the AI can then answer based on position.
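To make the idea concrete, here is a rough, untested-against-the-API sketch of how those messages could be built. It assumes the Chat Completions content-part format (`text` and `image_url` parts) that appears elsewhere in this thread; `build_messages` and the label wording are hypothetical names of my own, and whether the model actually uses the labels to reason about position is exactly what would need testing.

```python
def build_messages(image_urls, question):
    """Return one user message per image, each preceded by a positional label."""
    ordinals = ["first", "second", "third", "fourth", "fifth"]
    messages = []
    for index, url in enumerate(image_urls):
        # Fall back to a numeric label once the ordinal words run out.
        label = ordinals[index] if index < len(ordinals) else f"number {index + 1}"
        messages.append({
            "role": "user",
            "content": [
                {"type": "text", "text": f"Here's my {label} image."},
                {"type": "image_url", "image_url": {"url": url}},
            ],
        })
    # The actual question goes last, after every image has been labelled.
    messages.append({"role": "user", "content": question})
    return messages
```

The returned list would be passed as the `messages` argument of the request; the question is kept in its own final message so the labels always precede it.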
I’ve been facing this same problem. I thought maybe we could interleave image inputs with text but the API doesn’t seem to like that.
My content was set up as follows:
PROMPT_MESSAGES = [
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "Here are a few images I have on hand. I'd like you to pick the most appropriate one for a Christmas greeting card I'm sending out on behalf of my family."
            },
            {
                "type": "text",
                "text": "This is image #1"
            },
            {
                "type": "image_url",
                "image_url": image_to_base64(img1)
            },
            {
                "type": "text",
                "text": "This is image #2"
            },
            {
                "type": "image_url",
                "image_url": image_to_base64(img2)
            },
        ],
    },
]
In response, I received the same “I’m sorry, I cannot assist with these requests.” message that others in the forum have gotten for different reasons.
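One thing that might be worth checking (a guess, since `image_to_base64` isn't shown): the working PHP example further down passes `image_url` as an object with a `url` key, whereas the snippet above passes the helper's return value directly. Below is a minimal sketch of the interleaved parts built that way, with base64 images wrapped in a data URL. `to_data_url` and `labelled_image_parts` are hypothetical helpers of mine, and the JPEG MIME type is an assumption.

```python
import base64


def to_data_url(image_bytes, mime_type="image/jpeg"):
    """Encode raw image bytes as a base64 data URL."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def labelled_image_parts(images):
    """Interleave a text label before each image part, numbering from 1."""
    parts = []
    for number, image_bytes in enumerate(images, start=1):
        parts.append({"type": "text", "text": f"This is image #{number}"})
        parts.append({
            "type": "image_url",
            # Object with a "url" key, mirroring the PHP example in this thread.
            "image_url": {"url": to_data_url(image_bytes)},
        })
    return parts
```

These parts would follow the opening text part in the `content` list. I can't say for sure this is what caused the refusal; it's just the one structural difference I can see between the two posts.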
Sending a text message before each image seems to be working fine:
Main text message:
Analyze the attached images and select the best one for a finance site.
Return the results in JSON format using the following interface:
{
  images: {
    // The id of the analyzed image.
    "id": number;
    // Set this to true if it's the best image.
    "best_image": boolean;
    // One sentence feedback on why you chose the image.
    "feedback": string;
  }[];
}
Here is how I build the rest of the content in PHP:
foreach ($images as $id => $url) {
    $content[] = ['type' => 'text', 'text' => "ID for the next image: $id"];
    $content[] = ['type' => 'image_url', 'image_url' => ['url' => $url]];
}
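On the receiving side, a reply that follows the interface above can be read back fairly easily. Here is a small Python sketch (not part of the PHP setup; `pick_best_image` is a hypothetical name) that assumes the model returns bare JSON with no surrounding prose or code fences, which isn't guaranteed in practice.

```python
import json


def pick_best_image(reply_text):
    """Return (id, feedback) for the entry flagged as best, or None if none is."""
    data = json.loads(reply_text)
    for entry in data["images"]:
        # Only accept an explicit boolean true, per the interface in the prompt.
        if entry.get("best_image") is True:
            return entry["id"], entry["feedback"]
    return None
```

Since the returned id is the same one sent in the "ID for the next image" label, it can be mapped straight back to the original URL.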