That functionality is working on my side.
First, I'm creating both an Assistant (with gpt-4o) and an empty Thread.
Next, I’m hitting this endpoint:
POST https://api.openai.com/v1/threads/{threadId}/messages
Body:
{
  "role": "user",
  "content": [
    {
      "type": "text",
      "text": "Do you see any similarity or difference between the attached images?"
    },
    {
      "type": "image_file",
      "image_file": {
        "file_id": "file-Vl3rOrhupx0MFHwsVjvjva4S",
        "detail": "low"
      }
    },
    {
      "type": "image_url",
      "image_url": {
        "url": "https://upload.wikimedia.org/wikipedia/commons/e/eb/Machu_Picchu%2C_Peru.jpg",
        "detail": "low"
      }
    }
  ]
}
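For anyone who wants to try the same call without a client library, here is a minimal Python sketch of how that message body could be built and wrapped in a request. The names `API_BASE`, `build_message_body`, and `build_request` are my own, and the `OpenAI-Beta: assistants=v2` header is an assumption about the API version in use, so check it against your setup.

```python
import json
import urllib.request

API_BASE = "https://api.openai.com/v1"


def build_message_body(text, file_id, image_url, detail="low"):
    """Build a user message with one text part and two image parts."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": text},
            # Image previously uploaded to the Files API, referenced by id.
            {"type": "image_file", "image_file": {"file_id": file_id, "detail": detail}},
            # Image referenced by a public URL.
            {"type": "image_url", "image_url": {"url": image_url, "detail": detail}},
        ],
    }


def build_request(path, body, api_key):
    """Prepare (but do not send) a JSON POST request."""
    return urllib.request.Request(
        f"{API_BASE}{path}",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
            "OpenAI-Beta": "assistants=v2",  # assumed API version header
        },
        method="POST",
    )
```

To actually send it, pass the result of `build_request(f"/threads/{thread_id}/messages", body, api_key)` to `urllib.request.urlopen`, where `thread_id` and `api_key` are your own values.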
Then, I’m hitting this other endpoint:
POST https://api.openai.com/v1/threads/{threadId}/runs
Body:
{
  "assistant_id": "asst_kfoHDe4qk8JwqOwtM9PdROJN",
  "stream": true
}
And I’m receiving this response, streamed back in chunks of text:
Both images appear to be identical, depicting the iconic site of Machu Picchu in Peru. Machu Picchu is a 15th-century Inca citadel located in the Eastern Cordillera of southern Peru. It is set on a mountain ridge and is situated approximately 2,430 meters (7,970 feet) above sea level.
Similarities:
1. Both images show the same panoramic view of the Machu Picchu archaeological site.
2. The notable features such as the terraces, stone structures, and the prominent Huayna Picchu mountain in the background are visible in both images.
Since the images are the same, there are no differences to point out. This preserved Incan site demonstrates remarkable engineering and architectural skills, and it's a significant cultural heritage of the Inca civilization.
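If you read the stream yourself instead of through a library, the chunks have to be stitched together. Below is a rough Python sketch of that step. It assumes the run is streamed as server-sent events, that text arrives in `thread.message.delta` events, and that each delta carries `content[].text.value`; `collect_text` and `SAMPLE` are hypothetical names, and the sample lines are hand-written for illustration rather than captured from a real run.

```python
import json


def collect_text(sse_lines):
    """Concatenate the text deltas found in an iterable of SSE lines (str)."""
    event = None
    parts = []
    for raw in sse_lines:
        line = raw.strip()
        if line.startswith("event:"):
            # Remember which event the following data line belongs to.
            event = line[len("event:"):].strip()
        elif line.startswith("data:") and event == "thread.message.delta":
            payload = json.loads(line[len("data:"):].strip())
            for part in payload.get("delta", {}).get("content", []):
                if part.get("type") == "text":
                    parts.append(part["text"].get("value", ""))
    return "".join(parts)


# Hand-written sample in the assumed event format.
SAMPLE = [
    'event: thread.message.delta',
    'data: {"delta": {"content": [{"index": 0, "type": "text", "text": {"value": "Both images "}}]}}',
    '',
    'event: thread.message.delta',
    'data: {"delta": {"content": [{"index": 0, "type": "text", "text": {"value": "appear to be identical"}}]}}',
    '',
    'event: done',
    'data: [DONE]',
]
```

Calling `collect_text(SAMPLE)` joins the two deltas into one string; lines from other event types, including the final `[DONE]` marker, are ignored.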
In case it's relevant, I’m using the simple-openai library.