I need some clarification regarding the Vision API. According to the documentation, an image with “low detail” should use only 85 tokens. However, when I run the command below, I’m seeing approximately 305 prompt_tokens in the response.
Is this behavior expected?
curl --location ‘https://api.openai.com/v1/chat/completions’
–header ‘Authorization: ’
–header ‘Content-Type: application/json’
–data ‘{
“model”: “gpt-4o”,
“messages”: [
{
“role”: “user”,
“content”: [
{
“type”: “text”,
“text”: “Describe image”
},
{
“type”: “image_url”,
“image_url”: {
“url”: “https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg”,
“detail”: “low”
}
}
]
}
]
}’