{
"model": "gpt-4-turbo",
"max_tokens": "300",
"messages":
[
{
"role" : "user",
"content" :
[
{
"type" : "text",
"text" : "what are the main colors in this image"
},
{
"type" : "image_url",
"image_url" :
{
"url" : "https://usvisabook.me/google-maps/clinic_images/Bangkok_Vadhana/040db5a6-fa9f-11ee-a39f-005056bf0cdf_8.jpg",
"detail" : "low"
}
}
]
}
]
}
I get response:
{
"error": {
"message": "Invalid image URL: 'messages[0].content[1].image_url.url'. Expected a base64-encoded data URL with an image MIME type (e.g. 'data:image/png;base64,aW1nIGJ5dGVzIGhlcmU='), but got a value without the 'data:' prefix.",
"type": "invalid_request_error",
"param": "messages[0].content[1].image_url.url",
"code": "invalid_value"
}
}
all other models return the same, until gpt-4-vision-preview and gpt-4-1106-vision-preview:
{
"id": "chatcmpl-9NMnBARXtbQXz5cIGjO9slIXnCkQg",
"object": "chat.completion",
"created": 1715356185,
"model": "gpt-4-1106-vision-preview",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "The image appears to be an advertisement for a skin or beauty treatment service, possibly from a clinic or a beauty center. It features a woman with clear skin smiling and looking upwards, which suggests a sense of satisfaction or positive results after treatment. The imagery of what looks like a healthcare professional's hands performing a procedure on a client's skin might be indicative of the personalized care and services offered.\n\nThe text includes prices, suggesting that there is a promotional offer or a standard rate for the treatments. Unfortunately, I can't provide translations or personal opinions, but it is clear that the advertisement is targeted at people interested in skincare or cosmetic treatments, emphasizing affordability and professional service. There's also a date mentioned which could indicate a promotional period or a validity date for the offer. Additionally, a phone number is provided, presumably for potential clients to make appointments or inquire further about the services."
},
"logprobs": null,
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 95,
"completion_tokens": 175,
"total_tokens": 270
},
"system_fingerprint": null
}
I would almost use it as a workaround, but seems 4-vision-preview ignores any instructions and explains only whats in the image.
If I feed this image to chatGPT it’s much more detailed.
Anybody seeing this?