Vision unable to read image_url in model gpt-4-turbo, but it can in gpt-4-vision-preview?

{
  "model": "gpt-4-turbo",
  "max_tokens": "300",
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "what are the main colors in this image"
        },
        {
          "type": "image_url",
          "image_url": {
            "url": "https://usvisabook.me/google-maps/clinic_images/Bangkok_Vadhana/040db5a6-fa9f-11ee-a39f-005056bf0cdf_8.jpg",
            "detail": "low"
          }
        }
      ]
    }
  ]
}

I get this response:

{
  "error": {
    "message": "Invalid image URL: 'messages[0].content[1].image_url.url'. Expected a base64-encoded data URL with an image MIME type (e.g. 'data:image/png;base64,aW1nIGJ5dGVzIGhlcmU='), but got a value without the 'data:' prefix.",
    "type": "invalid_request_error",
    "param": "messages[0].content[1].image_url.url",
    "code": "invalid_value"
  }
}

All other models return the same error, except gpt-4-vision-preview and gpt-4-1106-vision-preview, which respond:

{
  "id": "chatcmpl-9NMnBARXtbQXz5cIGjO9slIXnCkQg",
  "object": "chat.completion",
  "created": 1715356185,
  "model": "gpt-4-1106-vision-preview",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The image appears to be an advertisement for a skin or beauty treatment service, possibly from a clinic or a beauty center. It features a woman with clear skin smiling and looking upwards, which suggests a sense of satisfaction or positive results after treatment. The imagery of what looks like a healthcare professional's hands performing a procedure on a client's skin might be indicative of the personalized care and services offered.\n\nThe text includes prices, suggesting that there is a promotional offer or a standard rate for the treatments. Unfortunately, I can't provide translations or personal opinions, but it is clear that the advertisement is targeted at people interested in skincare or cosmetic treatments, emphasizing affordability and professional service. There's also a date mentioned which could indicate a promotional period or a validity date for the offer. Additionally, a phone number is provided, presumably for potential clients to make appointments or inquire further about the services."
      },
      "logprobs": null,
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 95,
    "completion_tokens": 175,
    "total_tokens": 270
  },
  "system_fingerprint": null
}

I would almost use it as a workaround, but it seems gpt-4-vision-preview ignores any instructions and only describes what's in the image.

If I feed this image to ChatGPT, the answer is much more detailed.

Anybody seeing this?

What is your "text" field (the prompt) set to?


I edited the prompt to be "what are the main colors in this image". It still returns only a description of the image.

Try changing "max_tokens": "300" to "max_tokens": 300.
The max_tokens value must be an integer, not a string. Otherwise, this error comes up.
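
For anyone building the request in code, here is a minimal sketch of the fix suggested above: it assembles the same request body but forces max_tokens to an integer before serializing. The helper names build_vision_payload and to_data_url are made up for illustration and are not part of any SDK. to_data_url only shows the base64 data URL form that the error message mentions, in case you ever need to send image bytes inline rather than a link. The script only prints the JSON body and does not call the API.

```python
import base64
import json


def to_data_url(image_bytes: bytes, mime_type: str = "image/jpeg") -> str:
    """Encode raw image bytes as a base64 data URL (the form named in the error message)."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def build_vision_payload(prompt: str, image_url: str, max_tokens=300,
                         model: str = "gpt-4-turbo", detail: str = "low") -> dict:
    """Build the request body from the original post, with max_tokens as an integer."""
    return {
        "model": model,
        # int() turns a string such as "300" into the number 300,
        # so it serializes as 300 rather than "300".
        "max_tokens": int(max_tokens),
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {
                        "type": "image_url",
                        "image_url": {"url": image_url, "detail": detail},
                    },
                ],
            }
        ],
    }


payload = build_vision_payload(
    "what are the main colors in this image",
    "https://usvisabook.me/google-maps/clinic_images/Bangkok_Vadhana/040db5a6-fa9f-11ee-a39f-005056bf0cdf_8.jpg",
    max_tokens="300",  # deliberately a string, as in the original request
)
body = json.dumps(payload, indent=2)
print(body)
```

The printed body contains "max_tokens": 300 without quotes, which is the only difference from the request in the first post.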
