image_url for gpt-4o API gives error "expected an object, but got a string instead."

I ran the exact code given in the documentation for vision api.

But I got the below error. Same applies to gpt-4-turbo model. Am I missing something?

openai.BadRequestError: Error code: 400 - {'error': {'message': "Invalid type for 'messages[0].content[1].image_url': expected an object, but got a string instead.", 'type': 'invalid_request_error', 'param': 'messages[0].content[1].image_url', 'code': 'invalid_type'}}

Here is the code:


response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {
                    "type": "image_url",
                    "image_url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg",
                },
            ],
        }
    ],
    max_tokens=300,
)

print(response.choices[0])

Your code appears to be missing the object structure under the “image_url” key.
The correct format should look like this:

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What’s in this image?"},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg",
                        "detail": "high"
                    },
                },
            ],
        }
    ],
    max_tokens=300,
)
print(response.choices[0].message.content)

The “detail” key can also be set to “low,” but it appears it cannot be omitted.

I often make this mistake too, and it can be a bit confusing.
I hope this helps you even a little🙂


Thank you! That works.
But I ran into a new problem when using an AWS S3 presigned URL, which gave the same error. I will explore further.

I get the same error when using an S3 presigned URL with gpt-4o or gpt-4-turbo. No error with gpt-4-vision-preview. OP, have you found a solution?

Hi @ebobr, for me the problem went away when I did not include max_tokens in the request. But this is not an actual solution, just a workaround we found. The problem with S3 bucket URLs still exists.
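The cause of the S3 presigned URL failure is not established in this thread. One way to take URL fetching out of the picture, assuming the endpoint accepts a base64 data URL in the "url" field, is to download the image yourself and embed its bytes. The helper name below is hypothetical, and this is only a sketch, not a confirmed fix for the S3 case:

```python
import base64


def image_bytes_to_data_url(image_bytes: bytes, mime_type: str = "image/jpeg") -> str:
    """Encode raw image bytes as a data URL (hypothetical helper name)."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


# Placeholder bytes stand in for an image you downloaded from S3 yourself.
data_url = image_bytes_to_data_url(b"abc", "image/png")

# The data URL goes where the https URL would normally go.
image_part = {"type": "image_url", "image_url": {"url": data_url}}
print(image_part["image_url"]["url"])
```

Note that this sends the whole image in the request body, so large images make the request correspondingly larger.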


I had the same error. The API documentation is wrong; it corresponds to the gpt-4-vision-preview model:
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4-turbo",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {
                    "type": "image_url",
                    "image_url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg",
                },
            ],
        }
    ],
    max_tokens=300,
)

print(response.choices[0])

Yes, it seems that the documentation isn’t always correct. Please refer to the method I mentioned above.
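If you have existing message lists written in the older string form, a small normalizer can wrap each string in the object form that the error message asks for. This is a minimal sketch; the function name and the example.com URL are made up for illustration and are not part of the OpenAI library:

```python
from copy import deepcopy


def normalize_image_parts(messages):
    """Return a copy of messages with string-valued image_url fields wrapped as {"url": ...}."""
    fixed = deepcopy(messages)  # leave the caller's list untouched
    for message in fixed:
        content = message.get("content")
        if not isinstance(content, list):
            continue  # plain string content has no image parts
        for part in content:
            if (
                isinstance(part, dict)
                and part.get("type") == "image_url"
                and isinstance(part.get("image_url"), str)
            ):
                part["image_url"] = {"url": part["image_url"]}
    return fixed


old_style = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "What's in this image?"},
            {"type": "image_url", "image_url": "https://example.com/cat.jpg"},
        ],
    }
]

new_style = normalize_image_parts(old_style)
print(new_style[0]["content"][1])
```

Parts that already use the object form pass through unchanged, so the function is safe to apply more than once.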

The format of the returned data no longer corresponds to JSON. Here is another excerpt from the OpenAI documentation:

from openai import OpenAI
client = OpenAI()

audio_file = open("speech.mp3", "rb")
transcript = client.audio.transcriptions.create(
    model="whisper-1",
    file=audio_file
)

{
  "text": "Imagine the wildest idea that you've ever had, and you're curious about how it might scale to something that's a 100, a 1,000 times bigger. This is a place where you can get to do that."
}
To access the response, we used to write transcript['text'] (as with a dictionary), but now you have to write transcript.text. That looks more like an attribute of a class instance; in short, we get back an object and not JSON.

My impression is that during the last big OpenAI update last year, when the style changed from openai.Audio.translate to client.audio.transcriptions.create, the documentation update did not keep up and ended up as a mix of the two.
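The difference between dictionary access and attribute access described above can be illustrated without calling the API. In this sketch, SimpleNamespace is only a stand-in for the object the client returns, and transcript_text is a hypothetical helper that accepts either shape:

```python
from types import SimpleNamespace


def transcript_text(transcript):
    """Read the text from a dict-style or an object-style transcript."""
    if isinstance(transcript, dict):
        return transcript["text"]  # older dictionary-style access
    return transcript.text  # attribute access on a response object


# Stand-ins for the two shapes; neither is a real API response.
as_dict = {"text": "hello"}
as_object = SimpleNamespace(text="hello")

print(transcript_text(as_dict))
print(transcript_text(as_object))
```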


This solution saved my day with GPT-4o, along with converting max_tokens to an integer. Thank you.


Thanks a bunch. Works like a charm. They should really update the documentation over here:
https://platform.openai.com/docs/api-reference/chat/create

I’m not sure if you took a look at OpenAI’s GitHub sample, but what you’re suggesting doesn’t seem to be correct:

There are multiple ways to pass images to vision models. Which ones are supported depends on the model you are using, and there is no table showing which model accepts which message format.

I apologize, my explanation was not accurate.

It should have been stated that there needs to be a ‘url’ key within the object corresponding to the ‘image_url’ key, and this ‘url’ key should contain the actual URL.

By doing this, it will function correctly.

response = client.chat.completions.create(
  model="gpt-4o",
  messages=[
    {
      "role": "user",
      "content": [
        {"type": "text", "text": "What’s in this image?"},
        {
            "type": "image_url",
            "image_url": {
                "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg",
            },
        },
      ],
    }
  ],
  max_tokens=300,
)

print(response.choices[0])

The ‘detail’ key is not necessarily required, and if omitted, it defaults to ‘auto’, letting the model decide whether to set it to ‘low’ or ‘high’.

Sorry for the inaccurate explanation.
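The corrected shape, with 'detail' left out unless you set it, can be wrapped in a small builder. This is a sketch only: image_part is a hypothetical helper, the example.com URLs are placeholders, and the allowed values are the three named in this thread ('low', 'high', 'auto'):

```python
ALLOWED_DETAIL = ("low", "high", "auto")  # values mentioned in this thread


def image_part(url, detail=None):
    """Build an image_url content part; omit 'detail' when it is not given."""
    if detail is not None and detail not in ALLOWED_DETAIL:
        raise ValueError(f"detail must be one of {ALLOWED_DETAIL}, got {detail!r}")
    image_url = {"url": url}
    if detail is not None:
        image_url["detail"] = detail
    return {"type": "image_url", "image_url": image_url}


# Without detail, the key is simply absent and the default applies.
print(image_part("https://example.com/boardwalk.jpg"))
print(image_part("https://example.com/boardwalk.jpg", detail="low"))
```

The returned dictionary can be placed directly in the "content" list next to the text part.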