I ran the exact code given in the documentation for the vision API, but I got the error below. The same applies to the gpt-4-turbo model. Am I missing something?
openai.BadRequestError: Error code: 400 - {'error': {'message': "Invalid type for 'messages[0].content[1].image_url': expected an object, but got a string instead.", 'type': 'invalid_request_error', 'param': 'messages[0].content[1].image_url', 'code': 'invalid_type'}}
Here is the code:
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {
                    "type": "image_url",
                    "image_url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg",
                },
            ],
        }
    ],
    max_tokens=300,
)
print(response.choices[0])
Your code appears to be missing the object structure under the "image_url" key.
The correct format should look like this:
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg",
                        "detail": "high"
                    },
                },
            ],
        }
    ],
    max_tokens=300,
)
print(response.choices[0].message.content)
The "detail" key can also be set to "low", but it appears it cannot be omitted.
I often make this mistake too, and it can be a bit confusing.
I hope this helps you even a little 🙂
prem.s (May 15, 2024, 11:56am):
Thank you! That works.
But I ran into a new problem when using an AWS S3 presigned URL, which gave the same error. Will explore further.
ebobr (May 17, 2024, 2:47am):
I get the same error when using an S3 presigned URL with gpt-4o or gpt-4-turbo. No error with gpt-4-vision-preview. OP, have you found a solution?
Hi @ebobr, for me the problem got solved when I did not include max_tokens in the request. But this is not an actual solution; it's a workaround we found. The problem with S3 bucket URLs still exists.
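For anyone stuck on the presigned-URL case: one possible workaround (a sketch, not something confirmed in this thread) is to download the object yourself and pass it as a base64 data URL, which the SDK's ImageURL type says is also accepted. The requests dependency, the placeholder URL, and the JPEG content type are all assumptions here.
import base64

import requests
from openai import OpenAI

client = OpenAI()

# Hypothetical presigned URL; substitute your own.
presigned_url = "https://example-bucket.s3.amazonaws.com/photo.jpg?X-Amz-Signature=..."

# Fetch the bytes ourselves, then send a base64 data URL instead of the
# presigned link, so the API never has to dereference the S3 URL.
image_bytes = requests.get(presigned_url, timeout=30).content
b64_image = base64.b64encode(image_bytes).decode("utf-8")

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/jpeg;base64,{b64_image}"},
                },
            ],
        }
    ],
    max_tokens=300,
)
print(response.choices[0].message.content)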
I got the same error. The API documentation is wrong; it corresponds to the gpt-4-vision-preview model:
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4-turbo",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {
                    "type": "image_url",
                    "image_url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg",
                },
            ],
        }
    ],
    max_tokens=300,
)
print(response.choices[0])
Yes, it seems that the documentation isn't always correct. Please refer to the method I mentioned above.
The format of the returned data is no longer JSON. Here is another excerpt from the OpenAI documentation:
from openai import OpenAI

client = OpenAI()

audio_file = open("speech.mp3", "rb")
transcript = client.audio.transcriptions.create(
    model="whisper-1",
    file=audio_file
)
{
    "text": "Imagine the wildest idea that you've ever had, and you're curious about how it might scale to something that's a 100, a 1,000 times bigger. This is a place where you can get to do that."
}
To access the response, you previously wrote transcript["text"] (as with a dictionary), but now you have to write transcript.text: that is attribute access on a class instance. In short, you get back an object, not JSON...
My impression is that during last year's big OpenAI update, with the move from the openai.Audio.translate style to client.audio.transcriptions.create, the documentation didn't keep up and ended up mixing the two...
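To make the object-versus-JSON point concrete, here is a minimal sketch with the current client; model_dump() assumes the SDK's pydantic-based response objects and is only needed if you want a plain dict back.
from openai import OpenAI

client = OpenAI()

with open("speech.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
    )

# Attribute access on the returned object, not dict indexing:
print(transcript.text)        # works with the current SDK
# print(transcript["text"])   # the old pattern; raises TypeError now

# If you really need a dict (e.g. for json.dumps):
data = transcript.model_dump()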
This solution saved my day with GPT-4o, plus converting max_tokens to an integer. Thank you!
Thanks a bunch. Works like a charm. They should really update the documentation here:
https://platform.openai.com/docs/api-reference/chat/create
I'm not sure if you took a look at OpenAI's GitHub sample, but what you're suggesting doesn't seem to be correct:
# Parsing PDF documents for RAG applications

This notebook shows how to leverage GPT-4V to turn rich PDF documents such as slide decks or exports from web pages into usable content for your RAG application.

This technique can be used if you have a lot of unstructured data containing valuable information that you want to be able to retrieve as part of your RAG pipeline.

For example, you could build a Knowledge Assistant that could answer user queries about your company or product based on information contained in PDF documents.

The example documents used in this notebook are located at [data/example_pdfs](data/example_pdfs). They are related to OpenAI's APIs and various techniques that can be used as part of LLM projects.
_j (June 30, 2024, 6:51pm):
There are multiple methods for passing images to vision models, with support varying by model, and there is no grid showing which model accepts which API message format.
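To summarize the two shapes that appear in this thread (the example URL below is just a placeholder): the object form matches the SDK's ImageURL type and works with gpt-4o and gpt-4-turbo, while the bare-string form from the older documentation examples is what produces the 400 error above.
# Object form: "image_url" maps to an object with a required "url" key.
ok_part = {
    "type": "image_url",
    "image_url": {"url": "https://example.com/image.jpg", "detail": "auto"},
}

# Bare-string form: triggers the 400 "expected an object, but got a string
# instead" error on gpt-4o and gpt-4-turbo.
bad_part = {
    "type": "image_url",
    "image_url": "https://example.com/image.jpg",
}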
I apologize, my explanation was not accurate...
It should have been stated that there needs to be a "url" key within the object corresponding to the "image_url" key, and this "url" key should contain the actual URL.
By doing this, it will function correctly.
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg",
                    },
                },
            ],
        }
    ],
    max_tokens=300,
)
print(response.choices[0])
The "detail" key is not actually required; if omitted, it defaults to "auto", letting the model decide whether to use "low" or "high".
Sorry for the inaccurate explanation...
# File generated from our OpenAPI spec by Stainless. See CONTRIBUTING.md for details.
from __future__ import annotations
from typing_extensions import Literal, Required, TypedDict
__all__ = ["ChatCompletionContentPartImageParam", "ImageURL"]
class ImageURL(TypedDict, total=False):
    url: Required[str]
    """Either a URL of the image or the base64 encoded image data."""

    detail: Literal["auto", "low", "high"]
    """Specifies the detail level of the image.

    Learn more in the
    [Vision guide](https://platform.openai.com/docs/guides/vision/low-or-high-fidelity-image-understanding).
    """