Wrong type on image_url in chat completion API documentation

There is a mistake and a mismatch in the API documentation.
The image_url is shown as a string in the example, while it should be an object, { url: string }, as per the OpenAPI spec.

In the documentation:

{
    "type": "image_url",
    "image_url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
}

In the OpenAPI Spec:

{
    "type": "image_url",
    "image_url": {
        "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
    }
}
4 Likes

It also mentions "Image input is only supported when using the gpt-4-visual-preview model." But this is no longer the case. I believe both gpt-4 and gpt-4o support image input.

1 Like

Could you share the link to the relevant section in the documentation, please?

1 Like

API Reference - Create chat completion

2 Likes

There are multiple methods to pass images, and also methods exclusive to Assistants.

Take, for example, this Python code, which passes alternating strings and base64 images (with size and detail under your control) as the user message contents, without the JSON-like "type" fields:

import base64
import os

image_paths = ["./img1.png", "./img2.png"]
file_names = [os.path.basename(path) for path in image_paths]

# Read each image file and encode it as a base64 string
base64_images = []
for path in image_paths:
    with open(path, "rb") as image_file:
        base64_image = base64.b64encode(image_file.read()).decode('utf-8')
        base64_images.append(base64_image)

# Construct the content to alternate between file names and images
picture_list = []
for file_name, base64_image in zip(file_names, base64_images):
    picture_list.append(file_name)
    picture_list.append({"image": base64_image})

user = [{
    "role": "user",
    "content": ["Briefly describe image contents. Images:\n\n"] + picture_list
}]
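For completeness, sending that list would look something like the sketch below. This is only a rough sketch: gpt-4o is just a placeholder for any vision-capable model, and it assumes this undocumented content format is still accepted by the endpoint.

from openai import OpenAI  # pip install openai

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Send the alternating string / base64-image content built above
response = client.chat.completions.create(
    model="gpt-4o",  # placeholder; any vision-capable model
    messages=user,
    max_tokens=300,
)
print(response.choices[0].message.content)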

Therefore, one must first identify what actually produces an error when sent directly to the API endpoint before calling something "wrong".

The incorrect, non-existent model name has gone a long time without correction. The phrase should read "an AI model supporting computer vision".

Yeah, but the following API call, as shown in the API reference, does throw an error and is also inconsistent with the documentation in other places:

from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4-turbo",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {
                    "type": "image_url",
                    "image_url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg",
                },
            ],
        }
    ],
    max_tokens=300,
)

print(response.choices[0])

Error message

{'error': {'message': "Invalid type for 'messages[0].content[1].image_url': expected an object, but got a string instead.", 'type': 'invalid_request_error', 'param': 'messages[0].content[1].image_url', 'code': 'invalid_type'}}

Since when was gpt-4-turbo capable of vision? Aren’t you hitting a model limitation there?

Try the same call with gpt-4o

A few months ago. I forget the exact timing (edit: it obviously must have been April 9). Either way, the error also occurs with gpt-4o using the above script. On the flip side, the calls go through just fine when the image URL is included as an object.
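For reference, a minimal sketch of the working variant (the same call as in the reference example, but with image_url wrapped in an object; gpt-4o here is just one vision-capable choice):

from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {
                    "type": "image_url",
                    # object form with a "url" key, as in the OpenAPI spec
                    "image_url": {
                        "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
                    },
                },
            ],
        }
    ],
    max_tokens=300,
)

print(response.choices[0])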


2 Likes

Shoulda checked, thanks! :slight_smile:

2 Likes

Works just dandy. The problem is the poor validation by the API Python library.

import requests, os, json

# Build the request body by hand and send it straight to the endpoint,
# bypassing the Python SDK; image_url is passed as a plain string here.
body = {
    "model": "gpt-4-vision-preview",
    "max_tokens": 100,
    "top_p": 1e-7,
    "temperature": 1e-7,
    "messages": [
        {
            "role": "user",
            "content": [
                "what's in the image?",
                {
                    "type": "image_url",
                    "image_url": "https://upload.wikimedia.org/wikipedia/commons/thumb/f/f3/Starship_SN16.jpeg/397px-Starship_SN16.jpeg",
                },
            ],
        }
    ],
}
apikey = os.environ.get('OPENAI_API_KEY')
headers = {
    "Authorization": f"Bearer {apikey}",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:125.0) Gecko/20100101 Firefox/126.0",
    "Accept": "*/*",
    "OpenAI-Beta": "assistants=v2",
}
url = "https://api.openai.com/v1/chat/completions"
response = requests.post(url, headers=headers, json=body)
if response.status_code != 200:
    print(f"HTTP error {response.status_code}: {response.text}")
else:
    print(json.dumps(response.json(), indent=3))

response:

{
   "id": "chatcmpl-1234",
   "object": "chat.completion",
   "created": 1718724899,
   "model": "gpt-4-1106-vision-preview",
   "choices": [
      {
         "index": 0,
         "message": {
            "role": "assistant",
            "content": "The image shows a large, metallic rocket standing vertically on what appears to be a launch or test stand. This is the SpaceX Starship, a fully reusable spacecraft designed for missions to Earth orbit, Mars, and beyond. The rocket has a distinctive shiny, stainless steel exterior and features aerodynamic flaps near the top and bottom. The structure is quite tall and cylindrical, with a conical nose section. The environment suggests it could be at a SpaceX facility, likely in preparation for testing or as part"
         },
         "logprobs": null,
         "finish_reason": "length"
      }
   ],
   ...

It is the refusal by gpt-4o that is the problem. Inconsistent API behavior from OpenAI.

Yes, but you are testing it with the gpt-4-vision-preview model. The same call does not go through with gpt-4-turbo (which has vision capabilities) or gpt-4o.

The main point is that the example OpenAI provides in the API reference is misleading and results in an error, and it would therefore benefit from an update and/or greater clarification.

1 Like

The solution then: before you stick a new vision model on the API, make sure you also read the existing documentation for the endpoint to ensure application portability, OpenAI.

2 Likes

It seems a bit messy to me, indeed.

A current solution is to check the OpenAI OpenAPI spec (<- check for the P :stuck_out_tongue: ).

The OpenAPI spec is used to build the OpenAI API, so any information in there is valid.

You can find the OpenAI OpenAPI spec here: github → openai/openai-openapi

One of the great benefits of OpenAPI (previously known as Swagger) specs is that they allow developers to update the spec and keep the API and the documentation in sync automatically :upside_down_face:
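As a rough sketch of that idea (the raw-file URL and the exact schema layout below are my assumptions, so adjust if the repo differs), you can pull the spec and search it for the image_url property to confirm the expected shape:

import requests
import yaml  # pip install pyyaml

# Assumed location of the published spec file; adjust if the repo layout differs
SPEC_URL = "https://raw.githubusercontent.com/openai/openai-openapi/master/openapi.yaml"

spec = yaml.safe_load(requests.get(SPEC_URL, timeout=30).text)

# Print every component schema that defines an "image_url" property,
# so you can see whether the spec declares it as a string or an object
for name, schema in spec.get("components", {}).get("schemas", {}).items():
    image_url = schema.get("properties", {}).get("image_url")
    if image_url is not None:
        print(name)
        print(yaml.dump(image_url, indent=2))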