Wrong type on image_url in chat completion API documentation

There is a mistake and a mismatch in the API documentation.
The image_url is shown as a string in the example, while it should be an object, { url: string }, as per the OpenAPI spec.

In the documentation:

{
    "type": "image_url",
    "image_url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
}

In the OpenAPI Spec:

{
    "type": "image_url",
    "image_url": {
        "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
    }
}
4 Likes

It also mentions "Image input is only supported when using the gpt-4-visual-preview model." But this is no longer the case. I believe both gpt-4 and gpt-4o support image input.

1 Like

Could you share the link to the relevant section in the documentation, please?

1 Like

API Reference - Create chat completion

2 Likes

There are multiple methods to pass images, and also methods exclusive to Assistants.

Take, for example, this Python code, which passes alternating strings and base64 images (with size and detail under your control) as the user message contents, without the JSON-like "type" fields:

import base64
import os

image_paths = ["./img1.png", "./img2.png"]
file_names = [os.path.basename(path) for path in image_paths]

# Read each image file and encode it as a base64 string
base64_images = []
for path in image_paths:
    with open(path, "rb") as image_file:
        base64_image = base64.b64encode(image_file.read()).decode('utf-8')
        base64_images.append(base64_image)

# Construct the content to alternate between file names and images
picture_list = []
for file_name, base64_image in zip(file_names, base64_images):
    picture_list.append(file_name)
    picture_list.append({"image": base64_image})

user = [{
    "role": "user",
    "content": ["Briefly describe image contents. Images:\n\n"] + picture_list
}]
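For completeness, sending that list would look something like the sketch below. This is only a rough sketch: gpt-4o is just a placeholder for any vision-capable model, and it assumes this undocumented content format is still accepted by the endpoint.

from openai import OpenAI  # pip install openai

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Send the alternating string / base64-image content built above
response = client.chat.completions.create(
    model="gpt-4o",  # placeholder; any vision-capable model
    messages=user,
    max_tokens=300,
)
print(response.choices[0].message.content)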

Therefore, one must first identify what actually produces an error when sent directly to the API endpoint before calling something "wrong".

The incorrect, non-existent model name has gone a long time without correction. The phrase should read "an AI model supporting computer vision".

Yeah, but the following API call, as shown in the API reference, does throw an error and is also inconsistent with the documentation in other places:

from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4-turbo",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {
                    "type": "image_url",
                    "image_url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg",
                },
            ],
        }
    ],
    max_tokens=300,
)

print(response.choices[0])

Error message

{'error': {'message': "Invalid type for 'messages[0].content[1].image_url': expected an object, but got a string instead.", 'type': 'invalid_request_error', 'param': 'messages[0].content[1].image_url', 'code': 'invalid_type'}}

Since when was gpt-4-turbo capable of vision? Aren’t you hitting a model limitation there?

Try the same call with gpt-4o

A few months ago. I forget the exact timing (edit: it obviously must have been April 9). Either way, the error also occurs with gpt-4o using the above script. On the flip side, the calls go through just fine when the image URL is included as an object.
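For reference, a minimal sketch of the working variant (the same call as in the reference example, but with image_url wrapped in an object; gpt-4o here is just one vision-capable choice):

from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {
                    "type": "image_url",
                    # object form with a "url" key, as in the OpenAPI spec
                    "image_url": {
                        "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
                    },
                },
            ],
        }
    ],
    max_tokens=300,
)

print(response.choices[0])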


2 Likes

Shoulda checked, thanks! :slight_smile:

2 Likes

Works just dandy. The problem is the poor validation by the API Python library.

import requests, os, json

# Build the request body by hand and send it straight to the endpoint,
# bypassing the Python SDK; image_url is passed as a plain string here.
body = {
    "model": "gpt-4-vision-preview",
    "max_tokens": 100,
    "top_p": 1e-7,
    "temperature": 1e-7,
    "messages": [
        {
            "role": "user",
            "content": [
                "what's in the image?",
                {
                    "type": "image_url",
                    "image_url": "https://upload.wikimedia.org/wikipedia/commons/thumb/f/f3/Starship_SN16.jpeg/397px-Starship_SN16.jpeg",
                },
            ],
        }
    ],
}
apikey = os.environ.get('OPENAI_API_KEY')
headers = {
    "Authorization": f"Bearer {apikey}",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:125.0) Gecko/20100101 Firefox/126.0",
    "Accept": "*/*",
    "OpenAI-Beta": "assistants=v2",
}
url = "https://api.openai.com/v1/chat/completions"
response = requests.post(url, headers=headers, json=body)
if response.status_code != 200:
    print(f"HTTP error {response.status_code}: {response.text}")
else:
    print(json.dumps(response.json(), indent=3))

response:

{
   "id": "chatcmpl-1234",
   "object": "chat.completion",
   "created": 1718724899,
   "model": "gpt-4-1106-vision-preview",
   "choices": [
      {
         "index": 0,
         "message": {
            "role": "assistant",
            "content": "The image shows a large, metallic rocket standing vertically on what appears to be a launch or test stand. This is the SpaceX Starship, a fully reusable spacecraft designed for missions to Earth orbit, Mars, and beyond. The rocket has a distinctive shiny, stainless steel exterior and features aerodynamic flaps near the top and bottom. The structure is quite tall and cylindrical, with a conical nose section. The environment suggests it could be at a SpaceX facility, likely in preparation for testing or as part"
         },
         "logprobs": null,
         "finish_reason": "length"
      }
   ],
   ...

It is the refusal by gpt-4o that is the problem. Inconsistent API behavior from OpenAI.

Yes, but you are testing it with the gpt-4-vision-preview model. The same call does not go through with gpt-4-turbo (which has vision capabilities) or gpt-4o.

The main point is that the example OpenAI provides in the API reference is misleading and results in an error, and it would therefore benefit from an update and/or greater clarification.

1 Like

The solution then: before you stick a new vision model on the API, make sure you also read the existing documentation for the endpoint to ensure application portability, OpenAI.

2 Likes

It seems a bit messy to me, indeed.

A current solution is to check the OpenAI OpenAPI spec (<- check for the P :stuck_out_tongue: ).

The OpenAPI spec is used to build the OpenAI API, so any information in there is valid.

You can find the OpenAI OpenAPI spec here: github → openai/openai-openapi

One of the great benefits of OpenAPI (previously known as Swagger) specs is that they allow developers to update the spec and keep the API and the documentation in sync automatically :upside_down_face:
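As a rough sketch of that idea (the raw-file URL and the exact schema layout below are my assumptions, so adjust if the repo differs), you can pull the spec and search it for the image_url property to confirm the expected shape:

import requests
import yaml  # pip install pyyaml

# Assumed location of the published spec file; adjust if the repo layout differs
SPEC_URL = "https://raw.githubusercontent.com/openai/openai-openapi/master/openapi.yaml"

spec = yaml.safe_load(requests.get(SPEC_URL, timeout=30).text)

# Print every component schema that defines an "image_url" property,
# so you can see whether the spec declares it as a string or an object
for name, schema in spec.get("components", {}).get("schemas", {}).items():
    image_url = schema.get("properties", {}).get("image_url")
    if image_url is not None:
        print(name)
        print(yaml.dump(image_url, indent=2))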