GPT 4 Vision API - Detail param and token cost?

I assume that docs and other things are not accurate or finished yet?

Has anyone had any luck sending the detail parameter mentioned in the docs?

Has anyone figured out how to get the token cost back too?


This part of the docs has the cost calculation.

I’m able to send the detail parameter like this:

{
    "type": "image_url",
    "image_url": {
        "url": "<the image url>",
        "detail": "high"
    }
}

Thanks.

Using their sample code gives me an error :frowning:

from openai import OpenAI
import os
from dotenv import load_dotenv

load_dotenv()

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4-vision-preview",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What’s in this image?"},
                {
                    "type": "image_url",
                    "image_url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg",
                    "detail": "high"
                },
            ],
        }
    ],
    max_tokens=300,
)
# Print the first choice from the response

print(response.choices[0])

Error code: 400 - {'error': {'message': 'Invalid chat format. Unexpected keys in a message content image dict.', 'type': 'invalid_request_error', 'param': None, 'code': None}}

On the calculation: I did look at it earlier and tried to get an estimate, but I was hoping for a way to get the exact cost back in the response.
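For anyone who wants the estimate in code, here is a rough sketch of the image token rule as I understand it from the vision docs: low detail is a flat 85 tokens; high detail scales the image to fit within 2048x2048, then scales the shortest side down to 768, and charges 170 tokens per 512px tile plus 85. The function name `estimate_image_tokens` and the rounding of scaled dimensions are my own assumptions, so treat the result as an approximation, not a billing figure.

```python
import math


def estimate_image_tokens(width, height, detail="high"):
    """Estimate prompt tokens for one image (hypothetical helper, not part of the openai library)."""
    if detail == "low":
        # Low detail is a flat cost regardless of image size.
        return 85

    # Step 1: scale down to fit within a 2048 x 2048 square, keeping aspect ratio.
    longest = max(width, height)
    if longest > 2048:
        scale = 2048 / longest
        width, height = round(width * scale), round(height * scale)  # rounding is an assumption

    # Step 2: scale down so the shortest side is 768 px.
    shortest = min(width, height)
    if shortest > 768:
        scale = 768 / shortest
        width, height = round(width * scale), round(height * scale)  # rounding is an assumption

    # Step 3: count the 512 px tiles needed to cover the image.
    tiles = math.ceil(width / 512) * math.ceil(height / 512)
    return 85 + 170 * tiles


print(estimate_image_tokens(1024, 1024, "high"))
print(estimate_image_tokens(2048, 4096, "high"))
print(estimate_image_tokens(4096, 8192, "low"))
```

This only covers the image part of the prompt; the text tokens in the message are counted on top of it.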


I also had success using detail inside the image_url parameter.
Have you tried updating the OpenAI Python library? It looks like it was updated 11 hours ago.

Yup, I updated the library.
I was able to get the basic example working in Postman, calling the API endpoint with this payload:


{
    "model": "gpt-4-vision-preview",
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "What’s in this image?"
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg",
                        "detail": "low"
                    }
                }
            ]
        }
    ]
}

I can also get the token usage in my Python script with:
print(response.usage)
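If you want a dollar figure rather than raw counts, a small sketch like the one below turns the `prompt_tokens` and `completion_tokens` fields of `response.usage` into a cost. The per-1K prices are assumptions on my part (check the current pricing page), `usage_cost` is a hypothetical helper, and the `SimpleNamespace` object only stands in for a real `response.usage` so the snippet runs without an API call.

```python
from types import SimpleNamespace

# Assumed prices in USD per 1,000 tokens; verify against the current pricing page.
PROMPT_PRICE_PER_1K = 0.01
COMPLETION_PRICE_PER_1K = 0.03


def usage_cost(usage,
               prompt_price=PROMPT_PRICE_PER_1K,
               completion_price=COMPLETION_PRICE_PER_1K):
    """Return the estimated cost in USD for a usage object (hypothetical helper)."""
    prompt_cost = usage.prompt_tokens / 1000 * prompt_price
    completion_cost = usage.completion_tokens / 1000 * completion_price
    return prompt_cost + completion_cost


# Stand-in for response.usage; the numbers are made up for illustration.
fake_usage = SimpleNamespace(prompt_tokens=1118, completion_tokens=300, total_tokens=1418)
print(f"Estimated cost: ${usage_cost(fake_usage):.5f}")
```

With a real call you would pass `response.usage` instead of `fake_usage`.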

But everything I’ve tried so far with detail through the OpenAI Python library gives me errors.


You’re getting the error “Unexpected keys in a message content image dict” because the API expects a dictionary nested inside a dictionary, while your structure is flat.

You’re placing “detail”: “high” in the same dictionary as “type”: “image_url”, with image_url set to a plain string. The API expects the image_url key to map to another dictionary that contains both the url and detail keys. That is why the error mentions unexpected keys in the message content image dict: it wasn’t expecting detail at the same level as type.

The correct implementation nests url and detail inside another dictionary, assigned to the image_url key.
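To make the nesting hard to get wrong, you could wrap it in a small helper. This is only a sketch: `image_part` is a name I made up, the example URL is a placeholder, and the allowed detail values ("low", "high", "auto") are what the docs list as far as I know.

```python
def image_part(url, detail="auto"):
    """Build one image entry for the message content list (hypothetical helper)."""
    if detail not in ("low", "high", "auto"):
        raise ValueError(f"unsupported detail level: {detail!r}")
    return {
        "type": "image_url",
        # url and detail live together in the inner dictionary,
        # not next to "type" in the outer one.
        "image_url": {"url": url, "detail": detail},
    }


# Placeholder URL for illustration only.
part = image_part("https://example.com/boardwalk.jpg", detail="low")
print(part)
```

The returned dictionary can go straight into the content list next to the text entry.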

I fixed it for you and now it works. :slight_smile:


from openai import OpenAI
import os

client = OpenAI(api_key=os.getenv('OPENAI_API_KEY'))

response = client.chat.completions.create(
    model="gpt-4-vision-preview",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What’s in this image?"},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg",
                        "detail": "low"
                    }
                },
            ],
        }
    ],
    max_tokens=300,
)

# Print the first choice from the response
print(response.choices[0])

Don’t forget to add dotenv back in, though! I left it out because I ran the code above to verify that it works, and I don’t use dotenv myself: I put my key into my system settings as per these instructions (see the “Setup your API key for all projects” section) and read it with os.getenv.


Thanks, I actually figured it out earlier.

Here is some sample code if anyone is interested.

import base64

import cv2  # provided by the opencv-python package


def generate_description(frames, detail_level):
    print("Encoding frames...")
    base64_frames = [
        base64.b64encode(cv2.imencode(".jpg", frame)[1]).decode("utf-8") 
        for frame in frames
    ]

    print("Preparing data URIs...")
    data_uris = [
        f"data:image/jpeg;base64,{frame}" for frame in base64_frames
    ]

    image_dicts = [
        {
            "type": "image_url",
            "image_url": {
                "url": data_uri,
                "detail": detail_level
            }
        }
        for data_uri in data_uris
    ]

    prompt_messages = [
        {
            "role": "user",
            "content": [
                "These are frames from a video that I want to upload. Generate a compelling description that I can upload along with the video.",
                *image_dicts,
            ],
        },
    ]

    params = {
        "model": "gpt-4-vision-preview",
        "messages": prompt_messages,
        "max_tokens": 500,
    }