Unexpected token length for vision

Hi folks,
I’m being charged about 40,000 tokens per image. Here is my code; could anyone please help?

messages = [
    {
        "role": "system",
        "content": [
            {
                "type": "text",
                "text": "what is this document about?",
            }
        ],
    },
    {
        "role": "user",
        "content": [
            {
                "type": "image_url",
                "image_url": {
                    "url": "data:image/jpeg;base64,/9j/4AAQ......",
                },
            }
        ],
    },
]

Hi @marco.lai.c.l :wave:

Welcome to the dev forum.

Can you share what indicates this?
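For context, every Chat Completions response carries a usage block, which is the easiest place to see what an image actually cost. A minimal sketch with made-up numbers (in real code the response comes back from client.chat.completions.create):

```python
# Illustrative only: this dict mirrors the shape of the usage block on a
# real Chat Completions response; the numbers here are invented.
response = {
    "usage": {
        "prompt_tokens": 1105,      # includes the image tokens
        "completion_tokens": 42,
        "total_tokens": 1147,
    }
}

# With the Python SDK you would read response.usage.prompt_tokens instead.
print(response["usage"]["prompt_tokens"])
```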

2 Likes

Hello @marco.lai.c.l

Welcome to the Community! I suggest providing the URL or the image object directly instead of using base64 encoding. Base64 encoding converts the image into a large string, which significantly increases the number of tokens processed.

As shown here:

from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
  model="gpt-4o",
  messages=[
    {
      "role": "user",
      "content": [
        {"type": "text", "text": "What’s in this image?"},
        {
          "type": "image_url",
          "image_url": {
            "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg",
          },
        },
      ],
    }
  ],
  max_tokens=300,
)

print(response.choices[0])

Or if you have to provide the base64, please use image_path, and as you are using image_url, that is why you are being charged this much.

Please see the reference below for base64:

import base64
import requests

# OpenAI API Key
api_key = "YOUR_OPENAI_API_KEY"

# Function to encode the image
def encode_image(image_path):
  with open(image_path, "rb") as image_file:
    return base64.b64encode(image_file.read()).decode('utf-8')

# Path to your image
image_path = "path_to_your_image.jpg"

# Getting the base64 string
base64_image = encode_image(image_path)

headers = {
  "Content-Type": "application/json",
  "Authorization": f"Bearer {api_key}"
}

payload = {
  "model": "gpt-4o",
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "What’s in this image?"
        },
        {
          "type": "image_url",
          "image_url": {
            "url": f"data:image/jpeg;base64,{base64_image}"
          }
        }
      ]
    }
  ],
  "max_tokens": 300
}

response = requests.post("https://api.openai.com/v1/chat/completions", headers=headers, json=payload)

print(response.json())

I hope that this helps: Vision - OpenAI API

That is absolutely not how it works.

The number of tokens is determined by the base tile and the number of high detail tiles used for the image size at the given detail parameter.

You would only get ridiculous token counts if you were not sending the image correctly as an object in a content array, as specified in the API reference. Then the AI wouldn’t be able to see it anyway.
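For reference, the published tile-based accounting can be sketched roughly like this. The base cost (85) and per-tile cost (170) are model-dependent, so treat the constants as assumptions for gpt-4o-style models:

```python
import math

def estimate_image_tokens(width: int, height: int, detail: str = "high") -> int:
    # Rough sketch of the documented tile-based vision token accounting.
    if detail == "low":
        return 85  # "low" is a fixed cost regardless of image size
    # 1. Scale down to fit within a 2048 x 2048 square, keeping aspect ratio.
    if max(width, height) > 2048:
        scale = 2048 / max(width, height)
        width, height = int(width * scale), int(height * scale)
    # 2. Scale down so the shortest side is at most 768 px.
    if min(width, height) > 768:
        scale = 768 / min(width, height)
        width, height = int(width * scale), int(height * scale)
    # 3. 170 tokens per 512 px tile, plus an 85-token base charge.
    tiles = math.ceil(width / 512) * math.ceil(height / 512)
    return 170 * tiles + 85

print(estimate_image_tokens(1024, 1024))  # 765, matching the docs example
```

By this formula even a large high-detail image costs on the order of a thousand tokens, not 40,000 — which is why base64 encoding is not the culprit.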

1 Like

I see, thank you for the correction. As I just saw base64 and image_url being used, I thought that was why.

Your code looks okay, so my guess is that you might be sending this to a non-vision model. Although, on second thought, you never mentioned any error.
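If you want to rule out tiling costs, the image_url object also accepts a detail field; setting it to "low" caps the image at the small fixed base cost. A minimal sketch (the base64 payload is a placeholder):

```python
# Sketch: set detail explicitly on the image_url object. "low" bills the
# image at the fixed base cost; "high" tiles it; "auto" is the default.
message = {
    "role": "user",
    "content": [
        {"type": "text", "text": "What is this document about?"},
        {
            "type": "image_url",
            "image_url": {
                "url": "data:image/jpeg;base64,...",  # placeholder payload
                "detail": "low",
            },
        },
    ],
}

print(message["content"][1]["image_url"]["detail"])  # low
```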