How do I get my API key to connect to Vision?

So I have access to a variety of models like gpt-4o, etc., but not vision. How do I get access to it?

I want to be able to upload images and extract contextual text data from them.

Welcome @playfulminds

gpt-4o has vision capability.

I’d recommend starting with the docs before writing code.


Thank you.

gpt-4o doesn’t seem to accept even a mid-sized image file; it must be less than 70k. I was at 32k tokens, 2k over the cap. I got it down to a small 50k file and it works 1 time out of 10. But with the data that compressed, even that 1 out of 10 is iffy on its extraction or reasoning… 9 times out of 10, GPT keeps telling me it is text-only, not images. gpt-4o-mini takes the larger files but never seems to do anything with the image.

I always use the same file.

9 out of 10:

"content": "I cannot extract the details from the Base64-encoded image as I am unable to process image data directly. If you can provide a description or plaintext version of the information you would like to extract, I’ll be happy to assist you further.",

1 out of 10:

"content": "Here are the itemized Kaiser bills extracted from the provided image data:\n\n- Date: Not available for these items. \n- Amounts: $1,245.67, $987.34, $753.21, $1,346.79, $632.85, $1,112.50, $884.00, $1,506.38, $1,254.11, $863.72, $932.94, $1,480.22, $1,198.76, $1,370.59, $1,009.88, $716.14, $1,067.43, $1,423.57, $1,577.66, $1,900.30\n\nDates were not provided in the image data for the above amounts.",

If I cut and paste these images into web-browser standard GPT, it works perfectly every time.

vision-preview seems to be deprecated. What is required to get to Vision?

Is there a way to get to vision or no?

The image is not sent to the AI model at its original file length in base64, nor is the cost a direct function of the byte count. The file is encoded into image tokens the AI model can understand: in the range of 85 tokens when accompanied by `"detail": "low"`, up to around 1445 tokens at something like 1600x768.
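
For reference, the detail setting goes inside the image content part. A minimal sketch of that shape (the base64 string is a placeholder; see the boilerplate later in the thread for the full request):

base64_image = "..."  # placeholder: your base64-encoded file

image_part = {
    "type": "image_url",
    "image_url": {
        "url": f"data:image/jpeg;base64,{base64_image}",
        "detail": "low",  # ~85 input tokens; "high"/"auto" tile the image and cost more
    },
}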

There is also no reason to send an image with its longest dimension greater than 2048 or its shortest dimension larger than 768; the image will be downsized to those limits anyway. You can save transmission time by resizing it yourself, which may also reduce other issues.
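
If you'd rather resize locally before encoding, here is a minimal sketch using Pillow; the file names are placeholders and the 2048/768 limits are the ones described above:

from PIL import Image

def downscale_for_vision(in_path, out_path, max_long=2048, max_short=768):
    """Shrink an image so its longest side is <= 2048 px and its shortest side is <= 768 px."""
    img = Image.open(in_path)
    w, h = img.size
    scale = min(1.0, max_long / max(w, h), max_short / min(w, h))
    if scale < 1.0:
        img = img.resize((round(w * scale), round(h * scale)), Image.LANCZOS)
    img.convert("RGB").save(out_path, format="JPEG", quality=90)
    return out_path

downscale_for_vision("bill_scan.png", "bill_scan_small.jpg")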

Observe the “usage” object in your response, and see if the input token consumption aligns with the expected token use of image inputs being sent properly in a user message as a content part.
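
Continuing from a response object like the one the boilerplate below returns (the figures in the comment are the image-token range mentioned above):

# completion is the object returned by client.chat.completions.create(...)
usage = completion.usage
print(usage.prompt_tokens)      # should include the image tokens (~85 low detail, up to ~1445)
print(usage.completion_tokens)
print(usage.total_tokens)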

Are you sure that you’re sending the image as an image content block and not just passing the base64-encoded image as a string?

Here’s the boilerplate code from the docs:

import base64
from openai import OpenAI

client = OpenAI()

# Function to encode the image
def encode_image(image_path):
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")


# Path to your image
image_path = "path_to_your_image.jpg"

# Getting the Base64 string
base64_image = encode_image(image_path)

completion = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                { "type": "text", "text": "what's in this image?" },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/jpeg;base64,{base64_image}",
                    },
                },
            ],
        }
    ],
)

print(completion.choices[0].message.content)