How do I get my API key to connect to Vision?

So I have access to a variety of models like gpt-4o, etc., but not vision. How do I get access to it?

I want to be able to upload images and extract contextual text data from them.

Welcome @playfulminds

gpt-4o has vision capability.

I’d recommend starting with the docs before writing code.


Thank you.

gpt-4o doesn’t seem to accept even a mid-sized image file; it must be less than 70k. I was at 32k tokens, 2k over the cap. I got it down to a small 50k file and it works 1 time out of 10. But with the data that compressed, even that 1 out of 10 is iffy on its extraction or reasoning… 9 times out of 10, GPT keeps telling me it is text-only, not images. gpt-4o-mini takes the larger files but never seems to do anything with the image.

I always use the same file.

9 out of 10:

"content": "I cannot extract the details from the Base64-encoded image as I am unable to process image data directly. If you can provide a description or plaintext version of the information you would like to extract, I’ll be happy to assist you further.",

1 out of 10:

"content": "Here are the itemized Kaiser bills extracted from the provided image data:\n\n- Date: Not available for these items. \n- Amounts: $1,245.67, $987.34, $753.21, $1,346.79, $632.85, $1,112.50, $884.00, $1,506.38, $1,254.11, $863.72, $932.94, $1,480.22, $1,198.76, $1,370.59, $1,009.88, $716.14, $1,067.43, $1,423.57, $1,577.66, $1,900.30\n\nDates were not provided in the image data for the above amounts.",

If I cut and paste these images into web-browser standard GPT, it works perfectly every time.

vision-preview seems to be deprecated. What is required to get to Vision?

Is there a way to get to vision or no?

The image is not sent to the AI model at its original file length in base64, nor is the cost a direct function of the byte count. The file is encoded into image tokens the AI model can understand: in the range of 85 tokens when accompanied by `"detail": "low"`, up to around 1445 tokens at something like 1600x768.
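
For reference, the detail setting goes inside the image content part. A minimal sketch of that shape (the base64 string is a placeholder; see the boilerplate later in the thread for the full request):

base64_image = "..."  # placeholder: your base64-encoded file

image_part = {
    "type": "image_url",
    "image_url": {
        "url": f"data:image/jpeg;base64,{base64_image}",
        "detail": "low",  # ~85 input tokens; "high"/"auto" tile the image and cost more
    },
}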

There is also no reason to send an image with its longest dimension greater than 2048 or its shortest dimension larger than 768; the image will be downsized to those limits anyway. You can save transmission time by resizing it yourself, which may also reduce other issues.
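
If you'd rather resize locally before encoding, here is a minimal sketch using Pillow; the file names are placeholders and the 2048/768 limits are the ones described above:

from PIL import Image

def downscale_for_vision(in_path, out_path, max_long=2048, max_short=768):
    """Shrink an image so its longest side is <= 2048 px and its shortest side is <= 768 px."""
    img = Image.open(in_path)
    w, h = img.size
    scale = min(1.0, max_long / max(w, h), max_short / min(w, h))
    if scale < 1.0:
        img = img.resize((round(w * scale), round(h * scale)), Image.LANCZOS)
    img.convert("RGB").save(out_path, format="JPEG", quality=90)
    return out_path

downscale_for_vision("bill_scan.png", "bill_scan_small.jpg")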

Observe the “usage” object in your response, and see if the input token consumption aligns with the expected token use of image inputs being sent properly in a user message as a content part.
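
Continuing from a response object like the one the boilerplate below returns (the figures in the comment are the image-token range mentioned above):

# completion is the object returned by client.chat.completions.create(...)
usage = completion.usage
print(usage.prompt_tokens)      # should include the image tokens (~85 low detail, up to ~1445)
print(usage.completion_tokens)
print(usage.total_tokens)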

Are you sure that you’re sending the image as an image content block and not just passing the base64-encoded image as a string?

Here’s the boilerplate code from the docs:

import base64
from openai import OpenAI

client = OpenAI()

# Function to encode the image
def encode_image(image_path):
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")


# Path to your image
image_path = "path_to_your_image.jpg"

# Getting the Base64 string
base64_image = encode_image(image_path)

completion = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                { "type": "text", "text": "what's in this image?" },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/jpeg;base64,{base64_image}",
                    },
                },
            ],
        }
    ],
)

print(completion.choices[0].message.content)