GPT-4o Vision help Image input not working

aarushnangunoori · June 17, 2024, 2:12pm

I need my program to take a screenshot of the screen and then upload that into gpt4(I would like to use gpt4o if possible) , search the internet and then give me a response based off what its sees. But I cant seem to figure out the input for vision. Below is my code and the error that appears . Help would be appreciated

import keyboard
import pyautogui
import requests
import pyperclip
import io
import base64

def take_screenshot_and_process():
    # Take a screenshot
    screenshot = pyautogui.screenshot()
    buffer = io.BytesIO()
    screenshot.save(buffer, format='PNG')
    image_base64 = base64.b64encode(buffer.getvalue()).decode('utf-8')

    # Prepare the API request
    headers = {
        'Authorization': 'Bearer key'
        'Content-Type': 'application/json',
    }
    data = {
        "prompt": "Describe this image",
        "model": "gpt-4",
        "n": 1,
        "size": "1920x1080",
        "image": image_base64
    }
    
    # Send the request
    response = requests.post('https://api.openai.com/v1/images/generations', json=data, headers=headers)
    response_data = response.json()

trenton.dambrowitz · June 17, 2024, 2:50pm

There’s a few issues with how you’re currently trying to call the API, I highly recommend reading the documentation since it clears up a lot of confusion.

That said, below is a good starting point that you can use.


    headers = {
        'Content-Type': 'application/json',
        'Authorization': f"Bearer {os.environ.get('OPENAI_API_KEY')}"
    }

    print("sending request")
    payload = {
        'model': 'gpt-4o',
        'messages': [
            {
                'role': 'system',
                'content': 'You are a helpful assistant. You will fulfil the user's requests to the best of your ability.'
            },
            {
                'role': 'user',
                'content': [
                    {'type': 'text', 'text': 'Describe this image'},
                    {'type': 'image_url', 'image_url': {'url': f'data:image/jpeg;base64,{image_base64}'}}
                ]
            }
        ],
        'max_tokens': 800
    }


    response = requests.post('https://api.openai.com/v1/chat/completions', headers=headers, json=payload)

    text = response.json()['choices'][0]['message']['content']
    print(text)

Topic		Replies	Views
GPT says it cannot read images API gpt-4-vision	1	308	October 7, 2024
I am having trouble accessing "analysing an image" feature in gpt-4o API gpt-4	2	152	August 13, 2024
Different errors when inputting an image into gpt-4-vision-preview API gpt-4 , api	1	1397	January 17, 2024
Image to text description API gpt-4 , api	5	1989	October 27, 2023
Using an image as input gpt4 api API	3	16754	June 3, 2024

GPT-4o Vision help Image input not working

Related topics