Did you input the image through the API or the web interface?

At the moment, it is only allowed via the web interface.

The image description was generated by Bing Chat, which is available now; the only thing required is to spoof the Edge browser's user-agent string, and you get GPT-4 computer vision for free from billion-dollar investor Microsoft. You can quickly see that having AI-generated language about an image you could just look at yourself has limited long-term appeal or application. It’s not the machine vision that screens a thousand apples a minute.

Does anyone know what the $13 billion cash infusion from Microsoft has brought to OpenAI?

More servers, more people, bigger and prettier headquarters?

More money to pay their incredible Azure bill… to Microsoft?

hey, psst :wink: Don’t make it too obvious haha.

Hey, we urgently need the ‘Image to Text’ feature. Does anyone know when it will be available via the ChatGPT API or the Azure ChatGPT API?

Probably. Are they telling? No.

I asked GPT-4 to guess what the API call would look like, and it said:

As of my last training data in January 2022, OpenAI hadn’t publicly released a specific “ChatGPT-4” API. My training data includes information up to ChatGPT-3, and there was an API for that.

However, let’s assume the API would be somewhat similar to ChatGPT-3’s. In the case of ChatGPT-3, the API call would be something like:

import requests

def call_chatgpt4_to_transcribe(image_path):
    url = "https://api.openai.com/v1/engines/chatgpt-4.0-turbo/completions"  # Assuming hypothetical endpoint

    headers = {
        'Authorization': 'Bearer YOUR_OPENAI_API_KEY',
        'Content-Type': 'application/json',
    }

    data = {
        "prompt": "Transcribe the following image:",  # This is just an example; the actual API might require direct image input
        "image": image_path  # Hypothetical, since ChatGPT-3 doesn't support image input directly
    }

    response = requests.post(url, headers=headers, json=data)
    result = response.json()

    # Assuming the transcribed text is available under the 'choices' key
    return result['choices'][0]['text']

No, the AI can’t answer in any meaningful way.

You can expect that, when the API is turned on, the role message “content” schema will also accept a list (array) type instead of just a string.

Array elements can then be either the normal prompt string or a dictionary (JSON) with an “image” key whose value is the byte-stream-encoded image data.
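
For illustration, here is a minimal sketch of what a message list might look like under that guess; the "image" key and raw-bytes value are assumptions, not a confirmed schema:

# Purely hypothetical sketch of the guessed "content as array" schema
with open("receipt.png", "rb") as f:
    image_bytes = f.read()  # raw image bytes to embed in the message

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {
        "role": "user",
        # plain strings for text, a dict for the image (assumed key name)
        "content": [
            "Please describe this image.",
            {"image": image_bytes},
        ],
    },
]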

Okay, if we consider other APIs with file input, it will not be just a path. Based on descriptions by people with access to ChatGPT with image analysis enabled, it seems you can upload several images. From another post, it seems it would go under the Chat Completions API…

const completion = await openai.chat.completions.create({
    messages: [
        { role: "system", content: "You are a helpful assistant." },
        { role: "user", content: "Here are two pictures of vegetables I took from the supermarket, which one will be good for curry?" }
    ],
    files: [
        fs.createReadStream("IMG0001.png"),
        fs.createReadStream("IMG0002.png")
    ],
    model: "gpt-4-vision",
});

Of course, these are just wild guesses, for science! Can’t wait until DevDay when they unveil everything.

Thank you, supershaneski and fred4! We will try!

How did this site become Reddit, where you simply make your nonsense look plausible?
What you will do: try supplying a “prompt” parameter to a nonexistent chat model through an “engines” endpoint, and fail… :smile:

If you like code:

import os
import openai

openai.api_key = os.getenv("OPENAI_API_KEY")
image = b"..."  # binary image stream
completion = openai.ChatCompletion.create(
    model="gpt-4-0xxx",
    messages=[
        {
            "role": "system",
            "content": "You are a helpful assistant and can describe images.",
        },
        {
            "role": "user",
            "content": ["What's in this screenshot?", {"image": image}],
        },
    ],
)
print(completion["choices"][0]["message"]["content"])

… as demonstrated seven months ago

{
    "role": "user",
    "content": ["What's in this screenshot?", {"image": image}],
},

This makes sense, as I cannot imagine how the image would be placed in context with a separate file-input parameter. It definitely needs to be attached in the messages!
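
As a rough sketch under that same assumption, the earlier two-vegetable question might then be sent with both images attached directly to the user message (the model name and the {"image": ...} element format are both guesses, not a confirmed API):

import openai

# Hypothetical sketch: the model name and the {"image": ...} content element
# are guesses following the discussion above, not a confirmed API.
with open("IMG0001.png", "rb") as f1, open("IMG0002.png", "rb") as f2:
    img1, img2 = f1.read(), f2.read()

completion = openai.ChatCompletion.create(
    model="gpt-4-vision",  # assumed model name
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {
            "role": "user",
            # text plus both images in a single content list
            "content": [
                "Here are two pictures of vegetables I took from the supermarket, which one will be good for curry?",
                {"image": img1},
                {"image": img2},
            ],
        },
    ],
)
print(completion["choices"][0]["message"]["content"])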

What if we have GPT-4-32K (Enterprise)? Could we use the API for a vision task?

The only “enterprise” offering by OpenAI that would mention 32k is ChatGPT Enterprise. It is not “the API”; it is a business-oriented version of ChatGPT, with centralized billing and user management for the web-interface chatbot.

Hi,
could you share the code? I am using the GPT-4 API key and it is working on text, but I want to know how it will work on images.

See gist.github.com/abrichr/259b83b72fe0055d4f27ca9b3d387bc7 for a script that queries the OpenAI API with images and text.

Reproduced here:

"""Example usage of GPT4-V API.

Usage:

    OPENAI_API_KEY=<your_api_key> python3 gpt4v.py \
        [<path/to/image1.png>] [<path/to/image2.jpg>] [...] "text prompt"

Example:

    OPENAI_API_KEY=xxx python3 gpt4v.py photo.png "What's in this photo?"
"""

from pprint import pprint
import base64
import json
import mimetypes
import os
import requests
import sys


api_key = os.getenv("OPENAI_API_KEY")


def encode_image(image_path: str):
    """Encodes an image to base64 and determines the correct MIME type."""
    mime_type, _ = mimetypes.guess_type(image_path)
    if mime_type is None:
        raise ValueError(f"Cannot determine MIME type for {image_path}")

    with open(image_path, "rb") as image_file:
        encoded_string = base64.b64encode(image_file.read()).decode('utf-8')
        return f"data:{mime_type};base64,{encoded_string}"


def create_payload(images: list[str], prompt: str, model="gpt-4-vision-preview", max_tokens=100, detail="high"):
    """Creates the payload for the API request."""
    messages = [
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": prompt,
                },
            ],
        },
    ]

    for image in images:
        base64_image = encode_image(image)
        messages[0]["content"].append({
            "type": "image_url",
            "image_url": {
                "url": base64_image,
                "detail": detail,
            }
        })

    return {
        "model": model,
        "messages": messages,
        "max_tokens": max_tokens
    }


def query_openai(payload):
    """Sends a request to the OpenAI API and prints the response."""
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}"
    }
    response = requests.post("https://api.openai.com/v1/chat/completions", headers=headers, json=payload)
    return response.json()


def main():
    if len(sys.argv) < 3:
        print("Usage: python script.py [image1.jpg] [image2.png] ... \"Text Prompt\"")
        sys.exit(1)

    # All arguments except the last one are image paths
    image_paths = sys.argv[1:-1]

    # The last argument is the text prompt
    prompt = sys.argv[-1]

    payload = create_payload(image_paths, prompt)
    response = query_openai(payload)
    pprint(response)


if __name__ == "__main__":
    main()