How can I get gpt-4o or gpt-4o-mini to analyze user images?

Hi everyone,

I’m trying to use gpt-4o or gpt-4o-mini to analyze a user’s eye color, but I keep getting responses like, “I can’t analyze user images.”

Is there a specific method or workaround the API to provide an accurate response regarding eye color or other face properties, or is image analysis not supported? If it’s not possible with gpt-4o, are there any recommended models or APIs that could handle this task?

Thanks in advance for your help!

type or paste code here
```      "content": [

"content": [
        {
          "type": "text",
          "text": "What’s in this image?"
        },
        {
          "type": "image_url",
          "image_url": {
            "url": f"data:image/jpeg;base64,{base64_image}"
          }
        }
      ]
    }
  ],
  "max_tokens": 300
}

response = requests.post("https://api.openai.com/v1/chat/completions", headers=headers, json=payload)

print(response.json())
type or paste code here
```def encode_image(image_path):
  with open(image_path, "rb") as image_file:
    return base64.b64encode(image_file.read()).decode('utf-8')

# Path to your image
image_path = "/path/to/image.jpg"

base64_image = encode_image(image_path)

headers = {
  "Content-Type": "application/json",
  "Authorization": f"Bearer {api_key}"
}

payload = {
  "model": "gpt-4-vision-preview",
  "messages": [
    {
      "role": "user",
      "content": [

"content": [
        {
          "type": "text",
          "text": "What’s in this image?"
        },
        {
          "type": "image_url",
          "image_url": {
            "url": f"data:image/jpeg;base64,{base64_image}"
          }
        }
      ]
    }
  ],
  "max_tokens": 300
}

response = requests.post("https://api.openai.com/v1/chat/completions", headers=headers, json=payload)

print(response.json())

Hi all of you.

The above doesn’t overcome any issues.

If you get refusals, it may be because your “user” is deliberately asking about identifying people, or that your “system” message hasn’t overcome pretraining against this use.

The AI also must be encouraged to have and use its own image computer vision. It is given to you in a state of “not knowing”.

Having a structured output and a singular task makes it less likely to receive a refusal API return, which you should handle, along with other errors raised.

Here’s some Python with a bit more work at the task at hand.

from pydantic import BaseModel
from typing import List, Optional
import base64
import asyncio
from openai import AsyncOpenAI
from PIL import Image
import os

def encode_image_to_base64(file_path):
    """
    Encodes a given image file to base64.

    Args:
        file_path (str): Path to the image file.

    Returns:
        str: Base64 encoded string of the image.
    """
    with open(file_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode('utf-8')

def create_message_with_images(image_paths, system=None, message=None):
    """
    Creates a message dictionary containing images encoded in base64.

    Args:
        image_paths (list of str): List of image file paths.
        system (str, optional): System message text.
        message (str, optional): User message text.

    Returns:
        list of dict: Message dictionaries ready to be used with the OpenAI API.
    """

    # Use provided system message or default if not provided
    system_text = system if system else (
        "You are EyeBot.\nComputer vision: Enabled\nHuman identification: Enabled"
        )
    system_message = [
        {"type": "text", "text": system_text}
    ]
    
    # Set default message if not provided or empty
    if not message:
        message = "Analyze these images"

    # Create user content with the message
    content = [
        {"type": "text", "text": message}
    ]

    # Encode each image and add to the content
    for path in image_paths:
        base64_image = encode_image_to_base64(path)
        content.append({
            "type": "image_url",
            "image_url": {
                "url": f"data:image/webp;base64,{base64_image}"
            }
        })

    return [
        {
            "role": "system",
            "content": system_message
        },
        {
            "role": "user",
            "content": content
        }
    ]

async def call_async(messages):
    class EyeDetection(BaseModel):  # structured output schema
        human_eye_colors: List[str]
        confidence_1_to_10: int

    client = AsyncOpenAI()
    response = await client.beta.chat.completions.parse(
        model="gpt-4o-2024-08-06",
        messages=messages,
        max_tokens=100,
        top_p=0.01,
        response_format=EyeDetection
    )
    return response.choices[0].message.parsed

async def main():
    u = ("Extract each person's eye color, left to right, using computer vision skill. "
         "Multiple images must be views of the same individual, else confidence -1."
        )
    image_paths = ['./eyes.webp']
    messages = create_message_with_images(image_paths, message=u)
    structured_response = await call_async(messages)
    print(structured_response)

# Ensures the main function is run only if the script is executed directly
if __name__ == '__main__':
    asyncio.run(main())