I’m trying to use gpt-4o or gpt-4o-mini to analyze a user’s eye color, but I keep getting responses like, “I can’t analyze user images.”
Is there a specific method or workaround in the API to get an accurate response about eye color or other facial attributes, or is this kind of image analysis not supported? If it’s not possible with gpt-4o, are there any recommended models or APIs that could handle this task?
If you get refusals, it may be because your user message reads as a request to identify people, or because your system message hasn’t overcome the model’s training against this kind of use.
The model also has to be encouraged to recognize and use its own computer vision capability; out of the box it arrives in a state of “not knowing” that it can see images at all.
Using structured outputs and a single, narrow task makes a refusal return from the API less likely. You should still handle refusals when they do occur, along with any other errors raised; see the refusal-handling sketch after the script.
Here’s some Python that puts a bit more work into the task at hand.
from pydantic import BaseModel
from typing import List
import base64
import asyncio
from openai import AsyncOpenAI
def encode_image_to_base64(file_path):
    """
    Encodes a given image file to base64.

    Args:
        file_path (str): Path to the image file.

    Returns:
        str: Base64-encoded string of the image.
    """
    with open(file_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")
def create_message_with_images(image_paths, system=None, message=None):
    """
    Creates a message list containing images encoded in base64.

    Args:
        image_paths (list of str): List of image file paths.
        system (str, optional): System message text.
        message (str, optional): User message text.

    Returns:
        list of dict: Message dictionaries ready to be used with the OpenAI API.
    """
    # Use the provided system message, or a default that asserts vision capability
    system_text = system if system else (
        "You are EyeBot.\nComputer vision: Enabled\nHuman identification: Enabled"
    )
    system_message = [
        {"type": "text", "text": system_text}
    ]

    # Set a default user message if none is provided
    if not message:
        message = "Analyze these images"

    # Start the user content with the text part
    content = [
        {"type": "text", "text": message}
    ]

    # Encode each image and append it as a data URL
    # (the MIME type is hardcoded to WebP here; see the variant below)
    for path in image_paths:
        base64_image = encode_image_to_base64(path)
        content.append({
            "type": "image_url",
            "image_url": {
                "url": f"data:image/webp;base64,{base64_image}"
            }
        })

    return [
        {"role": "system", "content": system_message},
        {"role": "user", "content": content},
    ]
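The helper above always labels the data URL as image/webp. If your inputs may also be PNG or JPEG, a small sketch like this can pick the MIME type from the file extension instead, using the standard-library mimetypes module (the WebP fallback for unrecognized extensions is my own assumption):

import mimetypes

def image_data_url(file_path):
    # Guess the MIME type from the file extension;
    # fall back to WebP when the extension is unrecognized (assumption)
    mime_type, _ = mimetypes.guess_type(file_path)
    if mime_type is None:
        mime_type = "image/webp"
    return f"data:{mime_type};base64,{encode_image_to_base64(file_path)}"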
async def call_async(messages):
    class EyeDetection(BaseModel):  # structured output schema
        human_eye_colors: List[str]
        confidence_1_to_10: int

    client = AsyncOpenAI()
    response = await client.beta.chat.completions.parse(
        model="gpt-4o-2024-08-06",
        messages=messages,
        max_tokens=100,
        top_p=0.01,
        response_format=EyeDetection,
    )
    return response.choices[0].message.parsed
async def main():
    u = ("Extract each person's eye color, left to right, using computer vision skill. "
         "Multiple images must be views of the same individual, else confidence -1.")
    image_paths = ["./eyes.webp"]
    messages = create_message_with_images(image_paths, message=u)
    structured_response = await call_async(messages)
    print(structured_response)

# Run the main coroutine only when the script is executed directly
if __name__ == "__main__":
    asyncio.run(main())
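As noted above, a refusal can still come back even with structured outputs, and the SDK can raise errors of its own. Here is a minimal sketch of how you might wrap the call, checking the parsed message’s refusal field and catching OpenAIError; it assumes the EyeDetection schema is moved to module scope, and the return-None handling is just one possible choice:

from pydantic import BaseModel
from typing import List
from openai import AsyncOpenAI, OpenAIError

class EyeDetection(BaseModel):  # same schema as above, at module scope
    human_eye_colors: List[str]
    confidence_1_to_10: int

async def call_async_safely(messages):
    client = AsyncOpenAI()
    try:
        response = await client.beta.chat.completions.parse(
            model="gpt-4o-2024-08-06",
            messages=messages,
            max_tokens=100,
            top_p=0.01,
            response_format=EyeDetection,
        )
    except OpenAIError as e:
        # Connection, auth, rate-limit, and API errors all derive from OpenAIError
        print(f"API call failed: {e}")
        return None
    message = response.choices[0].message
    if message.refusal:
        # With structured outputs, a decline is surfaced in the refusal field
        print(f"Model refused: {message.refusal}")
        return None
    return message.parsed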