Object color extraction from images in gpt4o?

Hi everyone,

I have been interested in extracting color data from images using gpt4o-vision. I’m currently experimenting in the ChatGPT Mac app.

Unsurprisingly, I find that gpt4o does a great job of describing colors in an image. For example, in the screenshot below, I ask “what is the color of the star in this image” and it correctly and succinctly answers “light purple.”

However, if I slightly modify my question to ask “what is the RGB color value of the star in the image,” GPT pivots to a Python analysis (pasted below), which fails to even capture the star.

I’m curious what is so different under the hood between gpt4o’s first (accurate) response and its second, clunkier response, which wraps Python and fails to locate the star. If it uses the first, non-Python approach, does that preclude it from extracting literal values from the image, like RGB color data, and if so, why? Is that approach also precluded from extracting other information, like location (e.g. a correct answer to “which pixels contain the star,” so that I could pivot to another program to extract the colors of those pixels)?

I don’t have an AI background and just use it as a tool to help with my biology research. Results like this leave me wondering about options for gpt4o’s use in downstream color extraction. Clearly, it can recognize objects in photos and report their categorical colors quite accurately. This is hugely helpful for me! But obviously my next goal is to extract color values from the pixels underlying objects more precisely. This of course is possible without gpt by using point-and-click approaches, but I really value the capacity of gpt to automate the process across really big datasets (removing the manual pointing and clicking part of object detection and data extraction).

Thanks so much!

GPT python code (failed to capture star and yielded 255,255,255):

from PIL import Image

# Load the image
image_path = "/mnt/data/star.jpg"
image = Image.open(image_path)

# Convert image to RGB
rgb_image = image.convert("RGB")

# Let's define an area around the star to sample the color
# For simplicity, we'll sample a small region in the center of the star
star_area = (430, 180, 470, 220)  # Assuming coordinates around the star

# Crop the star area
cropped_image = rgb_image.crop(star_area)

# Get the RGB values by averaging the color in this area
pixels = list(cropped_image.getdata())
average_rgb = tuple(sum(x)//len(x) for x in zip(*pixels))

average_rgb
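
For comparison, here is a minimal sketch of one way to avoid guessing coordinates. It assumes the star sits on a near-white background, which the 255,255,255 result suggests but does not confirm. The function name `foreground_mean_rgb` and the `background` and `tolerance` parameters are hypothetical choices for illustration, not anything gpt4o produced. It keeps every pixel that differs from the assumed background, averages those pixels, and also reports their bounding box.

```python
def foreground_mean_rgb(pixels, width, background=(255, 255, 255), tolerance=30):
    """Return (mean_rgb, bbox) for pixels that differ from the background.

    pixels: flat, row-major list of (r, g, b) tuples, e.g. list(img.getdata())
    width:  image width in pixels
    bbox:   (left, upper, right, lower), the same convention as PIL's crop()
    Returns (None, None) if no pixel differs from the background.
    """
    totals = [0, 0, 0]
    count = 0
    left = upper = right = lower = None
    for index, pixel in enumerate(pixels):
        # A pixel counts as foreground if any channel is far from the background
        if max(abs(c - b) for c, b in zip(pixel, background)) > tolerance:
            x, y = index % width, index // width
            for channel in range(3):
                totals[channel] += pixel[channel]
            count += 1
            # Grow the bounding box to include this pixel
            left = x if left is None else min(left, x)
            upper = y if upper is None else min(upper, y)
            right = x + 1 if right is None else max(right, x + 1)
            lower = y + 1 if lower is None else max(lower, y + 1)
    if count == 0:
        return None, None
    mean_rgb = tuple(round(total / count) for total in totals)
    return mean_rgb, (left, upper, right, lower)


# With Pillow, using the same (assumed) file path as the code above:
# from PIL import Image
# img = Image.open("/mnt/data/star.jpg").convert("RGB")
# print(foreground_mean_rgb(list(img.getdata()), img.width))
```

This only isolates the star if it is the sole non-background object in the image. JPEG compression and anti-aliased edges will also blend some background into the average, so the tolerance would likely need tuning per dataset.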