Is it possible to run evals with image input and string output?

Hi there, everyone.
I'm new to working with LLMs. I'd like to evaluate a model that takes an image plus a prompt as input and returns just one word as output. In particular, I want to string-check that word so that it matches a "correct_label".
I've tried a few options, and the best I got when launching the eval was a server error (500).
I've tried contacting help.openai.com and they said it's technically possible, but now I'm really not sure. I know there is already a topic saying it's not possible with gpt-4o-mini, but I don't really care which model I need to use; I'm completely open on that. Can you help me?
Attached are my code and the JSONL file I used.

url = "https://api.openai.com/v1/evals"

headers = {
    "Authorization": f"Bearer {openai.api_key}",
    "Content-Type": "application/json"
}

data = {
    "name": "Image + Correct label",
    "data_source_config": {
        "type": "custom",
        "item_schema": {
            "type": "object",
            "properties": {
                "input": { "type": "image_url" },              
                "correct_label": { "type": "string" }        # the expected one-word label
            },
            "required": ["input", "correct_label"]
        },
        "include_sample_schema": True
    },
    "testing_criteria": [
        {
            "type": "string_check",
            "name": "Match output to label",
            "input": "{{ sample.output_text }}",
            "operation": "eq",
            "reference": "{{ item.correct_label }}"
        }
    ]
}
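The eval itself is created by posting that payload the same way as the run request further down; the "id" field in the response is the eval id I reuse later:

resp = requests.post(url, headers=headers, data=json.dumps(data))
print(resp.status_code)
print(resp.json())  # the "id" field here is the eval id referenced below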
{"item": {"input":{"url": "https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcRvqPTmHSPQNv6z3FGXMV6aexJWQ4EjwkQGWQ&s"}, "correct_label": "Dog"}}
{"item": {"input":{"url": "https://i.natgeofe.com/n/548467d8-c5f1-4551-9f58-6817a8d2c45e/NationalGeographic_2572187_16x9.jpg?w=1200"}, "correct_label": "Cat"}}
{"item": {"input":{"url": "https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcSon1zCtSmEcmvdpCKR3gV41mS5frDNLqvX2w&s"}, "correct_label": "Car"}}
{"item": {"input":{"url": "https://media.istockphoto.com/id/1364917563/it/foto/uomo-daffari-sorridente-con-le-braccia-incrociate-su-sfondo-bianco.jpg?s=612x612&w=0&k=20&c=fxEx8bGP-UfpVRwdmX_mxQIs2E0ojhxw1bxHcB_ltzs="}, "correct_label": "Man"}}
eval_id = "correct eval id as defined before"
file_id = "correct json file id as defined before"

url = f"https://api.openai.com/v1/evals/{eval_id}/runs"

headers = {
    "Authorization": f"Bearer {openai.api_key}",
    "Content-Type": "application/json"
}


data = {
    "name": "Image + Label Test Run ",
    "data_source": {
        "type": "completions",
        "model": ENGINE,
        "input_messages": {
            "type": "template",
            "template": [
                {
                    "role": "developer",
                    "content": (
                        "You are an expert on recognizing what an image is about. "
                        "Respond with just one word representing the main item in the image. "
                        "Don't use commas or dots."
                    )
                },
                {
                    "role": "user",
                    "content": "{{item.input}}"
                }
            ]
        },
        "source": {
            "type": "file_id",
            "id": file_id
        }
    }
}

resp = requests.post(url, headers=headers, data=json.dumps(data))
print(resp.status_code)
print(resp.json())

ERROR:
500
{'error': {'message': 'The server had an error processing your request. Sorry about that! You can retry your request, or contact us through our help center at help.openai.com if you keep seeing this error. (Please include the request ID req_f22ff81123b920e8ba4de860140cd437 in your email.)', 'type': 'server_error', 'param': None, 'code': None}}

It would be great if someone could correct my code in some way. It's for a school project and I'm getting really frustrated, haha.
Thank you guys!


Welcome to the community, @denis.camilotto!

Seems like you're off to a good start. I plugged it into o4-mini-high and got this…

It turns out that the 500 you’re seeing almost always means that your /evals/{eval_id}/runs payload doesn’t exactly match the shape the API is expecting. In your case the culprit is the data_source.input_messages block—and in particular how you’re plugging the image URL in.

Here’s what you need to change:

  1. Make each message an object with "type": "message".
  2. Wrap the text in a content object with "type": "text" and a "text" field, not just a bare string.
  3. Feed the image via a content object of type "image_url" with a "url" field pointing at {{item.input.url}}.

Below is a minimal, corrected example in Python (using requests) that should work. I've annotated the key bits.


import os, json, requests

openai_key = os.getenv("OPENAI_API_KEY")
eval_id    = "eval-your-eval-id"
file_id    = "file-your-jsonl-file-id"
MODEL = "gpt-4o"   # instead of "gpt-4o-mini"

url = f"https://api.openai.com/v1/evals/{eval_id}/runs"

headers = {
    "Authorization": f"Bearer {openai_key}",
    "Content-Type": "application/json"
}

data = {
    "name": "Image + Label Test Run",
    "data_source": {
        "type": "completions",
        "model": MODEL,
        # 1) Source: point to your uploaded JSONL
        "source": {
            "type": "file_id",
            "id": file_id
        },
        # 2) input_messages: a template of messages
        "input_messages": {
            "type": "template",
            "template": [
                {
                    "type": "message",
                    "role": "system",
                    "content": {
                        "type": "text",
                        "text": (
                            "You are an expert on recognizing what an image is about. "
                            "Respond with just one word representing the main item in the image. "
                            "Don't use commas or dots."
                        )
                    }
                },
                {
                    "type": "message",
                    "role": "user",
                    "content": {
                        "type": "image_url",
                        "url": "{{item.input.url}}"
                    }
                }
            ]
        }
        # sampling_params can be omitted or set to null/{} if you don’t need to override defaults
    }
}

resp = requests.post(url, headers=headers, json=data)
print(resp.status_code, resp.text)

What changed?

  • Messages now include "type": "message" so the API knows you're sending chat-style messages.
  • Text is wrapped as {"type": "text", "text": ...} instead of a raw string.
  • The image is wrapped as {"type": "image_url", "url": "{{item.input.url}}"}, not just {{item.input}}, so the model can load it.

Give that a spin and you should see your runs queue up (no more 500). If you still hit an error, capture the full response body and the request ID it returns and include that when you ping support—but in almost every case it's a small shape issue in the input_messages.

Can you give that a try and let us know?

I tried it, but no solution…

400
{'error': {'message': "Invalid value: 'text'. Value must be 'input_text'.", 'type': 'invalid_request_error', 'param': 'data_source.input_messages.template[0].content.type', 'code': 'invalid_value'}}

I iteratively fed ChatGPT the errors, but unfortunately it got me nowhere.


Yeah, I was trying to cheat and help you the easy way haha… Sorry! :sweat_smile:

I’ve not run evals myself, so hopefully someone smarter will pop into the thread.

You first need to store your output using store: true.
Once stored, you need to check whether images are stored like other text completions; as far as I've noticed, they aren't.
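Roughly, storing a completion looks like this (just a sketch with the Python SDK; the model name and image URL are examples):

from openai import OpenAI

client = OpenAI()

# Ask the platform to store this completion so it can be inspected/used later
completion = client.chat.completions.create(
    model="gpt-4o",  # example model
    store=True,      # store the completion on the platform
    messages=[
        {
            "role": "system",
            "content": "Respond with just one word representing the main item in the image.",
        },
        {
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": "https://example.com/dog.jpg"}}
            ],
        },
    ],
)
print(completion.choices[0].message.content)

As noted above, it's worth checking whether the image part of the input actually shows up in what gets stored.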


Hi!

You're pretty close. That data_source (the completions data source) doesn't support templating the image in the way you currently have it.
To make this work, you'll want to make the completions call yourself and then upload the finished results to the evals framework.

This cookbook (https://cookbook.openai.com/examples/evaluation/use-cases/regression) should match your use-case pretty closely!

You'll want to build a "file_content" object by looping over your image URL inputs, making a completions call for each, and collecting the "response.model_dump()" output (an adapted sketch for your image/label case follows the snippet below).
You'll use the jsonl data source instead of the completions one.

This section is the one you’ll want to follow:

# (summarize_push_notification, push_notification_data, and PushNotifications
#  are helpers defined earlier in the cookbook example)
run_data = []
for push_notifications in push_notification_data:
    result = summarize_push_notification(push_notifications)
    run_data.append({
        "item": PushNotifications(notifications=push_notifications).model_dump(),
        "sample": result.model_dump()
    })

eval_run_result = openai.evals.runs.create(
    eval_id=eval_id,
    name="baseline-run",
    data_source={
        "type": "jsonl",
        "source": {
            "type": "file_content",
            "content": run_data,
        }
    },
)
print(eval_run_result)
# Check out the results in the UI
print(eval_run_result.report_url)
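Adapted to the image/label setup in this thread, that loop could look roughly like this (a sketch: image_rows stands in for the parsed items from your JSONL file, the model name is an example, and you may need to adjust the "sample" shape so the string_check's {{ sample.output_text }} resolves):

# Sketch: call the model yourself for each image, collect the results,
# then upload everything as a jsonl / file_content data source.
run_data = []
for row in image_rows:  # assumed: the parsed items from the JSONL file above
    response = openai.chat.completions.create(
        model="gpt-4o",  # example model
        messages=[
            {
                "role": "system",
                "content": "Respond with just one word representing the main item in the image. "
                           "Don't use commas or dots.",
            },
            {
                "role": "user",
                "content": [
                    {"type": "image_url", "image_url": {"url": row["input"]["url"]}}
                ],
            },
        ],
    )
    run_data.append({
        "item": {"input": row["input"], "correct_label": row["correct_label"]},
        "sample": response.model_dump(),  # mirrors the cookbook's result.model_dump()
    })

eval_run_result = openai.evals.runs.create(
    eval_id=eval_id,
    name="image-label-baseline",
    data_source={
        "type": "jsonl",
        "source": {"type": "file_content", "content": run_data},
    },
)
print(eval_run_result.report_url)

The key difference from the completions data source attempt is that the model call happens on your side, so the evals API never has to template the image at all.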

Humans > ChatGPT for coding once again! :wink:

Although it could get interesting this week.

Regardless, thanks for the assist!

Thank you, I’ll try this tomorrow and let you know!

Edit: it worked, more or less!
