Batch API with multimodal input

Hello,

I was trying to submit a batch job for multimodal request with a text and an image. Since the documentation for batch jobs is only single modality (https://platform.openai.com/docs/guides/batch#1-prepare-your-batch-file), I assumed I could use the format for Images and Vision (https://platform.openai.com/docs/guides/images?api-mode=responses&format=base64-encoded) to generate something like:

{
“custom_id”: id,
“method”: “POST”,
“url”: “/v1/responses”,
“body”: {
“model”: model,
“messages”: [{
“role”: “user”, “content”: [
{“type”: “input_text”,
“text”: prompt},
{“type”: “input_image”,
“url”: f"data:image/png;base64,{base64_image}"}
]}
}
}

However, I got the following error:
{“error”: {“message”: "Invalid value: ‘input_image’. Supported values are: ‘text’, ‘image_url’, ‘input_audio’, ‘refusal’, ‘audio’, and ‘file’.}

However. upon changing the content type to image_url I got the following error which I could not resolve:
{“error”: {“message”: “Invalid type for ‘messages[0].content[1].image_url’: expected an object, but got a string instead.”}

Does anyone have experience with submitting multimodal (text + image) batched jobs to the API?

2 Likes

Welcome to the dev community @Aylin_Akkus

The vision docs for responses API mention this structure:

{
    "type": "input_image",
    "image_url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg",
    "detail": "high",
}

I can see that in the sample request shared by you, ”url” is being used instead of “image_url"

4 Likes

Вот минимальный и рабочий шаблон batch.jsonl для отправки мультимодального (текст + изображение) запроса через OpenAI Batch API:


batch.jsonl:

{
“custom_id”: “image_test_001”,
“method”: “POST”,
“url”: “/v1/chat/completions”,
“body”: {
“model”: “gpt-4-vision-preview”,
“messages”: [
{
“role”: “user”,
“content”: [
{
“type”: “text”,
“text”: “Что изображено на картинке?”
},
{
“type”: “image_url”,
“image_url”: {
“url”: “…”
}
}
]
}
],
“max_tokens”: 1000
}
}


Что тебе нужно сделать:

  1. Заменить data:image/png;base64,… на фактическое base64-энкодированное изображение.

Убедись, что нет лишних пробелов или переносов строк.

Если большое изображение — желательно использовать .webp или уменьшенное .jpg для оптимизации.

1 Like