Hi all,
I’m working on a proof of concept using the OpenAI Agents SDK where an agent can generate images using the ImageGenerationTool or CodeInterpreterTool.
My app stores previous messages in a database and rebuilds the conversation each turn as a plain list of input items. I’m trying to reintroduce the generated image into the chat context so that a user can ask follow-up questions about it.
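For context, the agent is created with the hosted ImageGenerationTool, and after each run I pull the base64 result out of the image-generation tool call so it can be stored alongside the messages. Roughly like this, where the ImageGenerationTool arguments and the item attributes are my reading of the SDK types, and save_image_to_db is just a placeholder for my own persistence code:

from agents import Agent, ImageGenerationTool, Runner

agent = Agent(
    name="Image assistant",
    instructions="Generate images when the user asks for them.",
    # Constructor arguments guessed from the SDK docs; adjust to your version.
    tools=[ImageGenerationTool(tool_config={"type": "image_generation"})],
)

async def run_turn(input_items):
    result = Runner.run_streamed(agent, input=input_items)
    async for ev in result.stream_events():
        ...  # stream to the UI
    # After the run, look for the image generation tool call and keep its base64 result.
    for item in result.new_items:
        raw = getattr(item, "raw_item", None)
        if getattr(raw, "type", None) == "image_generation_call" and getattr(raw, "result", None):
            save_image_to_db(raw.result)  # hypothetical helper in my app
    return result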
Here’s what I tried:
input_items = [  # Built from the database
    {
        "role": "user",
        "content": [  # ResponseInputTextParam
            {"type": "input_text", "text": "Generate an image of a cat"},
        ],
    },
    {
        "role": "assistant",
        "content": [  # ResponseInputImageParam
            {"type": "input_image", "detail": "low", "image_url": "BASE64CONTENT"},
        ],
    },
    {
        "role": "user",
        "content": [  # ResponseInputTextParam
            {"type": "input_text", "text": "Change its feet to blue!"},
        ],
    },
]

result = Runner.run_streamed(agent, input=input_items)
async for ev in result.stream_events():
    ...
But I get this error:
openai.BadRequestError: Error code: 400 - {'error': {'message': "Invalid value: 'input_image'. Supported values are: 'output_text' and 'refusal'.", 'type': 'invalid_request_error', 'param': 'input[1].content[1]', 'code': 'invalid_value'}}
I understand base64 isn’t ideal token-wise, but I’m fine with it for now as this is just a prototype.
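From the error it looks like assistant-role input messages only accept output_text and refusal parts, so one workaround I’m considering (I’m not sure it’s the intended pattern) is replaying the assistant turn as plain text and handing the image back under the user role as an input_image with a data URL, roughly:

input_items = [
    {
        "role": "user",
        "content": [
            {"type": "input_text", "text": "Generate an image of a cat"},
        ],
    },
    {
        # Replay the assistant turn as plain text only.
        "role": "assistant",
        "content": "Here is the image you asked for.",
    },
    {
        # Hand the generated image back under the user role instead,
        # wrapping the base64 payload in a data URL.
        "role": "user",
        "content": [
            {"type": "input_image", "detail": "low",
             "image_url": "data:image/png;base64,BASE64CONTENT"},
            {"type": "input_text", "text": "Change its feet to blue!"},
        ],
    },
]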
My questions:
- What’s the correct way to pass a generated image (e.g., base64 or hosted URL) back into the conversation history/context so it can be referenced in follow-up turns?
- Is there a special content type or encoding expected by the SDK for image re-ingestion?
- Or should I approach this differently?
Any guidance would be appreciated!
Thanks in advance.