Realtime model image input

Hi all,

I’m trying to input images to the gpt-realtime model, but I can’t figure out how the conversation.item.create JSON should look. I saw this code snippet from the release post (https://openai.com/index/introducing-gpt-realtime/):

{
    "type": "conversation.item.create",
    "previous_item_id": null,
    "item": {
        "type": "message",
        "role": "user",
        "content": [
            {
                "type": "input_image",
                "image_url": "data:image/{format(example: png)};base64,{some_base64_image_bytes}"
            }
        ]
    }
}

But my WebRTC data channel fails to send the full base64-encoded image. Can I send a storage URL instead, like one from Vercel Blob/S3/GCS? I tried looking through the docs here (https://platform.openai.com/docs/api-reference/realtime_client_events/conversation/item/create) but they don’t cover image input for some reason.

This is what the Playground sends, via WebSocket:

{"type":"conversation.item.create","item":{"type":"message","role":"user","content":[{"type":"input_text","text":"Cute huh!"},{"type":"input_image","image_url":"data:image/png;base64,iVBORw0KGgoAAAANSUh

The key name “image_url” implies that a hosted URL might also work.
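
To make the two options concrete, here is a minimal Python sketch that builds the event both ways. The helper names (`image_item_event`, `to_data_url`) and the example.com URL are made up for illustration. Only the inline data URL form is confirmed by the release post and the Playground capture; the hosted URL form is an untested guess based on the key name.

```python
import base64
import json
from typing import Optional


def to_data_url(image_bytes: bytes, mime: str = "image/png") -> str:
    """Encode raw image bytes as a base64 data URL (the form shown in the release post)."""
    return f"data:{mime};base64," + base64.b64encode(image_bytes).decode("ascii")


def image_item_event(image_url: str, text: Optional[str] = None) -> dict:
    """Build a conversation.item.create event with one input_image part.

    Hypothetical helper: it only mirrors the JSON shape quoted in this thread.
    """
    content = []
    if text is not None:
        content.append({"type": "input_text", "text": text})
    content.append({"type": "input_image", "image_url": image_url})
    return {
        "type": "conversation.item.create",
        "item": {"type": "message", "role": "user", "content": content},
    }


# Confirmed shape: inline base64 data URL (here just the 8-byte PNG signature).
inline_event = image_item_event(to_data_url(b"\x89PNG\r\n\x1a\n"), text="Cute huh!")

# Untested assumption: a publicly reachable hosted URL in the same field.
hosted_event = image_item_event("https://example.com/cat.png")

# Serialized string, ready to send over the WebSocket or WebRTC data channel.
payload = json.dumps(hosted_event)
```

If the hosted URL form is accepted, the payload stays a few hundred bytes regardless of image size, which would sidestep the data channel problem entirely.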

Since it also follows the pattern of Responses’ “input_image”, you could also try file_id of a file uploaded to the OpenAI files endpoint with purpose: vision:
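
A rough sketch of that variant in Python, with loud caveats: the `file_id` key is borrowed from the Responses API's `input_image` part, and nothing in this thread confirms that the Realtime API accepts it. The helper name `image_file_event` and the ID `file-abc123` are placeholders; a real ID would come from uploading the image to the files endpoint first.

```python
import json


def image_file_event(file_id: str) -> dict:
    """Build a conversation.item.create event that references an uploaded file.

    Assumption: Realtime accepts "file_id" on input_image the way Responses does.
    """
    return {
        "type": "conversation.item.create",
        "item": {
            "type": "message",
            "role": "user",
            "content": [{"type": "input_image", "file_id": file_id}],
        },
    }


# "file-abc123" is a placeholder, not a real file ID.
event = image_file_event("file-abc123")

# Serialized string, ready to send over the WebSocket or WebRTC data channel.
payload = json.dumps(event)
```

If the server rejects the event, the error it returns should show whether `file_id` is recognized at all.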