Realtime API Images as input

Hello, I am trying to use the Realtime API (WebSocket) to analyze some images. I assume (hope) the main issue is syntax, as documentation seems limited.

I am trying to send each message as follows:

event = {
    "type": "response.create",
    "response": {
        "modalities": ["text", "image"],
        "instructions": message,
        "images": frames
    }
}

ws.send(json.dumps(event))

However, it is not working: the on_message function isn't even being called. Everything works fine without images. Is this the right syntax, or is there a different way to feed images? If anyone has found a way to send images to the Realtime API, I would really like to hear about your approach.
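
One client-side cause worth ruling out, independent of what the server accepts: if `frames` holds raw image bytes, `json.dumps` raises a `TypeError` before anything is sent, which would also explain why on_message never fires. The sketch below is only a guess at that failure mode. The helper names `encode_frames` and `build_event` are hypothetical, the assumption that `frames` is a list of `bytes` is not confirmed by the post, and the `"images"` field is simply carried over from the attempt above, so whether the Realtime API accepts it remains the open question.

```python
import base64
import json


def encode_frames(frames):
    """Turn raw image bytes into base64 strings that json.dumps can handle."""
    return [base64.b64encode(frame).decode("ascii") for frame in frames]


def build_event(message, frames):
    """Build the same response.create payload as above, with encoded frames.

    The "images" field is copied from the original attempt; it is not a
    confirmed part of the Realtime API schema.
    """
    return {
        "type": "response.create",
        "response": {
            "modalities": ["text", "image"],
            "instructions": message,
            "images": encode_frames(frames),
        },
    }


if __name__ == "__main__":
    raw_frames = [b"\x89PNG placeholder"]  # placeholder bytes, not a real image

    # Raw bytes cannot be serialized, so ws.send would never be reached.
    try:
        json.dumps({"images": raw_frames})
    except TypeError as exc:
        print("raw bytes fail:", exc)

    # After base64 encoding, the payload serializes cleanly.
    payload = json.dumps(build_event("Describe these frames.", raw_frames))
    print(len(payload), "characters ready for ws.send")
```

If the payload already serializes without error, the problem is more likely in the event schema itself, and logging the on_error and on_close callbacks should show whether the server rejected the message or dropped the connection.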
