Realtime API Omni Modalities?

The Realtime API is described as a ‘gpt-4o class model’. Is it capable of receiving images in user messages, like gpt-4o is? I would like to append frames from my video feed, but I noticed that the ‘modalities’ parameter only accepts [‘text’, ‘audio’]. Any guidance is appreciated :slight_smile:

The other modalities have not yet been released. There will likely be a big announcement when they are available.


Thanks for the quick reply! I will keep an eye out.