Realtime API Omni Modalities?

rdswearingen · October 10, 2024, 4:49am

The Realtime API says it is a ‘gpt-4o class model’. Is it capable of receiving images in user messages like gpt-4o is? I would like to append frames from my video feed, but I noticed the ‘modalities’ parameter only accepts [‘text’, ‘audio’]. Any guidance is appreciated.

anon22939549 · October 10, 2024, 5:03am

The other modalities have not yet been released. There will likely be a big announcement when they are available.
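
For anyone landing here later, a minimal sketch of what that restriction looks like in practice. It assumes the session is configured with a `session.update` client event carrying a `modalities` list, as in the Realtime API beta at the time of this thread. `SUPPORTED_MODALITIES` and `build_session_update` are hypothetical names made up for illustration, not part of any SDK, and the accepted set is just the one reported above.

```python
import json

# Modalities reported as accepted in this thread (October 2024).
# This set is an assumption taken from the question, not an official list.
SUPPORTED_MODALITIES = {"text", "audio"}


def build_session_update(modalities):
    """Build a `session.update` event payload (hypothetical helper).

    Raises ValueError locally if a modality outside the assumed
    supported set is requested, e.g. "image".
    """
    unsupported = sorted(set(modalities) - SUPPORTED_MODALITIES)
    if unsupported:
        raise ValueError(f"unsupported modalities: {unsupported}")
    return {"type": "session.update", "session": {"modalities": list(modalities)}}


if __name__ == "__main__":
    # Serialize the payload as it would be sent over the WebSocket connection.
    print(json.dumps(build_session_update(["text", "audio"])))
```

Checking the list locally just fails fast on something like `["text", "image"]` instead of waiting for the server to reject it. If more modalities are released, the set would need updating.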

rdswearingen · October 10, 2024, 5:13am

Thanks for the quick reply! I will keep an eye out.
