Is There an API for ChatGPT’s Video Chat (Advanced Voice Mode)?

Hi everyone,

I’m wondering if OpenAI has released or plans to release an API that supports the video chat capabilities found in ChatGPT’s Advanced Voice Mode. I know the Realtime API handles real-time audio and text, but is there any roadmap or current support for video input/output via API?

Any insights or updates would be appreciated. Thanks!

1 Like

“Video chat” is actually by providing a low sample rate stream of images.

The realtime endpoint doesn’t support images:

You can implement it with Chat Completions, however:

The audio latency is reduced by not needing a separate transcription of audio input, nor TTS for output.

However, image processing requires more time before output generation begins. I would use detail:low on images from video at maximum 512px.

You also have to build your own voice activity detection if not simply making a record/send button.

Further insights: not a peep.