Is realtime api directly speech to speech?

_j · January 13, 2025, 3:27am

The only “text conversion” involved is the transcript of the output that you are given. That transcript comes from a separate transcription service that converts audio to text.

There is still a conversion, just not to text: WAV audio is encoded into a tokenized spectral representation of the audio that the model can understand, and the reverse codec is applied for output. That codec is proprietary.
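To make the "no text on the way in" point concrete, here is a rough sketch of what the client side can look like. It assumes the Realtime API's `input_audio_buffer.append` client event, which carries base64-encoded audio (PCM16 at 24 kHz mono is assumed here as the input format). The helper names `make_pcm16_tone` and `build_audio_append_event` are made up for illustration, and nothing here touches the proprietary codec itself, which runs server-side.

```python
import base64
import json
import math
import struct


def make_pcm16_tone(freq_hz=440.0, duration_s=0.1, sample_rate=24000):
    """Generate a short sine tone as raw little-endian 16-bit mono PCM bytes."""
    n_samples = int(duration_s * sample_rate)
    samples = (
        int(32767 * 0.3 * math.sin(2 * math.pi * freq_hz * i / sample_rate))
        for i in range(n_samples)
    )
    return b"".join(struct.pack("<h", s) for s in samples)


def build_audio_append_event(pcm16_bytes):
    """Wrap raw PCM16 bytes in an input_audio_buffer.append event (JSON string).

    Hypothetical helper: the audio is only base64-encoded for transport.
    No speech-to-text step happens on the client.
    """
    return json.dumps(
        {
            "type": "input_audio_buffer.append",
            "audio": base64.b64encode(pcm16_bytes).decode("ascii"),
        }
    )


pcm = make_pcm16_tone()
event = build_audio_append_event(pcm)
print(len(pcm), "bytes of audio wrapped in", len(event), "characters of JSON")
```

The resulting string is what you would send over the already-open WebSocket connection. Opening the connection, authentication, and handling the audio that comes back are left out of this sketch.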
