Realtime API: How to use it as TTS-only (no conversational responses)?

Manuel_Popescu · February 2, 2026, 1:26pm

Hi,

I’m using the Realtime API for a voice agent with Twilio Media Streams.
My goal is to use Realtime only for STT + TTS, while all dialog logic is handled externally (LangChain / backend).

Question:
Is there a supported / recommended way to run Realtime in a strict TTS-only mode, where the model never generates autonomous conversational responses, and only produces audio when explicitly triggered via response.create?
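
For reference, here is a minimal sketch of the two client events this setup seems to need, written as plain Python dicts with no network code. It assumes the GA event shape, where `turn_detection` sits under `session.audio.input`; in the older beta shape it sits directly under `session`, and `modalities` is used instead of `output_modalities`. The helper names `build_session_update` and `build_speak_request` are hypothetical, and the instruction wording is only an illustration: the model may still paraphrase, so this is not a guaranteed verbatim TTS mode.

```python
import json


def build_session_update() -> dict:
    """Keep server VAD for turn detection, but stop automatic replies.

    Assumption: GA-style session layout. Check the field paths against the
    API version you are actually connected to.
    """
    return {
        "type": "session.update",
        "session": {
            "type": "realtime",
            "audio": {
                "input": {
                    "turn_detection": {
                        "type": "server_vad",
                        # Do not generate a response when the caller stops speaking.
                        "create_response": False,
                        # Do not cancel playback when the caller starts speaking.
                        "interrupt_response": False,
                    }
                }
            },
        },
    }


def build_speak_request(text: str) -> dict:
    """Ask for one audio response that reads backend-provided text aloud."""
    return {
        "type": "response.create",
        "response": {
            # Out-of-band: the response is not added to the default conversation.
            "conversation": "none",
            "output_modalities": ["audio"],
            # An empty input list means no prior conversation items are used as context.
            "input": [],
            "instructions": (
                "Read the following text aloud verbatim, adding nothing:\n" + text
            ),
        },
    }


if __name__ == "__main__":
    # Each dict would be sent as one JSON text frame over the Realtime WebSocket.
    print(json.dumps(build_session_update(), indent=2))
    print(json.dumps(build_speak_request("Your appointment is confirmed."), indent=2))
```

The idea would be to send the `session.update` once after connecting, pass the transcription events to the backend, and send one `response.create` per backend reply.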
