Hi,
I’m using the Realtime API for a voice agent with Twilio Media Streams.
My goal is to use Realtime only for STT + TTS, while all dialog logic is handled externally (LangChain / backend).
Question:
Is there a supported or recommended way to run Realtime in a strict TTS-only mode, where the model never generates autonomous conversational responses and only produces audio when explicitly triggered via response.create?
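
For context, here is a rough sketch of the client events I have in mind. It is an illustration under assumptions, not a confirmed recipe: the helper names (session_update_no_auto_response, speak_text) are my own, the field layout follows the session.update / response.create event shape as I understand it and may differ between API versions, and I am assuming that turn_detection.create_response set to false stops the server from starting a response by itself. Prompting the model to repeat text verbatim is also only a best-effort approach, not a guaranteed TTS mode.

```python
import json


def session_update_no_auto_response() -> dict:
    """Build a session.update event that keeps server-side VAD for
    turn detection but asks the server not to start a response on its own.

    Assumption: the session honors turn_detection.create_response = False.
    """
    return {
        "type": "session.update",
        "session": {
            "turn_detection": {
                "type": "server_vad",
                "create_response": False,  # assumed switch for "no autonomous replies"
            },
        },
    }


def speak_text(text: str) -> dict:
    """Build a response.create event asking the model to voice `text`.

    The text itself comes from the external dialog logic (LangChain / backend).
    The instruction wording is a hypothetical prompt, not an official TTS flag.
    """
    return {
        "type": "response.create",
        "response": {
            "instructions": f"Say exactly the following, and nothing else: {text}",
        },
    }


# Serialize the events the way they would be sent over the WebSocket.
setup_frame = json.dumps(session_update_no_auto_response())
speak_frame = json.dumps(speak_text("Your appointment is confirmed."))
```

The idea would be to send setup_frame once after connecting, then send a speak_frame only when the backend has decided what to say. What I can't tell is whether this is the intended way to do it, or whether the model can still drift from the given text.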