One-word answers like 'yes' and 'no' are unreliably spoken

laugums · September 14, 2025, 8:09pm

Hi all,

Firstly credit to the developers for the GA version of Realtime, it’s brilliant and a huge step forward. Also thanks for Hello Realtime which is super helpful.

I have noticed an issue, and I am putting a workaround in place for it.

On Realtime audio, one-word spoken answers like “yes” or “no” are not reliably spoken, even though the logs show them as spoken. I have found they are dropped around 30% of the time, and my sense is that the issue is worse when the model has just been speaking full sentences in the preceding discussion. This is a problem for my use case, where I sometimes need the model to say just “yes” or “no”.

I tested via the SIP route. It should be possible to reproduce the issue with this prompt:

const INSTRUCTIONS = `
Repeat exactly what the user says, back to them.
`;

User: The sky is blue and clouds are white and fluffy.
Bot: The sky is blue and clouds are white and fluffy.
User: My marine animal is magenta in color.
Bot: My marine animal is magenta in color.
User: Yes
Bot: (no response)
User: No
Bot: (no response)

The failure is intermittent, so you may sometimes get a response. Note that the dropped reply still appears in the transcript.

I have come up with a workaround, which is to adapt the prompt so that a one-word reply is wrapped in a sentence: if the user says “yes”, the model says “The user said ‘yes’”. This seems to avoid the issue.
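
For reference, the adapted prompt could be built roughly like this. It is only a sketch: the helper name `buildInstructions`, the `SHORT_WORDS` list, and the exact rule wording are illustrative assumptions, not a verified fix.

```typescript
// Sketch of the workaround prompt. Names and wording are illustrative assumptions.
const SHORT_WORDS: string[] = ["yes", "no"]; // one-word replies that tend to be dropped

// Build instructions that wrap each one-word reply in a carrier sentence.
function buildInstructions(shortWords: string[]): string {
  const rules = shortWords
    .map((w) => `- If the user says "${w}", say: The user said '${w}'.`)
    .join("\n");
  return [
    "Repeat exactly what the user says, back to them.",
    "Never reply with a single word on its own:",
    rules,
  ].join("\n");
}

const INSTRUCTIONS = buildInstructions(SHORT_WORDS);
```

The resulting `INSTRUCTIONS` string replaces the original prompt above. Any other short replies that matter for a given use case would need to be added to the list.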

Thank you

vb · September 14, 2025, 9:08pm

Hi and welcome to the community!

I’ve noticed the same thing across many TTS models, and I usually use a similar workaround—like enforcing a two-word minimum per reply.
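
A two-word minimum can also be enforced in code when the reply text is available before it is sent to a TTS model. This is a minimal sketch under that assumption. The function names and the carrier phrase "The answer is ..." are made up for illustration. For a speech-to-speech model the rule would have to live in the prompt instead.

```typescript
// Count whitespace-separated words in a string.
function countWords(text: string): number {
  const trimmed = text.trim();
  return trimmed === "" ? 0 : trimmed.split(/\s+/).length;
}

// Pad replies shorter than minWords with a carrier phrase (hypothetical wording).
function ensureMinWords(reply: string, minWords: number = 2): string {
  const trimmed = reply.trim();
  if (trimmed === "" || countWords(trimmed) >= minWords) {
    return trimmed; // nothing to say, or already long enough
  }
  // Strip trailing punctuation so "Yes." becomes "yes" inside the carrier phrase.
  const word = trimmed.replace(/[.!?]+$/, "").toLowerCase();
  return `The answer is ${word}.`;
}
```

Longer replies pass through unchanged, so the guard only affects the short answers that tend to get dropped.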

Off the top of my head, another option might be to pre-record short words like “yes” or “no” for quick playback instead of relying on voice mode. Of course, it depends on your use case.
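
To make the pre-recorded idea concrete, here is a rough sketch of the lookup step. The clip paths and function names are hypothetical, and how the clip actually gets played into the call depends entirely on your audio pipeline, so that part is left out.

```typescript
// Hypothetical mapping from short replies to pre-recorded audio files.
const PRERECORDED_CLIPS = new Map<string, string>([
  ["yes", "clips/yes.wav"], // hypothetical path
  ["no", "clips/no.wav"],   // hypothetical path
]);

// Lowercase the reply and drop everything except letters, so "Yes." matches "yes".
function normalizeReply(reply: string): string {
  return reply.trim().toLowerCase().replace(/[^a-z]/g, "");
}

// Return the clip to play for a short reply, or null to fall back to the model's voice.
function clipFor(reply: string): string | null {
  return PRERECORDED_CLIPS.get(normalizeReply(reply)) ?? null;
}
```

A caller would check `clipFor(text)` first and only use generated speech when it returns `null`. One trade-off is that the recorded voice needs to match the model's voice closely enough not to sound jarring.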

Fixing this behavior would definitely be a great improvement for the next generation of models.
