Realtime OpenAI Voice Bot Ignores Short Words Like “Hello” When Avoiding Cough/Noise

I’m using OpenAI Realtime API (gpt-realtime) + Twilio Media Streams to build a voice bot.

To stop false triggers from coughs or background noise, I enabled server VAD with a high threshold and a long silence window:

"turn_detection": {
  "type": "server_vad",
  "prefix_padding_ms": 300,
  "silence_duration_ms": 1000,
  "threshold": 0.9
}

This fixes the false triggers from coughs and ambiguous sounds.
But now the bot completely ignores short utterances like “hello”, “yes”, “no”.

So:

  • If I increase the VAD threshold / silence window → short real speech is ignored

  • If I decrease them → coughs/breathing trigger a response
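One middle ground I came across in the API reference is the `semantic_vad` turn-detection mode, which decides end-of-turn from the content rather than a fixed silence window. I haven't verified whether it helps with non-speech noise, and the `eagerness` value below is just my guess at a starting point:

```
"turn_detection": {
  "type": "semantic_vad",
  "eagerness": "low"
}
```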

Question:

How can I ignore noise/coughs but still detect short valid speech in real time?
Is there a better way to handle this? (Custom VAD, buffering, phoneme detection, etc.)
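For reference, this is the kind of custom pre-filter I'm considering on the FastAPI side, before audio is forwarded to OpenAI: a minimal energy-based gate that passes audio only if it stays loud for a sustained stretch (a cough tends to be a brief spike, while even a short “hello” is sustained). This is only a sketch with made-up thresholds (`energy_threshold`, `min_voiced_ms` are guesses, not tuned values), not a real VAD — something like webrtcvad or Silero VAD would presumably be more robust:

```python
import array

SAMPLE_RATE = 8000  # Twilio Media Streams audio is 8 kHz mono


def rms(frame: bytes) -> float:
    """Root-mean-square energy of a 16-bit little-endian PCM frame."""
    samples = array.array("h", frame)
    if not samples:
        return 0.0
    return (sum(s * s for s in samples) / len(samples)) ** 0.5


def looks_like_speech(pcm: bytes, frame_ms: int = 20,
                      energy_threshold: float = 500.0,
                      min_voiced_ms: int = 120) -> bool:
    """Heuristic gate: accept the buffer as speech only if at least
    min_voiced_ms worth of 20 ms frames exceed the energy threshold.
    A cough shows up as one or two loud frames; a short word like
    'hello' keeps the energy up for considerably longer."""
    frame_bytes = SAMPLE_RATE * frame_ms // 1000 * 2  # 2 bytes per sample
    voiced_ms = 0
    for i in range(0, len(pcm) - frame_bytes + 1, frame_bytes):
        if rms(pcm[i:i + frame_bytes]) >= energy_threshold:
            voiced_ms += frame_ms
    return voiced_ms >= min_voiced_ms
```

The idea would be to buffer each Twilio utterance (decoded from μ-law to PCM16) and only append/commit it to the Realtime input buffer when this gate passes, while keeping the server VAD settings loose enough to catch short words.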

Context / Stack

Twilio → FastAPI WebSocket → OpenAI Realtime (Audio)
Using the "gpt-realtime" model with streaming audio input/output