Hi all -
I’ve built an AI interviewer that walks a user through a specific interview guide in a short interview format. Our goal is a natural, conversational interview, and the Realtime API with VAD / end-of-turn detection works decently well for this.
The issue is that it is sometimes glitchy. The prompt contains explicit instructions, which the model follows most of the time. Sometimes it goes off the rails and exposes the instructions to the user. Other times it awkwardly starts speaking over the user. And at times it ignores the instruction to wait for the user to speak.
Has anyone had consistent success prompting the model through the Realtime API to get this kind of natural conversational flow? If so, any guidance would be great. Or, if there’s another method that’s better suited to this type of application, I would love guidance there too. I previously implemented an STT > LLM > TTS flow, but we really want end-of-turn detection so the conversation flows without the user having to use push-to-talk (PTT).
Thank you!