Hi everyone,
First, some context. I’m using the Realtime API to build an agent that runs question-and-answer sessions. One user provides a set of questions, each with its own context (e.g., “Skip the next question if…”). The agent then asks these questions to another user. This is strictly a speech-to-speech setup; there’s no chat interface for the user answering the questions.
So far, I’ve put all of these questions and instructions into the agent’s base instructions before starting it. I realize I could move the questions into a chat message instead, but the current setup works as-is. If you have any thoughts on this, I’d be happy to hear them.
My main concern is about function calling. In my initial use case, once the agent has asked all its questions, I want it to call a function automatically. I’ve specified this in the agent’s instructions, but nothing happens unless the user explicitly asks the agent to call that function. In other words, I need the agent itself to decide when to invoke a function based on its own actions.
(e.g., when it finishes all the questions, when a question contains a media URL, …)
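To make the setup concrete, here is a minimal sketch of the kind of `session.update` payload I mean: the questions go into the instructions, and the function is registered as a tool. The function name `submit_answers`, the sample questions, and the instruction wording are hypothetical placeholders, and the field names reflect my understanding of the Realtime API event shape, so they should be checked against the current API reference.

```python
import json


def build_session_update(questions, function_name="submit_answers"):
    """Build a session.update event that registers one function tool.

    `function_name` is a hypothetical placeholder, not a real API name.
    """
    # Number the questions so the instructions state the order explicitly.
    numbered = "\n".join(f"{i}. {q}" for i, q in enumerate(questions, start=1))
    instructions = (
        "Ask the following questions one at a time, in order.\n"
        f"{numbered}\n"
        f"After the last question has been answered, call {function_name} "
        "without waiting for the user to ask."
    )
    return {
        "type": "session.update",
        "session": {
            "instructions": instructions,
            "tools": [
                {
                    "type": "function",
                    "name": function_name,
                    "description": "Call once every question has been asked and answered.",
                    # No arguments in this sketch; a real tool would likely take some.
                    "parameters": {"type": "object", "properties": {}, "required": []},
                }
            ],
            # Assumption: "auto" leaves the decision to call the tool to the model.
            "tool_choice": "auto",
        },
    }


event = build_session_update(["What is your name?", "How old are you?"])
print(json.dumps(event, indent=2))
```

With this kind of configuration the function is available to the agent, but in my tests it is still only called when the user asks for it.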
My workaround has been to tell the agent to include a special keyword in the transcript, which it is told to pronounce silently. I then parse the transcript, and whenever I see the keyword, I trigger the function. However, I’d prefer a more direct or elegant approach if one is available.
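A minimal sketch of this kind of keyword-based trigger is below. The marker `[[DONE]]` and the helper names `extract_triggers` and `on_transcript` are hypothetical placeholders for illustration, and the sketch assumes the complete transcript text is already available as a string.

```python
import re

KEYWORD = "[[DONE]]"  # hypothetical marker the agent is told to emit


def extract_triggers(transcript, keyword=KEYWORD):
    """Return the transcript without the keyword, plus the number of hits."""
    count = transcript.count(keyword)
    # Remove the marker and collapse any whitespace it leaves behind.
    cleaned = re.sub(r"\s+", " ", transcript.replace(keyword, " ")).strip()
    return cleaned, count


def on_transcript(transcript, callback):
    """Call `callback` once if the keyword appears; return the cleaned text."""
    cleaned, count = extract_triggers(transcript)
    if count:
        callback()
    return cleaned


calls = []
cleaned = on_transcript(
    "Thanks, that was the last question. [[DONE]]",
    lambda: calls.append("function triggered"),
)
print(cleaned)
print(calls)
```

The weakness of this approach is that it depends on the agent emitting the marker reliably, and on the marker never being split across transcript chunks if the text is processed incrementally.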
Is there a better way to achieve this?
Best,
Géry