Hi everyone!
I’m facing a specific issue while using the Realtime API. When the model calls a function (tool) that takes several seconds to produce a result, the user might speak again in the meantime. This can lead to overlapping responses if the tool’s result arrives while the model is already generating a new response.
Here’s a breakdown of the issue:
- t1 → The user asks a question.
- t2 → The model calls a tool.
- t3 → The tool starts executing.
- t4 → The user speaks again.
- t5 → The model generates a response.
- t6 → The tool’s result arrives, and the model tries to generate another response (which fails because it was already responding).
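The t5/t6 collision can be avoided by serializing response creation on the client side. Below is a minimal sketch (plain Python, no network calls; the event names `response.create`/`response.done` mirror the Realtime API, but the guard class and its fields are my own invention) that queues a late tool result instead of firing a second response while one is in flight:

```python
class ResponseGuard:
    """Allows only one response in flight at a time.

    A tool result that arrives while a response is active is queued and
    flushed when the active response finishes (i.e. on `response.done`).
    """

    def __init__(self):
        self.response_active = False
        self.pending = []   # tool results that arrived mid-response
        self.sent = []      # stand-in for events sent over the websocket

    def create_response(self, payload):
        if self.response_active:
            # The t5/t6 collision: a response is already being generated,
            # so queue the payload instead of issuing a failing request.
            self.pending.append(payload)
            return False
        self.response_active = True
        self.sent.append(("response.create", payload))
        return True

    def on_response_done(self):
        # The active response finished; flush any tool result that arrived late.
        self.response_active = False
        if self.pending:
            self.create_response(self.pending.pop(0))


# The timeline from the post: the model answers the second question (t5)
# while the tool result (t6) lands mid-response.
guard = ResponseGuard()
guard.create_response("answer to second user question")   # t5
guard.create_response("tool result")                      # t6: queued, not sent
guard.on_response_done()                                  # tool result now goes out
```

This only resolves the API error; the conversational awkwardness (the tool’s answer arriving one turn too late) remains, which is what the rest of the post is about.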
My Current Solution
I decided to ignore user input while a tool is executing, only processing it once the tool completes.
Pros:
- Prevents overlapping responses.
- Avoids mid-conversation interruptions due to delayed tool results.
Cons:
- This is not how a real conversation works. In a phone call, a user can ask something else while waiting.
- It creates awkward silences and long wait times.
- The user has no idea that my application isn’t listening while the tool is running.
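For reference, the gating described above amounts to a small filter in front of the user-turn handler. This is only a sketch of my approach (the names `tool_running`, `dropped`, and `handled` are illustrative, not API fields):

```python
class InputGate:
    """Ignores user turns while a tool call is executing."""

    def __init__(self):
        self.tool_running = False
        self.dropped = []   # what the user said while we weren't listening
        self.handled = []

    def on_tool_start(self):
        self.tool_running = True

    def on_tool_done(self):
        self.tool_running = False

    def on_user_turn(self, text):
        if self.tool_running:
            # The main con: the turn is silently discarded and the user
            # gets no signal that the app has stopped listening.
            self.dropped.append(text)
            return
        self.handled.append(text)


gate = InputGate()
gate.on_user_turn("first question")
gate.on_tool_start()
gate.on_user_turn("follow-up while the tool runs")  # silently ignored
gate.on_tool_done()
gate.on_user_turn("asked again after the silence")
```

A gentler variant would buffer `dropped` and replay it after `on_tool_done()` instead of discarding it, at the cost of the model answering questions out of order.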
A Partial Improvement
If a tool takes more than X seconds, I tell the model to generate a response like:
“I’m still retrieving the information, please wait a moment.”
Then I re-trigger the same tool call with identical parameters. When my app receives the re-triggered request, it knows the tool is already running, waits up to another X seconds (a configurable timeout), and only generates the final response once the tool completes.
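The watchdog part of this can be sketched with a worker thread and a repeating timeout. This is a simplified stand-in for my setup (`tool_fn`, the `say` callback, and the interim wording are placeholders; in the real app, `say` would trigger a spoken response via the API rather than append to a list):

```python
import concurrent.futures
import time


def call_tool_with_interim(tool_fn, timeout_s, say):
    """Run a slow tool; every time timeout_s elapses without a result,
    emit an interim message, then keep waiting in further windows."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(tool_fn)
        while True:
            try:
                return future.result(timeout=timeout_s)
            except concurrent.futures.TimeoutError:
                # X seconds elapsed with no result: tell the user we're
                # still working (the "re-trigger" step in the post).
                say("I'm still retrieving the information, please wait a moment.")


# Fake slow tool: sleeps long enough to cross two timeout windows.
messages = []

def slow_tool():
    time.sleep(0.25)
    return "tool result"

result = call_tool_with_interim(slow_tool, timeout_s=0.1, say=messages.append)
# result is "tool result"; messages holds the interim lines spoken while waiting.
```

In production the timeout would be seconds rather than fractions of a second, and `say` must itself go through the single-response guard so the interim message doesn’t collide with an in-flight response.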
Key Considerations
- I understand that tools should ideally be micro-tasks that execute quickly, but in reality, some computations or external API calls take time.
- I tried adding a pre-call message (e.g., “Let me check that for you…”) in the tool descriptions, but it’s unreliable and doesn’t help for longer waits (e.g., 10+ seconds).
How Do You Handle This?
Has anyone found a better approach? Shouldn’t the model itself have a built-in way to handle this scenario?