Handling Overlapping Responses in Realtime API When Tools Take Too Long

Hi everyone!

I’m running into a specific issue with the Realtime API. When the model calls a function (tool) that takes several seconds to produce a result, the user may speak again in the meantime. This can lead to overlapping responses when the tool’s result arrives at the same time as the model’s new response.

Here’s a breakdown of the issue:

  1. t1 → The user asks a question.
  2. t2 → The model calls a tool.
  3. t3 → The tool starts executing.
  4. t4 → The user speaks again.
  5. t5 → The model generates a response.
  6. t6 → The tool’s result arrives, and the model tries to generate another response (which fails because it was already responding).
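To make the race at t5/t6 concrete, here is a minimal Python sketch. The `Session` class below is a made-up stand-in for the Realtime connection, which only allows one active response at a time; the class and method names are hypothetical, not the real API surface.

```python
# Hypothetical stand-in for a Realtime session: only one response may be
# in progress at a time, mirroring the API's "active response" restriction.
class Session:
    def __init__(self):
        self.active_response = None

    def response_create(self, source):
        if self.active_response is not None:
            # This is the failure at t6: a response is already being generated.
            raise RuntimeError(
                f"cannot start {source!r}: {self.active_response!r} is still active"
            )
        self.active_response = source

    def response_done(self):
        self.active_response = None


session = Session()
session.response_create("answer to the user's second question")  # t5
try:
    session.response_create("follow-up from the tool result")    # t6 -> fails
except RuntimeError as e:
    print(e)
```

The fix is not obvious because neither trigger is wrong on its own: each `response_create` is legitimate, they just collide in time.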

My Current Solution

I decided to ignore user input while a tool is executing, only processing it once the tool completes.
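In pseudocode terms, the policy looks like the sketch below: a single `tool_running` flag gates user input. This is my own illustration with hypothetical names, not code from the SDK.

```python
# Sketch of the "ignore input while a tool runs" policy. Events are assumed
# to arrive one at a time through callbacks; all names here are hypothetical.
class GatedSession:
    def __init__(self):
        self.tool_running = False
        self.processed = []   # user inputs we actually answered
        self.dropped = []     # user inputs discarded during tool execution

    def on_user_input(self, event):
        if self.tool_running:
            # While a tool executes, user input is silently discarded.
            self.dropped.append(event)
        else:
            self.processed.append(event)

    def on_tool_start(self):
        self.tool_running = True

    def on_tool_done(self, result):
        self.tool_running = False
        return result
```

A gentler variant would queue the dropped events and replay them after `on_tool_done`, at the cost of sometimes answering questions that are already stale.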

✅ Pros:

  • Prevents overlapping responses.
  • Avoids mid-conversation interruptions due to delayed tool results.

❌ Cons:

  • This is not how a real conversation works. In a phone call, a user can ask something else while waiting.
  • It creates awkward silences and long wait times.
  • The user has no idea that my application isn’t listening while the tool is running.

A Partial Improvement

If a tool takes more than X seconds, I tell the model to generate a response like:
“I’m still retrieving the information, please wait a moment.”

Then I re-trigger the same tool call with identical parameters. When my app receives the re-triggered call, it recognizes that the tool is already running, waits up to another X seconds (a configurable timeout), and generates the final response only once the tool completes.
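The timing logic can be sketched with `asyncio`. One simplification in this sketch: instead of re-triggering the tool call through the model, it keeps the original tool task running and just emits a filler message every time the timeout elapses. The `say()` callback is a hypothetical hook for triggering a spoken response.

```python
import asyncio

# Sketch of the interim-message pattern (assumptions: async tool, a `say()`
# hook for spoken output; this keeps one task alive rather than re-invoking
# the tool as described above).
async def run_tool_with_interim(tool, say, timeout=2.0):
    task = asyncio.ensure_future(tool())
    while True:
        try:
            # Wait up to `timeout` seconds for the tool to finish.
            # shield() keeps wait_for's cancellation from killing the tool task.
            return await asyncio.wait_for(asyncio.shield(task), timeout)
        except asyncio.TimeoutError:
            # Tool still running: emit a filler response, then keep waiting.
            say("I'm still retrieving the information, please wait a moment.")
```

With the real API, `say()` would need to be careful not to collide with an in-flight response, which is exactly the original problem in miniature.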

Key Considerations

  • I understand that tools should ideally be micro-tasks that execute quickly, but in reality, some computations or external API calls take time.
  • I tried adding a pre-call message (e.g., “Let me check that for you…”) in the tool descriptions, but it’s unreliable and doesn’t help for longer waits (e.g., 10+ seconds).

How Do You Handle This?

Has anyone found a better approach? Shouldn’t the model itself have a built-in way to handle this scenario?
