Gpt-realtime-1.5: text output mode broken when tools are enabled

I’ve been using gpt-realtime-1.5 for a couple of days now and ran into an interesting issue. With output_modalities=["audio"], the model works great. But when I switch to
output_modalities=["text"] with tools enabled and rely on an external TTS, performance drops significantly compared to gpt-realtime.

Issues I’m seeing in text-only mode:

  • Model wraps normal conversational responses in curly braces {} as if it’s outputting JSON
  • Function call arguments leak into the text output channel (the TTS literally tries to speak the function call JSON)
  • Internal control tokens leak into the output, e.g.: <|aesthetics_3|><|has_watermark|>
  • Ignores language instructions that gpt-realtime followed perfectly

None of these issues exist with gpt-realtime in the same configuration, or with gpt-realtime-1.5 in audio output mode. Seems specific to text mode + tools.
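For reference, here is a minimal sketch of the session configuration that triggers this for me. The tool definition and the language instruction are placeholders, and the exact session schema may differ by API/SDK version — the relevant bits are output_modalities=["text"] plus at least one tool:

```python
import json

# Sketch of the session.update payload that reproduces the issue.
# Field names follow the post; the tool and instructions are hypothetical examples.
session_update = {
    "type": "session.update",
    "session": {
        "output_modalities": ["text"],  # audio mode works fine; text mode misbehaves
        "instructions": "Always answer in German.",  # example language instruction
        "tools": [
            {
                "type": "function",
                "name": "get_weather",  # placeholder tool
                "description": "Look up the current weather for a city.",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            }
        ],
    },
}

payload = json.dumps(session_update)
```

With this config, the text deltas are what get forwarded to the external TTS, which is why the leaks are so audible.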


I would like to second that something is very wrong with output_modalities=["text"] on the new model. Almost every response it gives is either wrong or a tool call at the incorrect time. After an incorrect tool call or response, it follows up with an “oops, I messed that up, let’s try again” and tries to continue.


Yes, this is happening to me too, and for some reason the first turn/message is always stuck until a second message comes in.

Hi and welcome to the community!

I can also reproduce several of the behaviors you described:

  • In text-only mode, the model does return JSON-like content (for example, normal replies wrapped in { ... }) instead of a natural conversational answer.
  • I also see tool-related JSON leaking into the user-facing text output in this setup, which would cause an external TTS that reads the text stream to literally speak JSON.
  • In the same configuration, I see weaker adherence to instructions compared to audio output mode.
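Until this is fixed server-side, a defensive sanitizer in front of the TTS at least stops it from speaking control tokens or raw JSON. This is just a workaround sketch, not an official fix — the unwrapping heuristic assumes the leaked JSON wraps the actual reply in a single string value:

```python
import json
import re

# Matches leaked control tokens like <|aesthetics_3|> or <|has_watermark|>.
CONTROL_TOKEN_RE = re.compile(r"<\|[^|>]+\|>")

def sanitize_for_tts(text: str) -> str:
    """Strip leaked control tokens and unwrap brace-wrapped replies
    before forwarding text to an external TTS."""
    text = CONTROL_TOKEN_RE.sub("", text)
    stripped = text.strip()
    if stripped.startswith("{") and stripped.endswith("}"):
        try:
            payload = json.loads(stripped)
            if isinstance(payload, dict):
                # Heuristic: a single string value holds the actual reply.
                strings = [v for v in payload.values() if isinstance(v, str)]
                if len(strings) == 1:
                    return strings[0]
        except json.JSONDecodeError:
            # Not valid JSON: just drop the stray braces.
            return stripped.strip("{}").strip()
    return text.strip()
```

Note this only masks the symptom for complete responses; with streamed deltas you would need to buffer until the brace-wrapped chunk is complete, and function-call arguments should ideally be filtered by event type before they ever reach the text channel.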

Will ping the team to take a look!

PS: I did not capture the “internal control tokens” leak (<|aesthetics_3|><|has_watermark|>) in my tests. If anyone can share request IDs, that would be helpful.


Thanks for reproducing and escalating!

Unfortunately I didn’t capture the specific request IDs for the control token leak at the time. I’ll start logging them and share as soon as I can reproduce it again.