We have been noticing, with the Realtime API (gpt-5-realtime), that preambles defined for tool calls are presented alongside the tool call only inconsistently. Invoking the same tool repeatedly yields unreliable results when it comes to the agent speaking preambles: roughly one time in three, the agent does not speak a preamble at all.
Has anyone experienced this? Is this expected behavior? Is there a remedy to this problem?
“Preamble” is just a recently coined name for something that function-calling models could always do. It works even on models where a temperature parameter lets you avoid the random-token misbehavior gpt-5 is prone to: after thousands of reasoning tokens, it is a gamble whether the first thing the AI produces is a tool-call special token or a user-facing output token.
The best place to either encourage or prohibit text addressed to the user before a function is invoked is the function description itself. I suggest you try that style of instruction there: give the encouragement or prohibition for writing to the user (in the final channel) on a per-tool basis, and it should be followed more closely.
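As a minimal sketch of what I mean, here is a Realtime API `session.update` payload where the preamble instruction lives in the tool's own `description` field. The tool name, parameters, and instruction wording are hypothetical examples, not anything from your setup:

```python
import json

# Hypothetical tool definition: the preamble behavior is requested
# per-tool, inside the function description, rather than only in
# session-level instructions.
session_update = {
    "type": "session.update",
    "session": {
        "tools": [
            {
                "type": "function",
                "name": "lookup_order",  # hypothetical example tool
                "description": (
                    "Look up an order by its ID. "
                    "Before calling this tool, always speak one brief "
                    "preamble sentence to the user, e.g. "
                    "'Let me check on that order for you.'"
                ),
                "parameters": {
                    "type": "object",
                    "properties": {
                        "order_id": {"type": "string"},
                    },
                    "required": ["order_id"],
                },
            }
        ],
    },
}

# Send over your existing Realtime websocket connection, e.g.:
# ws.send(json.dumps(session_update))
print(json.dumps(session_update, indent=2))
```

In my experience, instructions attached to the tool itself get weighted more heavily at the moment the model decides whether to emit user-facing text or jump straight to the tool-call token, which is exactly the decision point you are losing a third of the time.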
GPT-5 is on a noticeable downward slide in instruction-following, though. I could make screenshots with a “developer” message in one panel spelling out exactly how the AI should behave and act, and, next to it, complete disobedience in the zero-shot response to the user's task.