I’m building a chat-based agent using GPT-5 and the OpenAI Agents SDK. We stream responses to the user. The agent commonly uses a few tools and sometimes makes a few batches of tool calls.
I’m trying to take advantage of GPT-5’s predisposition to provide tool preambles: user-facing explanations of what it will do before it calls any tools.
The problem I’m running into is that while we are streaming these, we would like to be able to identify them as preambles, as opposed to the final response, so that we can display them differently on the frontend. However, there seems to be no way to reliably differentiate a preamble from a final response during streaming, either in the Agents SDK or in the base API SDK itself.
The closest thing I found is that the Agents SDK exposes result.final_output, but this is not populated until the stream is done.
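For concreteness, here is roughly the streaming loop I’m working with (a minimal sketch assuming the Python Agents SDK; the agent setup and the get_weather tool are illustrative). Preamble text and final-answer text both arrive as plain text deltas, with nothing on the event marking one as a preamble:

```python
import asyncio
from agents import Agent, Runner, function_tool
from openai.types.responses import ResponseTextDeltaEvent

@function_tool
def get_weather(city: str) -> str:
    """Hypothetical tool, just to provoke some tool calls."""
    return f"It is sunny in {city}."

agent = Agent(name="assistant", model="gpt-5", tools=[get_weather])

async def main():
    result = Runner.run_streamed(agent, input="What's the weather in Walla Walla?")
    async for event in result.stream_events():
        # Preamble text and final-answer text both come through here identically;
        # there is no field that says "this is a preamble".
        if event.type == "raw_response_event" and isinstance(event.data, ResponseTextDeltaEvent):
            print(event.data.delta, end="", flush=True)
    # Only after the stream has finished is final_output populated.
    print("\n\nfinal_output:", result.final_output)

asyncio.run(main())
```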
I’ve also explored a number of alternatives but found none of them reliable:
Streaming the preamble inside specific XML tags so it can be identified (quite error prone; a rough sketch follows this list).
A number of variations on biasing toward streaming one way and then updating once more information arrives. For example: stream as a final response, then change the type later if tools are called (this leads to a glitchy frontend experience because of the switch).
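For reference, a rough sketch of the tag-routing idea from the first bullet (illustrative only; the tag name and the hold-back heuristic are arbitrary, and this is exactly where it gets error prone: the model sometimes omits, splits, or misplaces the tags):

```python
# Sketch: instruct the model (via the system prompt) to wrap pre-tool commentary in
# <preamble>...</preamble>, then route streamed text based on whether we are inside a tag.
OPEN, CLOSE = "<preamble>", "</preamble>"

class PreambleRouter:
    def __init__(self):
        self.buffer = ""
        self.in_preamble = False

    def feed(self, delta: str) -> list[tuple[str, str]]:
        """Returns (kind, text) chunks, where kind is 'preamble' or 'final'."""
        self.buffer += delta
        out = []
        while True:
            tag = CLOSE if self.in_preamble else OPEN
            idx = self.buffer.find(tag)
            if idx == -1:
                # Hold back a possible partial tag at the end of the buffer, emit the rest.
                safe = len(self.buffer) - (len(tag) - 1)
                if safe > 0:
                    out.append(("preamble" if self.in_preamble else "final", self.buffer[:safe]))
                    self.buffer = self.buffer[safe:]
                return out
            if idx > 0:
                out.append(("preamble" if self.in_preamble else "final", self.buffer[:idx]))
            self.buffer = self.buffer[idx + len(tag):]
            self.in_preamble = not self.in_preamble
```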
I believe this is possible and that I’m missing something, since a number of applications I use stream preambles to tool calls in a different format than final responses. However, it might be that those applications are actually streaming the reasoning summary (which can easily be distinguished as a different response type) rather than this newer concept of a preamble in GPT-5.
If anyone has any ideas here I’d greatly appreciate them.
Update:
So really there’s no “problem” to be fixed here. But GPT-5’s eagerness to give the user updates about what it is going to do before it does it has interesting UX impacts.
Most users of agentic applications have become accustomed to seeing reasoning tokens streamed and displayed differently than the core model responses. My intuition was that the preambles would be a great, more human-friendly replacement for displaying those reasoning tokens. However, there is no reliable way to identify a preamble as a preamble during streaming, because even the model doesn’t yet know whether it will call a tool.
So it’s an engineering / design decision whether you want to display these at all. I was initially wrong: you can prompt the model to never give preambles. But if you want them, you’ll need to stream them to your frontend as standard assistant messages. Optionally you can also display the reasoning summaries (although these only come through some of the time; many reasoning items have empty summaries).
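If you do go the reasoning-summary route, the raw Responses stream does label those deltas with their own event types, so they can be routed separately from ordinary output text. A minimal sketch assuming the base OpenAI Python SDK (the render_* functions are hypothetical UI hooks, and the event type strings are the ones I’ve seen from Responses streaming):

```python
from openai import OpenAI

client = OpenAI()

def render_as_thinking(text: str) -> None:
    # Hypothetical UI hook: collapsed / "thinking" style.
    print(f"[thinking] {text}", end="", flush=True)

def render_as_message(text: str) -> None:
    # Hypothetical UI hook: normal assistant bubble.
    print(text, end="", flush=True)

stream = client.responses.create(
    model="gpt-5",
    input="What's the weather in Walla Walla?",
    reasoning={"summary": "auto"},  # ask for reasoning summaries
    stream=True,
)

for event in stream:
    if event.type == "response.reasoning_summary_text.delta":
        render_as_thinking(event.delta)
    elif event.type == "response.output_text.delta":
        render_as_message(event.delta)
```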
The AI hasn’t emitted a tool call while it is still writing to the user.
It is natural language “content” of a turn that doesn’t need to be held out or used differently.
The only point at which “decided” becomes definite is when the AI writes, “Let me check my weather tool for Walla Walla for you now. (special tool sequence)(tool)”.
I understand all of the above. But my point is that if you, as a developer, wanted to display this pre-tool-call statement of intent differently, there is no way to do so.
This is a weird UX. In every agentic application I use, the preamble or reasoning is displayed differently than final response text.
A workaround would be to just disable the preamble entirely, but the behavior seems to be so heavily trained into the model that it is impossible to reliably prompt it away.
I would happily just get rid of the preambles and display the reasoning summary instead. But per this thread I’ve found the reasoning summary before tool calls is often empty.
It is typical for the AI to simply call tools immediately in this case, without having any idea what it would be picking from randomly, so the reasoning doesn’t quite reveal whether that actually happened.
A trigger that could be parsed immediately (or held back for inspection of the trigger phrase, in case the person is prompting the AI to write the weird Japanese themselves) can be instructed by developer message.
If I query GPT-5 with access to tools, there are two things it might do before calling a tool:
1. The internal reasoning, which is what I believe you’ve displayed above. This is identifiable as a reasoning item in the API response. You can get the summary text by including reasoning={"summary": "auto"} in the call (a minimal request shape is sketched after this list). Note: the reasoning text actually coming back inside the reasoning item object is flaky.
2. Tool “preambles” as introduced with GPT-5: “To this end, GPT-5 is trained to provide clear upfront plans and consistent progress updates via ‘tool preamble’ messages.”
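For reference, the call shape I mean for (1), as a minimal sketch with the base SDK (in my experience the summary list is frequently empty even when requested):

```python
from openai import OpenAI

client = OpenAI()

response = client.responses.create(
    model="gpt-5",
    input="What's the weather in Walla Walla?",
    reasoning={"summary": "auto"},  # request summaries on reasoning items
)

for item in response.output:
    if item.type == "reasoning":
        # item.summary is a list of summary parts; it is often empty in practice.
        for part in item.summary:
            print("reasoning summary:", part.text)
    elif item.type == "message":
        # Ordinary assistant text: could be a preamble or the final answer,
        # and nothing on the item distinguishes the two.
        for part in item.content:
            if part.type == "output_text":
                print("assistant:", part.text)
```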
My point is that there is no way to differentiate a preamble from a normal assistant message. And furthermore, it does not seem that you can get the model to simply not say them.
As you see in the ChatGPT UI, the reasoning is shown differently than the final message (inside the “Thought for x seconds” dropdown). For the preambles there is no way to do this, which means you have to display them as normal final response messages. Which is a weird UX.
A “preamble” isn’t a real thing, nor was it introduced with GPT-5. It has always been possible for the AI to write language “content” and then also call a tool (the order is never flipped, because emitting a tool call means awaiting its response in a follow-up API call).
The AI model generates tokens one at a time; streaming lets you receive the “live” generation. The backend parser detects when a tool-call sequence begins, and then you start getting tool-call deltas instead of content deltas.
Reasoning models only add some internal thinking before this output begins.
There is no signal generated by the AI that says “this language meant for the user to read is special”. That would require predicting the future, which is determined by random sampling of token probabilities.
You could rewrite your UI with client code so that the content parts which have already been received and displayed take on some kind of “compressed text” appearance if the output transitions into a tool call you are handling.
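In Agents SDK terms that looks roughly like the sketch below (the restyle hook is hypothetical): buffer the text you have already shown as if it were the final answer, and only re-tag it as a preamble once a tool call item actually arrives.

```python
import asyncio
from agents import Agent, Runner
from openai.types.responses import ResponseTextDeltaEvent

def restyle_as_preamble(text: str) -> None:
    # Hypothetical frontend hook: re-tag text that was already streamed as
    # "final answer" so it renders in the preamble/"thinking" style instead.
    print(f"\n[re-tagged as preamble] {text}\n")

async def run_and_route(agent: Agent, user_input: str) -> None:
    segment: list[str] = []  # text deltas received since the last tool call

    result = Runner.run_streamed(agent, input=user_input)
    async for event in result.stream_events():
        if event.type == "raw_response_event" and isinstance(event.data, ResponseTextDeltaEvent):
            segment.append(event.data.delta)  # provisionally treated as final-answer text
            print(event.data.delta, end="", flush=True)
        elif event.type == "run_item_stream_event" and event.item.type == "tool_call_item":
            if segment:
                # A tool call followed the text, so it was really a preamble.
                # Only now can we re-tag it, which is what causes the UI glitch.
                restyle_as_preamble("".join(segment))
                segment = []

    # Whatever is still buffered when the stream ends was the actual final answer.
    print("\n\nfinal answer:", "".join(segment))
```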
“A ‘preamble’ isn’t a real thing, nor was it introduced with GPT-5.” Regarding this, I know what you mean, but it’s not quite the full picture. Prior models were trained to heavily prefer jumping straight into tool calling, so a preamble was possible but very unlikely. GPT-5 has been trained to provide these much more readily.
I completely understand the mechanics of why the model (and therefore the API) can’t itself know whether streamed text is a preamble or a final response while streaming.
Given this:
(1) The model should be easily steerable away from providing these. I’ve had some success with this, but I’ve seen others note its over-eagerness to provide them; the kind of instruction I mean is sketched below this list.
(2) Developers need to be able to reliably access the reasoning summary. For me this is never coming through right now on GPT-5.
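For (1), the kind of instruction I’ve been experimenting with looks roughly like this (a sketch; in my testing it reduces preambles but doesn’t reliably eliminate them):

```python
from agents import Agent

# Sketch: try to suppress preambles via instructions. In practice GPT-5 still
# emits them occasionally, which is the over-eagerness mentioned above.
agent = Agent(
    name="assistant",
    model="gpt-5",
    instructions=(
        "When you need to call a tool, call it immediately. "
        "Do not narrate what you are about to do before calling tools; "
        "only write text to the user in your final answer."
    ),
)
```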
It’s a clunker of a model anyway. Look at this: it can’t even tell that “developer” is not the user themselves, which makes it garbage for closed-domain API product development:
“Per my instructions” indeed. It talks about the developer role as if it came from a user. Because a developer is no more than a user adversary now, and OpenAI writes the “you are ChatGPT” system message for you to consume (it literally says “you are ChatGPT”). The roles aren’t even kept straight when it is time to refuse developer instructions and tell the user.
And good luck getting tools in parallel, despite verifying the wrapper placement.