I’m building a chat-based agent using GPT-5 and the OpenAI Agents SDK. We stream responses to the user. The agent commonly uses a few tools and sometimes makes a few batches of tool calls.
I’m trying to take advantage of GPT-5’s predisposition to provide tool preambles: user-facing explanations of what it will do before it calls any tools.
The problem I’m running into is that while we are streaming these, we would like to be able to identify them as preambles, as opposed to the final response, so that we can display them differently on the frontend. However, there seems to be no way to reliably differentiate a preamble from a final response during streaming, either in the Agents SDK or in the base API SDK itself.
The closest thing I found is that the Agents SDK exposes result.final_output, but this is not populated until the stream is done.
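For concreteness, here is roughly the streaming loop I’m working with (a minimal sketch assuming the Python Agents SDK; the agent setup and the get_weather tool are illustrative). Preamble text and final-answer text both arrive as plain text deltas, with nothing on the event marking one as a preamble:

```python
import asyncio
from agents import Agent, Runner, function_tool
from openai.types.responses import ResponseTextDeltaEvent

@function_tool
def get_weather(city: str) -> str:
    """Hypothetical tool, just to provoke some tool calls."""
    return f"It is sunny in {city}."

agent = Agent(name="assistant", model="gpt-5", tools=[get_weather])

async def main():
    result = Runner.run_streamed(agent, input="What's the weather in Walla Walla?")
    async for event in result.stream_events():
        # Preamble text and final-answer text both come through here identically;
        # there is no field that says "this is a preamble".
        if event.type == "raw_response_event" and isinstance(event.data, ResponseTextDeltaEvent):
            print(event.data.delta, end="", flush=True)
    # Only after the stream has finished is final_output populated.
    print("\n\nfinal_output:", result.final_output)

asyncio.run(main())
```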
I’ve also explored a number of alternatives but found none of them reliable:
Streaming the preamble inside specific XML tags so it can be identified (quite error prone; a rough sketch follows this list).
A number of variations on biasing toward streaming one way and then updating once more information arrives. For example: stream as a final response, then change the type later if tools are called (this leads to a glitchy frontend experience because of the switch).
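For reference, a rough sketch of the tag-routing idea from the first bullet (illustrative only; the tag name and the hold-back heuristic are arbitrary, and this is exactly where it gets error prone: the model sometimes omits, splits, or misplaces the tags):

```python
# Sketch: instruct the model (via the system prompt) to wrap pre-tool commentary in
# <preamble>...</preamble>, then route streamed text based on whether we are inside a tag.
OPEN, CLOSE = "<preamble>", "</preamble>"

class PreambleRouter:
    def __init__(self):
        self.buffer = ""
        self.in_preamble = False

    def feed(self, delta: str) -> list[tuple[str, str]]:
        """Returns (kind, text) chunks, where kind is 'preamble' or 'final'."""
        self.buffer += delta
        out = []
        while True:
            tag = CLOSE if self.in_preamble else OPEN
            idx = self.buffer.find(tag)
            if idx == -1:
                # Hold back a possible partial tag at the end of the buffer, emit the rest.
                safe = len(self.buffer) - (len(tag) - 1)
                if safe > 0:
                    out.append(("preamble" if self.in_preamble else "final", self.buffer[:safe]))
                    self.buffer = self.buffer[safe:]
                return out
            if idx > 0:
                out.append(("preamble" if self.in_preamble else "final", self.buffer[:idx]))
            self.buffer = self.buffer[idx + len(tag):]
            self.in_preamble = not self.in_preamble
```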
I believe this is possible and that I’m missing something, since a number of applications I use stream preambles to tool calls in a different format than final responses. However, it might be that those applications are actually streaming the reasoning summary (which can easily be distinguished as a different response type) rather than this newer concept of a preamble in GPT-5.
If anyone has any ideas here I’d greatly appreciate them.
Update:
So really there’s no “problem” to be fixed here. But GPT-5’s eagerness to give the user updates about what it is going to do before it does it has interesting UX impacts.
Most users of agentic applications have become accustomed to seeing reasoning tokens streamed and displayed differently than the core model responses. My intuition was that the preambles would be a great, more human-friendly replacement for displaying those reasoning tokens. However, there is no reliable way to identify a preamble as a preamble during streaming, because even the model doesn’t yet know whether it will call a tool.
So it’s an engineering / design decision whether you want to display these at all. I was initially wrong: you can prompt the model to never give preambles. But if you want them, you’ll need to stream them to your frontend as standard assistant messages. Optionally you can also display the reasoning summaries (although these only come through some of the time; many reasoning items have empty summaries).
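If you do go the reasoning-summary route, the raw Responses stream does label those deltas with their own event types, so they can be routed separately from ordinary output text. A minimal sketch assuming the base OpenAI Python SDK (the render_* functions are hypothetical UI hooks, and the event type strings are the ones I’ve seen from Responses streaming):

```python
from openai import OpenAI

client = OpenAI()

def render_as_thinking(text: str) -> None:
    # Hypothetical UI hook: collapsed / "thinking" style.
    print(f"[thinking] {text}", end="", flush=True)

def render_as_message(text: str) -> None:
    # Hypothetical UI hook: normal assistant bubble.
    print(text, end="", flush=True)

stream = client.responses.create(
    model="gpt-5",
    input="What's the weather in Walla Walla?",
    reasoning={"summary": "auto"},  # ask for reasoning summaries
    stream=True,
)

for event in stream:
    if event.type == "response.reasoning_summary_text.delta":
        render_as_thinking(event.delta)
    elif event.type == "response.output_text.delta":
        render_as_message(event.delta)
```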
The AI hasn’t emitted a tool call while it is still writing to the user.
It is natural language “content” of a turn that doesn’t need to be held out or used differently.
The only point at which “decided” becomes definite is when the AI writes, “Let me check my weather tool for Walla Walla for you now. (special tool sequence)(tool)”.
I understand all of the above. But my point is that if you, as a developer, wanted to display this pre-tool-call statement of intent differently, there is no way to do so.
This is a weird UX. In every agentic application I use, the preamble or reasoning is displayed differently than final response text.
A workaround would be to just disable the preamble entirely, but the behavior seems to be so heavily trained into the model that it is impossible to reliably prompt it away.
I would happily just get rid of the preambles and display the reasoning summary instead. But per this thread I’ve found the reasoning summary before tool calls is often empty.
It is typical for the AI to simply call tools immediately in this case, without having any idea what it would be picking from randomly, so the reasoning doesn’t quite reveal whether that actually happened.
A trigger that could be parsed immediately (or held back for inspection of the trigger phrase, in case the person is prompting the AI to write the weird Japanese themselves) can be instructed by developer message.
If I query GPT-5 with access to tools, there are two things it might do before calling a tool:
1. The internal reasoning, which is what I believe you’ve displayed above. This is identifiable as a reasoning item in the API response. You can get the summary text by including reasoning={"summary": "auto"} in the call (a minimal request shape is sketched after this list). Note: the reasoning text actually coming back inside the reasoning item object is flaky.
2. Tool “preambles” as introduced with GPT-5: “To this end, GPT-5 is trained to provide clear upfront plans and consistent progress updates via ‘tool preamble’ messages.”
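For reference, the call shape I mean for (1), as a minimal sketch with the base SDK (in my experience the summary list is frequently empty even when requested):

```python
from openai import OpenAI

client = OpenAI()

response = client.responses.create(
    model="gpt-5",
    input="What's the weather in Walla Walla?",
    reasoning={"summary": "auto"},  # request summaries on reasoning items
)

for item in response.output:
    if item.type == "reasoning":
        # item.summary is a list of summary parts; it is often empty in practice.
        for part in item.summary:
            print("reasoning summary:", part.text)
    elif item.type == "message":
        # Ordinary assistant text: could be a preamble or the final answer,
        # and nothing on the item distinguishes the two.
        for part in item.content:
            if part.type == "output_text":
                print("assistant:", part.text)
```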
My point is that there is no way to differentiate a preamble from a normal assistant message. And furthermore, it does not seem that you can get the model to simply not say them.
As you see in the ChatGPT UI, the reasoning is shown differently than the final message (inside the “Thought for x seconds” dropdown). For the preambles there is no way to do this, which means you have to display them as normal final response messages. Which is a weird UX.
A “preamble” isn’t a real thing, nor was it introduced with GPT-5. It has always been possible for the AI to write language “content” and then also call a tool (the order is never flipped, because emitting a tool call means awaiting its response in a follow-up API call).
The AI model generates tokens one at a time; streaming lets you receive the “live” generation. The backend parser detects when a tool-call sequence begins, and then you start getting tool-call deltas instead of content deltas.
Reasoning models only add some internal thinking before this output begins.
There is no signal generated by the AI that says “this language meant for the user to read is special”. That would require predicting the future, which is determined by random sampling of token probabilities.
You could rewrite your UI with client code so that the content parts which have already been received and displayed take on some kind of “compressed text” appearance if the output transitions into a tool call you are handling.
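In Agents SDK terms that looks roughly like the sketch below (the restyle hook is hypothetical): buffer the text you have already shown as if it were the final answer, and only re-tag it as a preamble once a tool call item actually arrives.

```python
import asyncio
from agents import Agent, Runner
from openai.types.responses import ResponseTextDeltaEvent

def restyle_as_preamble(text: str) -> None:
    # Hypothetical frontend hook: re-tag text that was already streamed as
    # "final answer" so it renders in the preamble/"thinking" style instead.
    print(f"\n[re-tagged as preamble] {text}\n")

async def run_and_route(agent: Agent, user_input: str) -> None:
    segment: list[str] = []  # text deltas received since the last tool call

    result = Runner.run_streamed(agent, input=user_input)
    async for event in result.stream_events():
        if event.type == "raw_response_event" and isinstance(event.data, ResponseTextDeltaEvent):
            segment.append(event.data.delta)  # provisionally treated as final-answer text
            print(event.data.delta, end="", flush=True)
        elif event.type == "run_item_stream_event" and event.item.type == "tool_call_item":
            if segment:
                # A tool call followed the text, so it was really a preamble.
                # Only now can we re-tag it, which is what causes the UI glitch.
                restyle_as_preamble("".join(segment))
                segment = []

    # Whatever is still buffered when the stream ends was the actual final answer.
    print("\n\nfinal answer:", "".join(segment))
```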
“A ‘preamble’ isn’t a real thing, nor was it introduced with GPT-5.” Regarding this, I know what you mean, but it’s not quite the full picture. Prior models were trained to heavily prefer jumping straight into tool calling, so a preamble was possible but very unlikely. GPT-5 has been trained to provide these much more readily.
I completely understand the mechanics of why the model (and therefore the API) can’t itself know whether streamed text is a preamble or a final response while streaming.
Given this:
(1) The model should be easily steerable away from providing these. I’ve had some success with this, but I’ve seen others note its over-eagerness to provide them; the kind of instruction I mean is sketched below this list.
(2) Developers need to be able to reliably access the reasoning summary. For me this is never coming through right now on GPT-5.
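For (1), the kind of instruction I’ve been experimenting with looks roughly like this (a sketch; in my testing it reduces preambles but doesn’t reliably eliminate them):

```python
from agents import Agent

# Sketch: try to suppress preambles via instructions. In practice GPT-5 still
# emits them occasionally, which is the over-eagerness mentioned above.
agent = Agent(
    name="assistant",
    model="gpt-5",
    instructions=(
        "When you need to call a tool, call it immediately. "
        "Do not narrate what you are about to do before calling tools; "
        "only write text to the user in your final answer."
    ),
)
```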
It’s a clunker of a model anyway. Look at this: it can’t even tell that “developer” is not the user themselves, which makes it garbage for closed-domain API product development:
“Per my instructions” indeed. It talks about the developer role as if it came from a user. Because a developer is no more than a user adversary now, and OpenAI writes the “you are ChatGPT” system message for you to consume (it literally says “you are ChatGPT”). The roles aren’t even kept straight when it is time to refuse developer instructions and tell the user.
And good luck getting tools in parallel, despite verifying the wrapper placement.