This is already a very questionable setup. Not only are you being charged 5k tokens for your input, but when the model copies your entire document you’re charged another 5k in output tokens on top. And your latency is going to be pretty high.
I would suggest making a separate function to display the text to the user, and then explicitly instruct the model to call that second function if the returned text is desirable.
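A minimal sketch of that two-tool setup (the tool names and the ID-based handoff are my assumptions, not from the original code). If create_document_tool returns a document ID alongside the content, the display tool only needs the ID, so the model never has to echo the 5k tokens back out as arguments either:

from openai import OpenAI

client = OpenAI()

tools = [
    {
        "type": "function",
        "name": "create_document_tool",
        "description": "Create a document from the user's request. "
                       "Returns a document_id and the document text.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
    {
        # Hypothetical second tool: the app renders the stored document
        # itself, so the model only passes an ID, never the full text.
        "type": "function",
        "name": "display_document",
        "description": "Show a previously created document to the user. Call "
                       "this once the created document matches the user's request.",
        "parameters": {
            "type": "object",
            "properties": {"document_id": {"type": "string"}},
            "required": ["document_id"],
        },
    },
]

response = client.responses.create(
    model="o3-2025-04-16",
    input=[{"role": "user", "content": "create me a document"}],
    tools=tools,
)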
Maybe it’s not a good setup, but the behaviour of the model is very strange. It thinks that the tool_output is visible to the user, and I don’t know whether that’s intended behaviour or not.
Furthermore, the agent is a ReAct agent with more iterations and more tools to call.
You may see better results by explicitly telling the model how to handle the returned outputs. If you want it to repeat the entire document passed to it, you may need to instruct it to do so, for example:
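# A sketch of the kind of explicit instruction meant here; the exact
# wording and the tool name are assumptions, not a tested prompt.
instructions = (
    "The output of create_document_tool is NOT visible to the user. "
    "After the tool returns, repeat the full document verbatim in your "
    "final text reply so the user actually sees it."
)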
If your intention is to show the entire document to the user, why not do that in the UI deterministically, without the LLM being involved? You risk changes and hallucinations (and, as @OnceAndTwice has pointed out, significant cost!).
If you need the LLM to discuss the document with the user, then your overall approach makes more sense, but you might still share the original document directly to avoid modifications by the LLM.
Not that simple: the tool extracts a template based on the user request, and then the ReAct agent needs to modify that template with the information the user provided.
For a POC I don’t mind the wasted tokens. The problem is that the model (o3) thinks that the output of a tool call is visible to the user (in the conversation).
[
  // USER QUERY
  {"role": "user", "content": "create me a document"},

  // FIRST REACT ITERATION
  {
    "instructions": "<React Agent Instructions>",
    "model": "o3-2025-04-16",
    "object": "response",
    "output": [
      {"summary": [], "type": "reasoning"},
      {
        "arguments": {"query": "create a document"},
        "call_id": "call_123",
        "name": "create_document_tool",
        "type": "function_call",
        "id": "fc_123"
      }
    ],
    "tools": ["list of available tools"],
    "reasoning": {"effort": "medium", "summary": "detailed"}
  },

  // TOOL CALL SUCCESSFUL
  {
    "type": "function_call_output",
    "call_id": "call_123",
    "output": "The content of the document created as string (~10k tokens)"
  },

  // SECOND REACT ITERATION
  {
    "instructions": "<React Agent Instructions>",
    "model": "o3-2025-04-16",
    "object": "response",
    "output": [
      {"text": "Here the document created"}
    ],
    "tools": ["list of available tools"],
    "reasoning": {"effort": "medium", "summary": "detailed"}
  }
]
As you can see, on the second iteration the o3 model says that the document has been created, but there is no document in the last text output.
So I don’t know if this is intended behaviour, but I think it’s a big problem.
“Why do you need to do a second ReAct iteration?” In some cases the ReAct agent may call other tools or modify some information in order to comply with the user’s requests.
The user is asking for a document to be created. It correctly calls a function to create the document and correctly reports that the document has been created.
What if the user says “create and share the contents of that document”? Or “share the contents of a new document with me”?
ChatGPT does that for you as an end-user application, handling all the generation and creation of links. The API doesn’t do that; it has to be built by yourself.
The API models don’t create any files by default. What creates them are tools like code interpreter, or the image and speech endpoints, for example.
Code interpreter runs Python to create files in an ephemeral container. That means you must download them before the container expires in 20 minutes, or you lose them permanently.
The Responses API usually produces annotations for these created files, which you need to download and provide to the user your own way with your UI. It will not provide any hosting or link for direct download.
Also, there is a known bug where code interpreter sometimes doesn’t produce annotations. In these situations, you need to manually investigate the container and download any files before the ephemeral container expires.
Easy? No, this is not for a total beginner. Doable? Sure, with a little effort.
Also, try experimenting with the playground. Add code interpreter to the tools, and try out what it can create. In a few cases, if the annotations fail, you will have to note down the container ID and run some code yourself to retrieve the files.
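If it comes to that, a rough sketch of the manual retrieval (this uses the documented /v1/containers endpoints; the container ID is a placeholder, and field names are worth double-checking against the current API reference):

import os
import requests

API_KEY = os.environ["OPENAI_API_KEY"]
HEADERS = {"Authorization": f"Bearer {API_KEY}"}
container_id = "cntr_abc123"  # placeholder: the ID you noted down

# List the files the code interpreter session created in the container.
files = requests.get(
    f"https://api.openai.com/v1/containers/{container_id}/files",
    headers=HEADERS,
).json()

# Download each one locally before the ephemeral container expires.
for f in files.get("data", []):
    content = requests.get(
        f"https://api.openai.com/v1/containers/{container_id}/files/{f['id']}/content",
        headers=HEADERS,
    )
    # "path" is the in-container path, e.g. /mnt/data/report.pdf
    filename = os.path.basename(f.get("path") or f["id"])
    with open(filename, "wb") as out:
        out.write(content.content)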
No, I’m not convinced. This is not (necessarily) about files.
This is about returning the text from the function’s answer to the user.
If you send text back to the LLM as an answer, and set up the prompts so the LLM is encouraged to send the text back verbatim, I can’t see why that wouldn’t work.
However, to make this more likely I’d use GPT-4.5, which is very good at taking and remembering direction, and has no “reasoning” steps to obfuscate the output. I would also give GPT-4.1 a go.
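Something along these lines (a sketch only: the call ID, tool schema, and document string are placeholders):

from openai import OpenAI

client = OpenAI()

tools = [{
    "type": "function",
    "name": "create_document_tool",
    "description": "Create a document from the user's request.",
    "parameters": {
        "type": "object",
        "properties": {"query": {"type": "string"}},
        "required": ["query"],
    },
}]

# Replay the tool call and its result, with instructions that leave no
# room for summarising instead of quoting.
followup = client.responses.create(
    model="gpt-4.1",
    instructions=(
        "Tool outputs are NOT visible to the user. When create_document_tool "
        "returns, reproduce the returned document verbatim in your reply, "
        "with no additions, omissions, or commentary."
    ),
    tools=tools,
    input=[
        {"role": "user", "content": "create me a document"},
        {
            "type": "function_call",
            "call_id": "call_123",
            "name": "create_document_tool",
            "arguments": "{\"query\": \"create a document\"}",
        },
        {
            "type": "function_call_output",
            "call_id": "call_123",
            "output": "<the ~10k-token document string>",
        },
    ],
)
print(followup.output_text)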