Function-calling (responses mode) – model says “Here is the document” but never includes it

Hi,

Workflow (responses mode, auto function-calling):

  1. User → “Generate artifact”.
  2. Model → function call generate_artifact.
  3. My backend runs it, returns a ~5k-token markdown artifact.
  4. I append:
{
  "type": "function_call_output",
  "call_id": "call_123",
  "output": "<full markdown artifact>"
}
  5. Second completion with the full history: the model needs to respond to the user with the generated artifact.
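Roughly, the loop looks like this (Python SDK; the tool schema and run_generate_artifact are placeholders for my real backend):

from openai import OpenAI
import json

client = OpenAI()

TOOLS = [{
    "type": "function",
    "name": "generate_artifact",
    "description": "Generate a markdown artifact for the user.",
    "parameters": {
        "type": "object",
        "properties": {"query": {"type": "string"}},
        "required": ["query"],
    },
}]

history = [{"role": "user", "content": "Generate artifact"}]

# First call: the model decides to call the function.
first = client.responses.create(model="o3", input=history, tools=TOOLS)
history += first.output  # keep the reasoning and function_call items

for item in first.output:
    if item.type == "function_call":
        artifact = run_generate_artifact(json.loads(item.arguments))  # my backend
        history.append({
            "type": "function_call_output",
            "call_id": item.call_id,
            "output": artifact,  # the ~5k-token markdown string
        })

# Second call: this is where the model should surface the artifact to the user.
second = client.responses.create(model="o3", input=history, tools=TOOLS)
print(second.output_text)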

Problem

Instead of sending the artifact, the assistant replies with something like:

“Here’s the artifact, just fill the blanks.”

No artifact body is ever included.

What I’ve checked
• call_id matches.
• Token limits OK (far below 32k).

Questions

  • Bug or intended behaviour?
  • Extra metadata needed to force the model to surface the tool content?
  • Has anyone succeeded in getting the full tool result echoed back in the assistant message?

Using o3 via the Python SDK.

Thanks!

2 Likes

This is already a very questionable setup. Not only are you being charged 5k tokens for your input, but when the model copies your entire document you’ll also be charged for another 5k output tokens. And your latency is going to be pretty high.

I would suggest making a separate function to display the text to the user, and then explicitly instructing the model to call that second function when the returned text should be shown.
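Something in this direction (tool names and schemas are just examples):

TOOLS = [
    {
        "type": "function",
        "name": "generate_artifact",
        "description": "Generate the markdown artifact. Returns an artifact_id, not the full text.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
    {
        "type": "function",
        "name": "display_artifact",
        "description": "Show a previously generated artifact to the user in the UI, verbatim.",
        "parameters": {
            "type": "object",
            "properties": {"artifact_id": {"type": "string"}},
            "required": ["artifact_id"],
        },
    },
]

On the backend, generate_artifact stores the text and returns only an id; when the model calls display_artifact, your code looks the id up and renders the text straight into your UI, so the model never has to copy the 5k tokens into its own output.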

1 Like

Maybe it’s not a good setup, but the behaviour of the model is very strange. It thinks that the tool output is visible to the user, and I don’t know if that’s intended behaviour or not.

Furthermore, the agent is a ReAct agent with more iterations and more tools to call.

You may see better results in explicitly telling the model how to handle the returned outputs. If you want it to repeat the entire document passed to it then you may need to instruct it to do so.
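For example, something along these lines in the instructions (the wording is purely illustrative):

INSTRUCTIONS = """You are a document assistant.
The user cannot see the content of function_call_output items. When a tool
returns document text, reproduce that text verbatim and in full in your reply
before adding any commentary."""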

Already did, with no luck

Yes, agree with @OnceAndTwice.

If your intention is to show the entire document to the user, why not do that in the UI deterministically without the LLM being involved? You risk changes and hallucinations (and as @OnceAndTwice has pointed out, significant cost!)

If you need the LLM to discuss the document with the user, then your overall approach makes more sense, but you still might share the original document directly to avoid modifications by the LLM.
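i.e. intercept the tool result yourself and only tell the model that it happened. A rough sketch (the helper names are hypothetical):

import json

def handle_function_call(item, history, ui):
    if item.name == "generate_artifact":
        document = run_generate_artifact(json.loads(item.arguments))  # your backend
        ui.render_markdown(document)  # the user sees the exact text, no LLM copy
        history.append({
            "type": "function_call_output",
            "call_id": item.call_id,
            # Give the model a short confirmation instead of the full document.
            "output": "Document created and already displayed to the user.",
        })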

3 Likes

Not that simple: the tool extracts a template based on the user request, and then the ReAct agent needs to modify that template with the user-provided information.

1 Like

o3 is a reasoning model, so internally there are going to be loads more steps. I’m not sure that’s going to be as easy.

Have you tried using Chat Completions with 4.1? (or even 4.5?)

Also look at doing the document modification work in a separate, disconnected call from within the function and storing the result locally.

Look at https://platform.openai.com/docs/guides/predicted-outputs for modifying existing text.
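Rough idea with Chat Completions (the prediction parameter; check the guide for which models support it, and load_template stands in for whatever your tool produced):

from openai import OpenAI

client = OpenAI()

template = load_template()  # the template text your tool extracted

completion = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": "Fill in this template with the user's details:\n\n" + template,
    }],
    # Most of the template stays unchanged, so pass it as the prediction
    # to speed up regeneration of the edited version.
    prediction={"type": "content", "content": template},
)
print(completion.choices[0].message.content)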

1 Like

For a POC I don’t mind the waste of tokens. The problem is that the model (o3) thinks that the output of a tool call is visible to the user (conversation).

What makes you think that?

In none of my conversations with my chatbot are my tool calls and answers shown to the user by default (there’s an admin mode but that’s by the by)

This is my message history:

[
  // USER QUERY
  {"role": "user", "content": "create me a document"},
  // FIRST REACT ITERATION
  {
    "instructions": "<React Agent Instructions>",
    "model": "o3-2025-04-16",
    "object": "response",
    "output": [
      {"summary": [], "type": "reasoning"},
      {
        "arguments": {
          "query": "create a document"
        },
        "call_id": "call_123",
        "name": "create_document_tool",
        "type": "function_call",
        "id": "fc_123"
      }
    ],
    "tools": ["list of available tools"],
    "reasoning": {
      "effort": "medium",
      "summary": "detailed"
    }
  },
  // TOOL CALL SUCCESSFUL
  {
    "type": "function_call_output",
    "call_id": "call_123",
    "output": "The content of the document created as string (~10k tokens)"
  },
  // SECOND REACT ITERATION
  {
    "instructions": "<React Agent Instructions>",
    "model": "o3-2025-04-16",
    "object": "response",
    "output": [
      {
        "text": "Here the document created"
      }
    ],
    "tools": ["list of available tools"],
    "reasoning": {
      "effort": "medium",
      "summary": "detailed"
    }
  }
]

As you can see, on the second iteration the o3 model says that the document has been created, but there is no document in the last text output.

So, I don’t know if it’s an intended behaviour but I think it’s a big problem.

“Why do you need to do a second ReAct iteration?”
In some cases the ReAct agent may call other tools or modify some information in order to comply with the user’s request.

I think there’s an issue with the exact wording.

The user is asking for a document to be created. It correctly calls a function to create the document and correctly reports that the document has been created.

What if the user says “create and share the contents of that document”? Or “share the contents of a new document with me”?

Query: Can you generate a document and share it with me?
Response: Here is your document. You can copy or print it.

But the document (string) is still inside the function tool output.

The API did not output that verbatim I assume.

I think there is a misunderstanding here.

ChatGPT does that for you as an end-user application, handling all the generation and creation of links. The API doesn’t do that; it has to be built by yourself.

The API models don’t create any files by default. What does create them are tools like code interpreter, or the image and speech endpoints, for example.

Code interpreter runs python to create files in an ephemeral container. It means you must download them before it expires in 20 minutes or you lose them permanently.

The Responses API usually produces annotations for these created files, which you need to download and provide to the user in your own way with your UI. It will not provide any hosting or link for direct download.

Also, there is a known bug where sometimes code interpreter also doesn’t produce annotations. In these situations, you need to manually investigate the container and download any files before the ephemeral container expires.

Easy? No, this is not for a total beginner. Doable? Sure, with a little effort.

The documentation is your friend.

Also, try experimenting with the playground. Add code interpreter in the tools, and try out what it can create. In a few cases, you will have to note down the container id and run some code by yourself to retrieve the files, if the annotations fail.
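If you do end up pulling files out of the container yourself, the container-files endpoints can be called directly. Untested sketch; check the API reference for the exact routes and response fields:

import os
import requests

headers = {"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"}
container_id = "cntr_..."  # the container id you noted down

# List the files still present in the (live) container.
files = requests.get(
    f"https://api.openai.com/v1/containers/{container_id}/files",
    headers=headers,
).json()

# Download each one before the ephemeral container expires.
for f in files.get("data", []):
    content = requests.get(
        f"https://api.openai.com/v1/containers/{container_id}/files/{f['id']}/content",
        headers=headers,
    )
    with open(f.get("path", f["id"]).split("/")[-1], "wb") as out:
        out.write(content.content)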

2 Likes

No, I’m not convinced. This is not (necessarily) about files.

This is about returning the text in the answer from the function to the user.

If you send back text to the LLM as an answer and set up the prompts so the LLM is encouraged to send the text back verbatim, I can’t see why that wouldn’t work.

However, to make this more likely I’d use GPT-4.5, which is very good at taking and remembering direction, and there are no “reasoning” steps to obfuscate the output. I would also give GPT-4.1 a go.
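The sort of thing I mean, with Chat Completions (tool schema, prompt wording and the run_create_document backend call are just illustrative):

from openai import OpenAI
import json

client = OpenAI()

SYSTEM = (
    "When create_document_tool returns text, reply to the user with that text "
    "verbatim and in full before adding any commentary."
)
TOOLS = [{
    "type": "function",
    "function": {
        "name": "create_document_tool",
        "description": "Create a document and return its full markdown text.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
}]

messages = [
    {"role": "system", "content": SYSTEM},
    {"role": "user", "content": "create me a document"},
]
first = client.chat.completions.create(model="gpt-4.1", messages=messages, tools=TOOLS)

# Assuming the model did call the tool on this turn:
call = first.choices[0].message.tool_calls[0]
messages.append(first.choices[0].message)
messages.append({
    "role": "tool",
    "tool_call_id": call.id,
    "content": run_create_document(json.loads(call.function.arguments)),  # your backend
})

second = client.chat.completions.create(model="gpt-4.1", messages=messages, tools=TOOLS)
print(second.choices[0].message.content)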

Exactly, it is not a file, it’s a text response from a tool: even if it’s long, the LLM should return the text as-is (or at least a similar version).

I use o3 because my agent uses a ReAct pattern for reasoning and iterating.

So it’s fairly easy to prove this can work:

  • calls function ✅
  • text returned ✅
  • assistant prints text to user ✅ (albeit with emojis, as requested in the system prompt 🙂)

I’ve tried this with a much longer story and it worked too.

However, it is not reliable, and even with temperature set to 0.11 it sometimes doesn’t call the function.

You might want to experiment with different prompts and function descriptions to see if you can make it more reliable.
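If the unreliable part is the function call itself, you can also force it with tool_choice rather than relying on the prompt alone (Chat Completions form; swap in your own function name):

response = client.chat.completions.create(
    model="gpt-4.1",
    messages=messages,
    tools=TOOLS,
    # Force the model to call this specific function on this turn.
    tool_choice={"type": "function", "function": {"name": "create_document_tool"}},
)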

I’ll give it a try, thanks!

1 Like