Responses API: Multiple Tool Use Duplicates output_text

When I manage to get multiple tools used in a single Responses API call, I get multiple ‘end user’ messages (‘output_text’) and no ‘integrated’ response. So I’ll get a response based on web search and then a completely different response based on file search. Surely this isn’t expected behaviour? It happens in the Playground, where both responses show one after the other, and also via the Python library, as below.

{
  "id": "resp_67e9f....",
  "created_at": 1743385305.0,
  "error": null,
  "incomplete_details": null,
  "instructions": null,
  "metadata": {},
  "model": "gpt-4o-2024-08-06",
  "object": "response",
  "output": [
    {
      "id": "ws_67e9f2d...",
      "status": "completed",
      "type": "web_search_call"
    },
    {
      "id": "msg_67e9f2d...",
      "content": [
        {
          "annotations": [
            {
              "end_index": 684,
              "start_index": 527,
              "title": "Title 1",
              "type": "url_citation",
              "url": "https://url"
            }
          ],
          "text": "HERE IS An END USER RESPONSE BASED ON THE WEB SEARCH ",
          "type": "output_text"
        }
      ],
      "role": "assistant",
      "status": "completed",
      "type": "message"
    },
    {
      "id": "fs_67e9f2df9...",
      "queries": [
        "Query 1",
        "Query 2"
      ],
      "status": "completed",
      "type": "file_search_call",
      "results": null
    },
    {
      "id": "msg_67e9f2e5a...",
      "content": [
        {
          "annotations": [
            {
              "file_id": "file-Hi...",
              "index": 389,
              "type": "file_citation",
              "filename": "My File .txt"
            }
          ],
          "text": "HERE IS ANOTHER RESPONSE BASED ON THE FILE SEARCH ONLY.",
          "type": "output_text"
        }
      ],
      "role": "assistant",
      "status": "completed",
      "type": "message"
    }
  ],
  "parallel_tool_calls": false,
  "temperature": 1.0,
  "tool_choice": {
    "type": "web_search_preview"
  },

I believe the issue here arises because OpenAI injects its own message regarding how to reproduce from news. The AI sees this; you do not. It inspires the AI to write:

One has to co-opt within the internal iterations for such revelation.

Then, if you clearly provide more instructions, such as your deliberate “search web, then search files”, the iterator in Responses can make more internal, unobservable calls to more tools, resulting in more output. The list of outputs is clearly designed to handle such scenarios, where the AI could even be instructed to call and report on multiple queries automatically, without intervention.

Think of your own function-call loop, where you can receive both “content” and “tool_call”, show and act on them, and then on return the AI can issue more tool calls and is not barred from producing user-seen content as well. Responses has that built in.
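To make that loop concrete, here is a minimal sketch of the pattern with no API calls at all. The names `fake_model`, `run_tool` and `SCRIPTED_TURNS` are hypothetical stand-ins invented for illustration, not part of any OpenAI library. It only shows how a turn that carries both content and a tool call leads to more than one user-visible message.

```python
# Scripted turns standing in for model output (hypothetical data);
# a real loop would call the API here instead.
SCRIPTED_TURNS = [
    {"content": "Here is what the web says.", "tool_call": "file_search"},
    {"content": "Here is what the files say.", "tool_call": None},
]


def fake_model(turn_index, history):
    """Hypothetical model stub: returns content plus an optional tool call."""
    return SCRIPTED_TURNS[turn_index]


def run_tool(name):
    """Hypothetical tool runner: returns a placeholder result string."""
    return f"results from {name}"


def run_loop(max_turns=5):
    """Collect every piece of user-visible content until no tool is requested."""
    history, shown = [], []
    for turn in range(max_turns):
        reply = fake_model(turn, history)
        if reply["content"]:
            shown.append(reply["content"])  # user-visible text from this turn
        if reply["tool_call"] is None:
            break  # the model is done calling tools
        history.append(run_tool(reply["tool_call"]))  # feed the result back
    return shown


shown = run_loop()
print(shown)
```

Each turn that emits content adds another message, which is the same shape as the two `message` items in the output list above.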

If you don’t want the AI running outside your control, driven by language you didn’t write, you will have to use your own API service for web search and RAG.
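A rough sketch of what that could look like: run both lookups yourself, then hand the model a single prompt containing both result sets so it can write one answer. The functions `search_web`, `search_files` and `build_prompt` are hypothetical placeholders for your own services, and the prompt wording is only an assumption.

```python
def search_web(query: str) -> list[str]:
    """Hypothetical stand-in for your own web search service."""
    return [f"(web result for {query!r})"]


def search_files(query: str) -> list[str]:
    """Hypothetical stand-in for your own RAG / vector store lookup."""
    return [f"(file chunk for {query!r})"]


def build_prompt(question: str) -> str:
    """Put both result sets into one prompt so the model sees them together."""
    web = search_web(question)
    files = search_files(question)
    sections = [
        "Answer the question using BOTH sets of results below.",
        "Web results:\n" + "\n".join(web),
        "File results:\n" + "\n".join(files),
        "Question: " + question,
    ]
    return "\n\n".join(sections)


prompt = build_prompt("latest news in kite surfing")
print(prompt)
```

The resulting string would then be sent as the input of one ordinary model call with no built-in tools enabled, so only one user-facing message comes back.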


I’ve experimented more and it does seem like the second response can take some of the contents of the first response into consideration. But it is a bit weird getting multiple user-targeted messages from a single call (and also having the Playground UI output them both).

The problem is that much of the detail behind any earlier user-targeted message is lost when it’s output as a user-level message. So you have: Web Search → model sees detailed results → User output 1 → ??? → File Search → model sees User output 1 + file search results → Final output. At no time were the web search and file search contents all available to the model at once. It’s not really an output that fully integrates the tools.

After looking into this more I think this must be a bug. If you print out response.output_text, you just get the output from both tools concatenated, which doesn’t even make sense for the user. So, for example, response.output_text is literally like:

The latest news in kite surfing is ..... (from search)
The latest news in kite surfing is .... (from file search)

How can this possibly not be a bug? Which output result are you even supposed to use?
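One workaround, as a sketch only: walk the output list yourself instead of relying on the concatenated text, and keep the assistant messages separate. The dict below is a trimmed-down copy of the response structure shown earlier, with placeholder ids and text. Treating the last message as the "final" answer is an assumption, not documented behaviour.

```python
# Trimmed-down copy of the response above as a plain dict.
# The ids and text values are placeholders for illustration.
response = {
    "output": [
        {"id": "ws_1", "type": "web_search_call", "status": "completed"},
        {
            "id": "msg_1",
            "type": "message",
            "role": "assistant",
            "content": [
                {"type": "output_text", "text": "ANSWER BASED ON THE WEB SEARCH"}
            ],
        },
        {"id": "fs_1", "type": "file_search_call", "status": "completed"},
        {
            "id": "msg_2",
            "type": "message",
            "role": "assistant",
            "content": [
                {"type": "output_text", "text": "ANSWER BASED ON THE FILE SEARCH"}
            ],
        },
    ]
}


def message_texts(resp):
    """Return one string per assistant message, in output order."""
    texts = []
    for item in resp["output"]:
        if item.get("type") != "message":
            continue  # skip web_search_call / file_search_call items
        parts = [
            part["text"]
            for part in item.get("content", [])
            if part.get("type") == "output_text"
        ]
        texts.append("".join(parts))
    return texts


texts = message_texts(response)
# Assumption: the last message had the most context, so show only that one.
final_text = texts[-1] if texts else ""
print(final_text)
```

This doesn’t fix the lack of integration, it only avoids showing the user two answers glued together.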

Also, both responses must be charged as output tokens, right? So let’s say you have a question like

Please analyse War and Peace in detail and produce a 4000 word report

It seems you would get two (or more) full responses and be paying for both outputs at the output rates? I’d love to find out I’m doing it wrong and this isn’t the way it’s designed to work.

Switch and prioritize the order in which the tools are instructed to be used. The file search return should act like a normal tool, allowing the AI freedom of inference. Then you’ll have to prioritize synthesis of file search into the results after the web search, because of direct instructions to dump out web search results like a rich man’s Google.

Responses is not ideal, because the tool descriptions are out of your control. Same as Assistants from day 0, though.

You pay for multiple internal iterations of tools on the same input context, which is expected, and is typically a much larger cost than the AI output for a search.
