OpenAI Responses API: Extremely Slow response.created Event When Large Tools Array is Provided

Hi everyone,

I’m experiencing a significant performance issue with the OpenAI Responses API that I wanted to share with the community to see if others have encountered this or if there’s a recommended solution.

Problem

When using the Responses API with a large number of tools (100+), the API response becomes extremely slow, to the point where it appears to be hanging. The root cause is that the `response.created` event includes a `tools` property that echoes back the complete array of tools provided in the request.

Since the response is streamed byte-by-byte over SSE (Server-Sent Events), streaming this massive tools array can take a very long time, making the API appear unresponsive during the initial phase of the response.

Technical Details

- API: OpenAI Responses API

- Event: `response.created`

- Behavior: the event includes a `tools` property containing the full list of tools from the request

- Impact: with 100+ tools, streaming the `response.created` event can take so long that the API appears to hang

Why This is Unexpected

This behavior is unique to the Responses API. The Chat Completions API does not echo back the tools array in its response, which is the expected behavior. The tools were already sent in the request, so there’s no clear reason to stream them back to the client in the response.

Reproduction

1. Create a Responses API request with 100+ tools

2. Observe that the initial `response.created` event takes an extremely long time to finish streaming

3. The API appears to hang during this phase (a minimal repro sketch follows below)
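
For anyone who wants to reproduce this, here’s a minimal sketch using the Python SDK. The tool definitions, tool count, and model name are placeholders I made up for illustration; the point is just to inflate the tools array and time when each event arrives.

```python
import time
from openai import OpenAI  # openai>=1.x Python SDK

client = OpenAI()

# Dummy tools, only here to inflate the request payload (names/schemas are placeholders)
tools = [
    {
        "type": "function",
        "name": f"dummy_tool_{i}",
        "description": "Placeholder tool used only to reproduce the slow response.created event.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
            "additionalProperties": False,
        },
    }
    for i in range(150)
]

start = time.monotonic()
stream = client.responses.create(
    model="gpt-4.1-mini",  # assumption: any Responses-capable model
    input="Say hello.",
    tools=tools,
    stream=True,
)

# The first event is response.created, which carries the full tools array back to the client
for event in stream:
    print(f"{event.type:40s} at +{time.monotonic() - start:.2f}s")
    if event.type == "response.completed":
        break
```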

Workaround

The only workaround I’ve found is to dramatically reduce the number of tools provided in each request, for example by sending only the subset of tools relevant to that request (sketched below). However, this isn’t always practical depending on the application’s requirements.
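
Here’s a hypothetical sketch of that kind of pre-filtering; the keyword-overlap heuristic, the limit of 15, and the model name are all placeholders (it reuses `client` and `tools` from the repro snippet above).

```python
def select_relevant_tools(all_tools: list[dict], user_message: str, limit: int = 15) -> list[dict]:
    """Naive keyword-overlap filter; swap in embeddings or routing logic for real use."""
    words = set(user_message.lower().split())

    def score(tool: dict) -> int:
        text = f"{tool['name']} {tool.get('description', '')}".lower()
        return sum(1 for w in words if w in text)

    return sorted(all_tools, key=score, reverse=True)[:limit]

user_message = "What's the weather in Paris?"

# Only the (hopefully) relevant subset goes into the request,
# so the tools array echoed back in response.created stays small too.
response = client.responses.create(
    model="gpt-4.1-mini",  # assumption: any Responses-capable model
    input=user_message,
    tools=select_relevant_tools(tools, user_message),
    stream=True,
)
```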

Questions

1. Is this intentional behavior? Why does the Responses API echo back the tools array when the Chat Completions API doesn’t?

2. Is there a way to disable this behavior? Can the API skip including tools in the response?

3. Are there any optimizations planned? This seems like an architectural issue that affects anyone using the Responses API with many tools.

Has anyone else encountered this issue? Any suggestions or workarounds would be greatly appreciated!

Thanks!


It gets more ridiculous than that.

Let’s say I want to avoid network traffic by putting all my knowledge into a “prompt”: additional messages acting as permanent “RAG” (messages whose role you cannot set to anything other than “user” or “assistant” in the platform site, and certainly not a ‘knowledge’ or ‘retrieval’ role):

What “knowledge” could be so big? How about 3MB of purely the JSON schema of the Responses event stream itself? You want an AI to be able to answer about “ResponseMCPListToolsInProgressEvent”, right? Or 375kB of just the “ResponseFailedEvent” schema?

Then there’s the impossible prompts UI, where you can’t even grab the scrollbar without exiting a message’s “edit mode”, so editing contents there is nigh impossible.

But I get it in there, skirting the arbitrary 1MB-per-message limit on the Responses API (a message capped at roughly a tenth of the context window of million-token models ingesting YAML), to build an application that uses 40% of the model’s input context… Or so I think:


Just stuck forever after all the chunking and placement. No error shown in the UI for the 500 error underneath, just back to “name the prompt” again.

Utter failure as an endpoint and a tool.


The point is: I’d be getting hit with 5MB of data echoed back to me in every response, just as if I were sending “instructions” in the API call and getting them sent right back, except that using a stored prompt returns an array of messages as “instructions”. It’s the same thing you seem to get with tools: big data bandwidth just for the AI to say “hello”, with even the AI’s output repeated multiple times in the stream.

How about an “echo off” option??

Another delay you may be experiencing: if you use strict structured functions, an artifact has to be built before that prompt can run. It’s cached, but only for a short time, and making the request again later can hit a 30+ second delay with that many tools. That’s another issue you might be hitting if you aren’t specifically metering the transmission time up to the event stream’s ‘done’, which would be compressed with Brotli if you send a full complement of Accept-Encoding in your API request headers.
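
If you want to separate those two effects, one approach is to time the gap before the first event (server-side setup such as the strict-schema artifact build) separately from the time spent streaming everything after it, and to ask for compressed transfer. A rough sketch; note that the SDK’s HTTP client (httpx) only decodes Brotli when the `brotli` package is installed, so that part is an assumption about your environment.

```python
import time
from openai import OpenAI

# Advertise br alongside the defaults; httpx decodes it transparently when `brotli` is installed.
client = OpenAI(default_headers={"Accept-Encoding": "gzip, deflate, br"})

start = time.monotonic()
first_event_at = None

stream = client.responses.create(
    model="gpt-4.1-mini",   # assumption: any Responses-capable model
    input="Say hello.",
    tools=tools,            # the same large tools array as in the repro snippet above
    stream=True,
)

for event in stream:
    if first_event_at is None:
        # Delay before anything arrives: request upload + server-side setup (e.g. strict-schema artifact)
        first_event_at = time.monotonic() - start
    if event.type == "response.completed":
        # The remaining time is dominated by streaming the (tool-echoing) events themselves
        print(f"first event after {first_event_at:.1f}s, full stream after {time.monotonic() - start:.1f}s")
```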