Reproducible server_error on /v1/responses with gpt-4.1-mini + web_search tool

Hi all — posting a reproducible bug on the Responses API in case anyone else is hitting it, and to get more eyes on it.

Summary

POST /v1/responses returns HTTP 500 server_error when all three of the following are present:

  1. model: "gpt-4.1-mini"

  2. tools: [{ "type": "web_search" }]

  3. A specific class of input prompt (see below)

Changing any single one of those variables makes the request succeed. The failure is deterministic across multiple sessions over the past several days.

Minimal repro

Fails (HTTP 500, ~7–8s TTFB):

curl https://api.openai.com/v1/responses \
  -H "Authorization: Bearer $OPENAI_KEY" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "gpt-4.1-mini",
    "input": "What are the best AI-powered collaboration workspaces for teams in 2026?",
    "max_output_tokens": 2048,
    "tools": [{ "type": "web_search" }]
  }'

Response:

{
  "error": {
    "message": "An error occurred while processing your request... Please include the request ID req_XXXX in your message.",
    "type": "server_error",
    "code": "server_error"
  }
}
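For anyone collecting these failures in bulk, here is a minimal sketch that pulls the error type, code, and request ID out of an error body like the one above. The function name `summarize_error` is hypothetical, and the request-ID pattern is an assumption based only on the IDs listed in this post (`req_` plus 32 hex characters), not on any documented format.

```python
import json
import re

# Assumption: request IDs look like "req_" followed by 32 hex characters,
# which matches the IDs quoted in this thread but is not a documented format.
REQUEST_ID_PATTERN = re.compile(r"req_[0-9a-f]{32}")


def summarize_error(body_text: str) -> dict:
    """Return the error type, code, and any request ID found in an error body."""
    try:
        payload = json.loads(body_text)
    except json.JSONDecodeError:
        payload = None

    error = payload.get("error") if isinstance(payload, dict) else None
    if not isinstance(error, dict):
        # Not the error shape shown above; report nothing rather than guess.
        return {"type": None, "code": None, "request_id": None}

    match = REQUEST_ID_PATTERN.search(error.get("message") or "")
    return {
        "type": error.get("type"),
        "code": error.get("code"),
        "request_id": match.group(0) if match else None,
    }
```

Feeding it the raw response text of each failed call gives one small record per failure, which is easier to paste into a support case than full bodies.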

Controls (all succeed)

Change from failing request                                   | Result
Remove tools (keep model + prompt)                            | HTTP 200, ~16s
Keep tools, change prompt to "What is the capital of France?" | HTTP 200, ~1.3s
Keep tools + prompt, change model to gpt-5-nano               | HTTP 200, ~59s

So the failure needs all three together: gpt-4.1-mini, web_search, and this kind of prompt. Changing any one of them makes the request succeed.
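If you want to re-run the same comparison, the failing request and its three controls can be sketched as a small script. This is only an illustration using the Python standard library: the names `build_variants`, `send`, and `run_controls` are hypothetical, the key is read from `OPENAI_API_KEY` (an assumption, the curl above uses a different variable), and the timing is total wall-clock time for the call, not TTFB.

```python
import copy
import json
import os
import time
import urllib.error
import urllib.request

# The failing request body from the minimal repro.
BASELINE = {
    "model": "gpt-4.1-mini",
    "input": "What are the best AI-powered collaboration workspaces for teams in 2026?",
    "max_output_tokens": 2048,
    "tools": [{"type": "web_search"}],
}


def build_variants(baseline: dict) -> dict:
    """Return the baseline plus three controls, each changing one variable."""
    variants = {"baseline": copy.deepcopy(baseline)}

    no_tools = copy.deepcopy(baseline)
    no_tools.pop("tools", None)  # control 1: drop the web_search tool
    variants["no_tools"] = no_tools

    simple_prompt = copy.deepcopy(baseline)
    simple_prompt["input"] = "What is the capital of France?"  # control 2
    variants["simple_prompt"] = simple_prompt

    other_model = copy.deepcopy(baseline)
    other_model["model"] = "gpt-5-nano"  # control 3
    variants["other_model"] = other_model
    return variants


def send(body: dict, api_key: str) -> tuple:
    """POST one body and return (HTTP status, elapsed seconds)."""
    request = urllib.request.Request(
        "https://api.openai.com/v1/responses",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    started = time.monotonic()
    try:
        with urllib.request.urlopen(request, timeout=300) as response:
            response.read()
            status = response.status
    except urllib.error.HTTPError as error:
        # urllib raises on 4xx/5xx; the status is what we want to record.
        error.read()
        status = error.code
    return status, time.monotonic() - started


def run_controls() -> None:
    """Send every variant once and print status and elapsed time."""
    api_key = os.environ["OPENAI_API_KEY"]  # assumed variable name
    for name, body in build_variants(BASELINE).items():
        status, elapsed = send(body, api_key)
        print(f"{name}: HTTP {status} in {elapsed:.1f}s")
```

Calling `run_controls()` makes four billable requests, so it is worth running sparingly. Keeping the variant builder separate from the network call makes it easy to check that each control really differs from the baseline in only one field.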

Failing request IDs

Five from two separate sessions:

  • req_2511bd1f82f04ff88ec63e1385705044

  • req_071d4913a42241f7a00cadef346b75b5

  • req_8196a931b7944e0590056742ff58ad2f

  • req_2aec8b9613324d83923773068d93f8ae

  • req_66c902aa815849fb866fd6691c048505

Observation that may help triage

The failing requests all return in 7–8 seconds. The matched gpt-5-nano control with the same prompt and web_search enabled runs for ~59 seconds before completing successfully. That gap suggests gpt-4.1-mini is bailing out early during tool orchestration rather than timing out on a long-running search, which points to the backend rather than the search itself.

What I’ve ruled out

  • Not an auth or rate-limit issue (other requests on the same key/org succeed immediately before and after).

  • Not network — direct curl to api.openai.com, no proxies, no SDKs, full HTTP trace captured.

  • Not specific to one input — variations of “best X in 2026” style prompts with gpt-4.1-mini + web_search show the same behavior.

  • Reproducible across days, different sessions, fresh API keys.

Filed via support a week ago (case 08421197) but the response loop has been asking for HAR files, batch IDs, conversation URLs, and most recently questions about Codex — none of which apply to a direct curl call against the Responses API. Posting here in the hope of getting it in front of the API platform team.

Anyone else seeing this? And if any OpenAI staff want full curl traces (HAR-equivalent, sanitized) or the mp4 recording I already sent to support, happy to share — just ping me.

Thanks.

Have you tried it without max_output_tokens?

Hi and welcome to the community!

Thanks for raising this. I could not reproduce the issue on my side, though.
Could you share a bit more of your code? That would make it easier to look into.

I also could not reproduce the error when sending the same request.

Here is the answer that was delivered (with URL links peppered with OpenAI query strings to ensure your product is OpenAI’s product):

As of May 2026, several AI-powered collaboration workspaces have emerged to enhance team productivity and streamline workflows. Here are some of the top options:

Microsoft 365 Copilot: Your window into the world of agents | Microsoft 365 Blog
Microsoft 365 Copilot
Integrated across Microsoft Teams, Outlook, Word, Excel, and more, Copilot offers features like real-time meeting summaries, email drafting, and data analysis through natural language processing. The recent ‘Wave 3’ update introduces customizable AI agents for specific workflows. (windowscentral.com)

How to build a tech stack for a remote startup - Appwrite
Notion AI
Notion AI assists in generating meeting notes, drafting updates, and structuring documentation, making it ideal for teams balancing operational tasks with creative projects. (gend.co)…

The environment variable used in the curl example (OPENAI_KEY) is non-standard; the OpenAI SDKs default to OPENAI_API_KEY. Ensure you are using the right key for the right project and its limits.

A max_output_tokens of 2048 is about the minimum you should set for the gpt-4.1 series; the model can write longer if it is really encouraged. Reminder: that value is the ceiling on what you are billed for output, and it should be set significantly higher when internal tools are in use.
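A quick way to tell whether the limit is actually being hit is to look at the response object itself. The sketch below assumes the Responses API marks a cut-off response with `status` set to `"incomplete"` and `incomplete_details.reason` set to `"max_output_tokens"`, which is my reading of the API reference; the helper name is hypothetical.

```python
def truncated_by_token_limit(response_json: dict) -> bool:
    """Return True if the response looks cut off by max_output_tokens.

    Assumes the `status` and `incomplete_details.reason` fields described
    in the lead-in; a missing or null field is treated as "not truncated".
    """
    details = response_json.get("incomplete_details") or {}
    return (
        response_json.get("status") == "incomplete"
        and details.get("reason") == "max_output_tokens"
    )
```

If this returns True for a 200 response, raising max_output_tokens is the first thing to try before digging further.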

Some Python follows to reproduce the same request on the wire, parse the text output, and capture and display any error body like the one shown above.

"""OpenAI Responses API using httpx, with web search tool."""

import os
import httpx

api_body = {
    "model": "gpt-4.1-mini",
    "input": "What are the best AI-powered collaboration workspaces for teams in 2026?",
    "max_output_tokens": 2048,
    "tools": [{"type": "web_search"}],
}

def extract_response_text(response_json: dict) -> str:
    """Extract assistant text from a Responses API JSON payload."""
    if isinstance(response_json.get("output_text"), str):
        return response_json["output_text"]

    return "".join(
        content["text"]
        for output in response_json.get("output", [])
        if output.get("type") == "message"
        for content in output.get("content", [])
        if content.get("type") == "output_text" and "text" in content
    )


def main() -> None:
    with httpx.Client(timeout=300) as client:
        response = client.post(
            "https://api.openai.com/v1/responses",
            headers={
                "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
                "Content-Type": "application/json",
            },
            json=api_body,
        )

    if response.status_code not in {200, 201}:
        print(f"HTTP error: {response.status_code}")
        print(response.text)
        return

    response_json = response.json()
    assistant_text = extract_response_text(response_json)
    print(assistant_text)


if __name__ == "__main__":
    main()

You say, “deterministic across multiple sessions”, but we don’t see more of a session than an input string.

You might consider sending "instructions" in your API request: more system message text that tells the AI model its role for the user’s benefit. At least as a change of scenery, it would break up the “deterministic” pattern with a more typical trained context.
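As a rough sketch of that suggestion, the same request body can be copied with an `instructions` field added. The helper name `with_instructions` and the instruction text are purely illustrative assumptions; only the `instructions` parameter itself comes from the API.

```python
def with_instructions(body: dict, instructions: str) -> dict:
    """Return a shallow copy of the request body with `instructions` added."""
    updated = dict(body)  # leave the caller's body untouched
    updated["instructions"] = instructions
    return updated


api_body = {
    "model": "gpt-4.1-mini",
    "input": "What are the best AI-powered collaboration workspaces for teams in 2026?",
    "max_output_tokens": 2048,
    "tools": [{"type": "web_search"}],
}

# Hypothetical role text; replace with whatever suits your application.
body_with_role = with_instructions(
    api_body,
    "You are a research assistant. Use web search and cite your sources.",
)
```

Sending `body_with_role` instead of `api_body` is then a fourth control: same model, tool, and prompt, but with a system-style context in front of it.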

Thanks for testing it, @_j. Good to hear it’s working on your end. I just re-ran my exact failing curl and it now returns 200 as well, so something appears to have changed server-side between my last failing repro (~48 hours ago) and now. Going from five reproducible 500s across two sessions to zero in the same configuration is a pretty clean before/after.

If anyone from the platform team is reading: would be useful to confirm whether a fix or config change went out for web_search tool orchestration on gpt-4.1-mini in the last 48 hours, just so future readers of this thread know whether to expect a regression. Happy to share the original failing request IDs (also in support case 08421197) if it helps correlate to logs.

Leaving the thread open in case the regression returns.