Web Search Completion Cuts Off Response and ignores structured outputs on complex prompts

This reminds me of the days before the JSON schema option and JSON Structured outputs.

Currently I have a few complex structured output schemas. These have an extremely high failure rate, but only with the web search preview model (gpt-4o-search-preview).

The model commonly cuts off the content in the middle of the output, breaking the JSON entirely. There is no error, no "out of tokens" indication, nothing. It fails silently and violates the schema by ending abruptly mid-string. The normal GPT-4o model does not have this problem.

I have resolved this for some of my requests by vastly simplifying the schema, which brought them to a high success rate. However, not all of my schemas can be simplified in that manner. I am hoping for a fix, or any suggestions if anyone has found a way to mitigate this issue.

First thing I'd check: the finish_reason you are getting in the API response — "stop" or "length".
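A minimal sketch of that check. The response shape below is simulated with a plain dict matching the Chat Completions JSON; with the openai SDK you would read `response.choices[0].finish_reason` instead. `diagnose_finish` is my own hypothetical helper, not an SDK function:

```python
def diagnose_finish(response: dict) -> str:
    """Classify why a Chat Completions response ended."""
    reason = response["choices"][0]["finish_reason"]
    if reason == "length":
        return "truncated: hit the token limit"
    if reason == "stop":
        return "model ended the message on its own"
    return f"other: {reason}"

# Simulated responses (shape matches the Chat Completions JSON):
truncated = {"choices": [{"finish_reason": "length"}]}
clean = {"choices": [{"finish_reason": "stop"}]}
print(diagnose_finish(truncated))  # truncated: hit the token limit
print(diagnose_finish(clean))      # model ended the message on its own
```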

This model is peculiar in that it seems to auto-inject retrieved search results (RAG) into the context every turn, yet you are not billed for the input tokens of that portion (obfuscation of the delivery…). max_tokens isn't available for this model, so it shouldn't be what's causing this — even though a token limit would otherwise be the first suspect for early termination.

Then: if a structured output is cut off in a non-streamed response, you should get an API error when strict enforcement is being applied. I would review the nesting between response_format and "schema" to confirm the strict parameter is in the correct place.

Python:

response = client.chat.completions.create(
  model="gpt-4o",
  messages=[
    {"role": "user", "content": "Guess my lottery number."}
  ],
  response_format={
    "type": "json_schema",
    "json_schema": {
      "name": "lotto_guesses",
      "strict": True,
      "schema": {
        "type": "object",
        "properties": {
          "my_guess": {
            "type": "string",
            "description": "The user's lottery number guess."
          }
        },
        "required": [
          "my_guess"
        ],
        "additionalProperties": False
      }
    }
  },
  # ...other parameters
)

You haven't provided any information about your implementation, though. In Python you can also build a Pydantic BaseModel and pass it as response_format to the .parse() method, which ensures strict mode and validation.

With this model you pay for a search on every request, even for "hello". In the Responses API, by contrast, web search is a tool: it means a second model iteration, but the AI chooses whether to invoke a search or not (with the pattern-damaging injections coming afterwards).
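For reference, a Responses API request with web search as an optional tool looks roughly like this — a hedged sketch, with the tool type name as I understand the current API (it may differ by account or API version):

```json
{
  "model": "gpt-4o",
  "input": "What changed in the city budget this week?",
  "tools": [{ "type": "web_search_preview" }]
}
```

Here the model decides per request whether to call the tool, instead of searching unconditionally.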

@_j Thank you for the informed response.

The finish reason is indeed "stop", not "length", which was my first suspicion when I noticed this behavior. Furthermore, I believe strict is correctly applied.

I think this is an OpenAI bug rather than incorrect usage of their API.

But just to be sure, here is a heavily redacted version of my payload:

{
  "messages": [
    {
      "role": "system",
      "content": "You are an AI assistant for a software platform. You are assigned to a structured project and must assist the user based on predefined context. Responses must contain plain newlines only—no markdown, no formatting, and no special characters. Respond only in English. Populate 5 items related to the following category: placeholderCategory."
    },
    {
      "role": "user",
      "content": "I want to create the following product: An AI-powered solution designed to enhance a traditional manual process. It adapts to user preferences in real time to provide personalized suggestions. The platform serves both end-users and enterprise clients to improve efficiency and user experience."
    }
  ],
  "model": "gpt-4o-search-preview",
  "response_format": {
    "json_schema": {
      "name": "placeholderSchema",
      "strict": true,
      "schema": {
        "type": "object",
        "required": ["data"],
        "properties": {
          "data": {
            "type": "object",
            "required": ["placeholderCategory"],
            "properties": {
              "placeholderCategory": {
                "description": "Must contain exactly 5 elements",
                "type": "array",
                "items": {
                  "type": "object",
                  "required": ["title", "description", "relatedOrgs"],
                  "properties": {
                    "title": {
                      "type": "string",
                      "description": "Short label for the item"
                    },
                    "description": {
                      "type": "string",
                      "description": "Detailed explanation of the item and suggested next steps"
                    },
                    "relatedOrgs": {
                      "type": "array",
                      "items": {
                        "type": "object",
                        "required": ["name", "url"],
                        "properties": {
                          "name": {
                            "type": "string",
                            "description": "Name of a real, currently active organization"
                          },
                          "url": {
                            "type": "string",
                            "description": "URL of the organization's website"
                          }
                        },
                        "additionalProperties": false
                      }
                    }
                  },
                  "additionalProperties": false
                }
              }
            },
            "additionalProperties": false
          }
        },
        "additionalProperties": false
      }
    }
  },
  "web_search_options": {
    "search_context_size": "high"
  }
}
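As a side note, two things worth sanity-checking before sending a body like the one above: strict mode expects `"type": "json_schema"` at the top level of response_format, and `"additionalProperties": false` on every object in the schema. A pre-flight linter can catch both — a sketch only; `lint_response_format` is my own hypothetical helper, not anything from the SDK:

```python
def lint_response_format(body: dict) -> list[str]:
    """Collect common strict-mode mistakes in a Chat Completions request body."""
    problems = []
    rf = body.get("response_format", {})
    if rf.get("type") != "json_schema":
        problems.append('response_format is missing "type": "json_schema"')
    schema = rf.get("json_schema", {}).get("schema", {})

    def walk(node, path):
        # Flag any object-typed schema node that doesn't forbid extra keys.
        if isinstance(node, dict):
            if node.get("type") == "object" and node.get("additionalProperties") is not False:
                problems.append(f'{path}: object without "additionalProperties": false')
            for key, value in node.items():
                walk(value, f"{path}.{key}")
        elif isinstance(node, list):
            for i, value in enumerate(node):
                walk(value, f"{path}[{i}]")

    walk(schema, "schema")
    return problems
```

Run against a request body, an empty list means both checks passed; any findings come back as human-readable strings with the offending schema path.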

Adding the redacted response:

{
  "redacted": "redacted",
  "redacted": "redacted",
  "redacted": "redacted",
  "redacted": [
    {
      "redacted": [
        {
          "redacted": "redacted",
          "redacted": "redacted"
        },
        {
          "redacted": "redacted",
          "redacted": "redacted"
        },
        {
          "redacted": "redacted",
          "redacted": "redacted"
        },
        {
          "redacted": "redacted",
          "redacted": "redacted"
        },
        {
          "redacted": "redacted",
          "redacted": "redacted"
        },
        {
          "redacted": "redacted",
          "redacted": "redacted"
        },
        {
          "redacted": "redacted",
          "redacted": "redacted"
        },
        {
          "redacted": "redacted",
          "redacted": "redacted"
        },
        {
          "redacted": "redacted",
          "redacted": "redacted"
        },
        {
          "redacted": "redacted",
          "redacted": "Wisconsin 

Notice it ends mid-string: no closing JSON, no stop reason, etc. I believe this to be a bug.

This is getting worse. Most requests I make to OpenAI now come back as "successful" and I am charged for them, yet the structured output (with strict set to true) is completely disregarded and I get a JSON object cut off mid-string, rendering the generation useless — and I'm still charged for it. I'm not sure what to do. Our existing retry system eventually gets a valid result, but a single generation now takes 10+ tries until OpenAI gets it right.
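For what it's worth, until there's a fix, the retry guard can be kept cheap by rejecting incomplete JSON before accepting a generation. A stdlib-only sketch — `call_model` here is a stand-in for the actual API call, not a real function:

```python
import json

def is_complete_json(text: str) -> bool:
    """True only if the text parses as a complete JSON document."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False

def generate_with_retry(call_model, max_attempts: int = 10) -> dict:
    """Call the model until it returns parseable JSON.

    `call_model` is a stand-in for the real API call and must return
    the raw content string of one completion.
    """
    for _ in range(max_attempts):
        raw = call_model()
        if is_complete_json(raw):
            return json.loads(raw)
    raise RuntimeError(f"no complete JSON after {max_attempts} attempts")

# Demo with a fake model that returns a truncated payload once, then a valid one:
attempts = iter(['{"data": {"items": [{"name": "Wisconsin', '{"data": {"items": []}}'])
result = generate_with_retry(lambda: next(attempts))
print(result)  # {'data': {'items': []}}
```

This doesn't fix the billing for the failed attempts, but it at least keeps truncated output from reaching downstream code.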

My concern is that since there is no error on the OpenAI side, they may not even know this is happening. Despite my submitting a report to the help center, it was only answered by an obvious ChatGPT bot. Does anyone have a better way to submit a bug report?

Did you fix this? Hitting the same error I think…