Responses API - Duplicate responses and invalid JSON

We have integrated the Responses API with streaming enabled, using the Go SDK. On the frontend, we listen to the streamed responses and render the assistant-generated answers.

We are using gpt-4.1 with JSON Schema output for the Responses API and have enabled two tools:

  • Function Call – to fetch relevant RAG results

  • Web Search – to handle relevant search queries
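For reference, the setup above can be sketched as the raw request body sent to the Responses endpoint. The schema and function name here are illustrative placeholders, not our actual schema, and the web search tool type is written per the current docs:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// buildRequest sketches the Responses API request body for a streamed,
// JSON-Schema-constrained call with the two tools described above.
// "answer" schema and "fetch_rag_results" are placeholder assumptions.
func buildRequest() ([]byte, error) {
	req := map[string]any{
		"model":  "gpt-4.1",
		"stream": true,
		"input":  "user question here",
		// Structured output via a strict JSON Schema.
		"text": map[string]any{
			"format": map[string]any{
				"type":   "json_schema",
				"name":   "answer", // placeholder name
				"strict": true,
				"schema": map[string]any{
					"type": "object",
					"properties": map[string]any{
						"answer": map[string]any{"type": "string"},
					},
					"required":             []string{"answer"},
					"additionalProperties": false,
				},
			},
		},
		// The two tools: a function for RAG, plus built-in web search.
		"tools": []map[string]any{
			{
				"type":        "function",
				"name":        "fetch_rag_results", // placeholder function
				"description": "Fetch relevant RAG results",
				"parameters": map[string]any{
					"type": "object",
					"properties": map[string]any{
						"query": map[string]any{"type": "string"},
					},
					"required":             []string{"query"},
					"additionalProperties": false,
				},
			},
			{"type": "web_search"}, // earlier docs used "web_search_preview"
		},
	}
	return json.Marshal(req)
}

func main() {
	body, err := buildRequest()
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body))
}
```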

The issue is not consistent, but sometimes we observe:

  • Duplicate responses being generated for different user queries

  • Invalid JSON output for certain languages (e.g., Swedish, Norwegian)

  • In some cases, the same JSON being generated twice without any user input

Please see the screenshots below. These are captured logs from OpenAI.

These are the steps we have taken so far:

  • Ensured that streams are properly closed on each lifecycle event of the response stream.

  • Verified that no dangling connections remain open.

  • Set a sufficiently long context timeout to eliminate the possibility of abrupt termination.

We would really appreciate advice on how this can be avoided.

Step 1: at least mention which of the dozens of models you are using in your problem report, and vary them;
Step 2: do not use the ‘web_search’ tool, which will damage the output by injecting instructions about how the output is supposed to be written, regardless of the structured format; use an external function or automatic placement of RAG information instead.


Updated the original topic: We are using gpt-4.1, and it is not possible for us to remove the Web Search tool due to the nature of the queries.

The Web Search tool works reliably with English, but the issue becomes evident (though infrequent) when interacting in other languages.

You can remove web search as a tool that injects OpenAI’s follow-up system messages by:

Creating a function. That function can call the Chat Completions gpt-4o-search-preview model, a special model that always employs search grounding to answer, though it still has an enforced “looks like you googled” answer style. This model’s output can be returned as the tool result text, clear of any injected instructions.
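A minimal sketch of the request body such a function handler would send, assuming a plain HTTP call rather than any particular SDK. The handler would POST this to the Chat Completions endpoint and return the assistant message content as the tool result:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// searchToolRequest builds a Chat Completions request body for the
// gpt-4o-search-preview model. A function handler would POST this to
// /v1/chat/completions with an API key, then pass the assistant
// message content back as the function tool result.
func searchToolRequest(query string) ([]byte, error) {
	return json.Marshal(map[string]any{
		"model": "gpt-4o-search-preview",
		"messages": []map[string]string{
			{"role": "user", "content": query},
		},
	})
}

func main() {
	body, err := searchToolRequest("latest Go release notes")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body))
}
```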

Better still is to use a search API service that can deliver long results, plus an AI middleman that can retrieve the pages of highest relevance.


You’re right. We previously tried using SerpAPI, but it wasn’t very helpful, since we needed more detailed information about each individual URL than just the SEO title and description returned by their standard Search API. We also tested their AI Overview API, but it didn’t provide AI overviews for all queries (understandable, as Google’s AI Overview itself isn’t available for every query). In addition, compared to the built-in search tool, it introduced additional latency in response generation.

That said, our current setup has worked really well for our use case overall. The only ongoing issue is the generation of duplicate or very similar search outputs across different questions.
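One client-side guard against the duplicate outputs described above is to hash a normalized form of each answer and flag repeats (e.g. to trigger a retry). A workaround sketch, not a fix for the underlying endpoint behaviour:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// dedup tracks normalized hashes of prior answers so a duplicated
// generation across different queries can be detected.
type dedup struct{ seen map[string]bool }

func newDedup() *dedup { return &dedup{seen: map[string]bool{}} }

// IsDuplicate reports whether this answer (lowercased, whitespace
// collapsed) was already returned for an earlier query.
func (d *dedup) IsDuplicate(answer string) bool {
	norm := strings.Join(strings.Fields(strings.ToLower(answer)), " ")
	sum := sha256.Sum256([]byte(norm))
	key := hex.EncodeToString(sum[:])
	if d.seen[key] {
		return true
	}
	d.seen[key] = true
	return false
}

func main() {
	d := newDedup()
	fmt.Println(d.IsDuplicate("Hello  world")) // false: first occurrence
	fmt.Println(d.IsDuplicate("hello world"))  // true: normalized repeat
}
```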

Do you have any alternative suggestions other than replacing the built-in search tool?

Thanks in advance.

This is a persistent problem with the Responses endpoint, seen especially with the gpt-4.1 series but on other models as well. The AI does not emit the correct internal stop sequence, or the API fails to detect it and allows generation to continue.

The endpoint is hopeless here, because it does not even offer a stop parameter. If you were on Chat Completions and using a function (where this symptom does not appear), you could use advanced trickery, such as putting a final key with only one enum value, e.g. “output_done”: true, then terminating on that as a stop sequence and closing the JSON yourself. But there is a ton of missing and broken functionality in Responses that leaves it not “feature complete” with even Chat Completions, let alone Assistants. Injecting pre-prompt and post-tool messages that break developer applications is just the tip of what is wrong.
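Even without a server-side stop parameter, the sentinel-key trick can be applied client-side: accumulate the streamed text, cut at the first occurrence of the marker, and close the object yourself. The “output_done” key is the illustrative single-enum key from above:

```go
package main

import (
	"fmt"
	"strings"
)

// closeAtMarker truncates accumulated streamed text at the first
// occurrence of the sentinel key and closes the JSON object manually,
// mimicking a stop sequence the Responses endpoint doesn't offer.
func closeAtMarker(buf string) (string, bool) {
	const marker = `"output_done": true`
	i := strings.Index(buf, marker)
	if i < 0 {
		return buf, false // sentinel not seen yet; keep accumulating
	}
	return buf[:i+len(marker)] + "}", true
}

func main() {
	// Simulated runaway stream that repeats after the sentinel.
	streamed := `{"answer": "hello", "output_done": true, "answer": "hello"`
	out, done := closeAtMarker(streamed)
	fmt.Println(out, done) // {"answer": "hello", "output_done": true} true
}
```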

There are several forum threads on this concern with a “we fixed it” from OpenAI, where the response might as well be “(if we simply say we fixed it, maybe they will go away)”, because the nonstop repetition continues at equal magnitude, and the AI also tends to go crazy and loop when it is free to write within a string.
