Structured model outputs (Responses API) returning invalid JSON 80% of the time

Hello everyone!

I am using Structured model outputs (Responses API) with a fairly complex prompt (however, only 14 lines long), providing web_search and file_search tools as well. Currently, I am seeing about 80% of my requests returning invalid JSON. I (mostly) ruled out client-side issues because I can also see these invalid JSONs in the API Platform Logs page.

Most of the invalid JSON issues are omitted commas, brackets and so on.

example:

{
“level”: 50,
“time”: 1774779352743,
“name”: “worker-name”,
“err”: {
“type”: “SyntaxError”,
“message”: “Expected ‘:’ after property name in JSON at position 1358 (line 1 column 1359)”,
“stack”: “SyntaxError: Expected ‘:’ after property name in JSON at position 1358 (line 1 column 1359)\n at JSON.parse ()\n at Object. (…/node_modules/openai/src/helpers/zod.ts:114:39)\n at parseTextFormat (…/node_modules/openai/src/lib/ResponsesParser.ts:134:24)\n at parseResponse (…/node_modules/openai/src/lib/ResponsesParser.ts:67:76)”
},
“jobId”: “REDACTED_JOB_ID”,
“msg”: “Job failed”
}

Has anyone encountered this? I am aware that Structured model outputs are not meant to be perfect, but 80% failure rate is insane. Am I doing anything wrong?

This is how I am calling the API:

1 Like

Hello. Unfortunately problems with structured outputs going into infinite loops when the AI tries to write something other than allowed have been happening for over two years across all models. This is only made worse when every internal tool on Responses uses language out of your control as specification, and worse, you get injected messages such as, “the user has uploaded files” which contradict truth.

You didn’t provide your Zod schema, and you don’t have a capture of what’s being received, but you likely have a loop of whitespace characters like \n or \t.

I suspect the problem could be closely correlated to web search. OpenAI drops a new system message in after the web results, telling the AI what to output—language that is going to contradict the placement of a schema as the response format in the initial system AI context.

You could investigate whether this is only happening with tool use by capturing the full response in a log, and also whether, without web search, things improve significantly.

An improvement may be to add much stronger developer-role message language about “responses”: that the JSON output is mandatory, that the output is going to an API with strict validation, that the processing is automated, and that any messages immediately following web results must be ignored so that JSON is always produced as the only response.

The final step would be to make your own web search function. It could use the chat completions model, and ingest the text returned from a query there, but then there is not direct forced injected language from OpenAI damaging application output.

1 Like

This is a really interesting idea and potentially worth exploring.

Unfortunately, I have not been able to reproduce these bugged responses from the model myself, but I have seen similar reports come up with some regularity.

@stanislav.modrak, are you willing and able to share a clear repro for this, either here or via private message?

2 Likes

100% willing, happy to allocate time to provide as much detail as you’d need. This issue is significantly impacting our use case and may even be a deal breaker. So I would love to help with fixing this.

Additionally, Re: the idea that web_search may be causing this. This issue did indeed only start after adding this tool. However, native web_search support was a critial discriminator for us when comparing LLM providers.

2 Likes

May provide this with a delay as I am on travel for the next 3 days just now. Just to clarify, are you able to pass this on to the relevant development/support team? I was unable to contact/find an email where I could get support on this directly from OpenAI. I got in touch with customer support, but that seemed to be only an automated agent with links to docs.

2 Likes

That’s perfectly fine if you need a bit more time. Also, if anyone else reading this can share an example, that would work as well.

Here in the OpenAI Developer Community Forum, the API forum, we help developers build and surface reproducible bugs to the team. This is not just out of personal interest, it is what this forum is for.

1 Like