Structured output with the Responses API returns tons of \n\n\n\n

Using gpt-4.1 with structured outputs I occasionally (about 1 in 10) get runs of repeated \n\n\n\n appended to my structured output. The structured output BEFORE the \n starts is valid, but a lot of \n are added (around 100k characters!), neatly ordered in groups:

\n\n\n\n\n\n \n\n\n\n\n\n \n\n\n\n\n\n \n\n\n\n\n\n \n\n\n\n\n\n \n\n\n\n\n\n 

This seems to be the same as this old thread: The GPT-4-1106-preview model keeps generating "\\n\\n\\n\\n\\n\\n\\n\\n" for an hour when using functions - #9 by Diet
but I started a new one because (a) it's a different model (4.1) and (b) the output is technically valid JSON (just with 100k extra \n).

For anyone at OpenAI who wants to check, here's an example response_id: resp_6857d0cb11388198ac8ed8d5362e40d70ed5adbbe3a34036

In some integrations I've seen the model emit the structured output multiple times, with or without differing data, as a way of (undesirably) producing JSONL instead of using a simple array as intended. So the main issue here appears to be that the model is allowed to keep generating after it has completed its object, and being constrained to valid JSON symbols isn't enough to prevent that.

As a workaround, the Chat Completions API has parameters for presence and frequency penalties. Using either may encourage the model to terminate its output when it’s finished.
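For example, a minimal sketch with the openai Python SDK (the model, prompt, and the 0.2 value are illustrative, not tested against this bug):

from openai import OpenAI

client = OpenAI()
resp = client.chat.completions.create(
    model="gpt-4.1",
    messages=[{"role": "user", "content": "Summarize the article as a JSON object."}],
    response_format={"type": "json_object"},
    frequency_penalty=0.2,  # each repeated token (e.g. yet another \n) becomes slightly less likely
)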

I had another one, with a new pattern:

revolutionize plastic recycling and construction in Southwest Virginia."}]}\n \n\n\n\n\n\n \t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t \t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t \t\t \t\t \t\t \t\t … and so on, thousands more \t\t groups … (id: resp_68626b34d1c0819a84826ba7e61f77f10652f8198a192ebf )

Hey everyone!

We’re aware that GPT‑4.1 (and occasionally GPT‑4o) can add a long string of blank lines or extra array items after the correct JSON when you stream a structured‑output response. Engineering is on it. Until the fix ships, you can avoid most cases by turning off streaming or by parsing only the first complete JSON object and discarding anything that follows.

Let us know if that helps at all!

3 Likes

Thanks for the note - much appreciated.
I’m not (ever) using streaming, so it is not related to streaming.

It also takes a very long time for the OpenAI API to come back with the response in those cases, and the responses are 100k bytes long (filling all the way up to the max 32k tokens I defined). I guess I can add some code in the exception handler to retry. It's not that trivial, since the sequences are not always the same; I guess I'll ask Codex 🙂

*** humbled *** Actually it WAS easy to work around, so I will post the problem and the solution here for anyone who runs into this (in Python) and needs a workaround.

My original code looked something like this:

# strict=False tolerates control characters, but json.loads still rejects trailing garbage
data = json.loads(self.response.strip(), strict=False)

Turns out (thank you, o4-mini-high) that there is also json.JSONDecoder.raw_decode, and you can do:

decoder = json.JSONDecoder()
# raw_decode returns (object, end_index); everything past end_index (the \n garbage)
# is ignored. It rejects leading whitespace, hence the lstrip().
data, end = decoder.raw_decode(self.response.lstrip())

HOWEVER, this of course ONLY works when proper JSON is returned in the first place, which is not always the case. So I am still VERY MUCH waiting on a fix for this.
We do have a repeatable (goes wrong every time) query BTW if you’re interested.
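Putting that together, a defensive helper might look like this (a sketch; parse_first_json is a made-up name, and what to do on failure, e.g. a retry, is left to the caller):

import json

def parse_first_json(raw: str):
    # Parse the first complete JSON value in raw, ignoring any trailing garbage.
    decoder = json.JSONDecoder()
    try:
        data, _end = decoder.raw_decode(raw.lstrip())
        return data
    except json.JSONDecodeError:
        # No complete JSON value at all: the response was truncated or malformed,
        # so the only remaining option is to retry the request.
        return None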

🙌 Thank you for sharing your solution, and we'd definitely appreciate the repeatably broken query!

Happy to share privately - or through response id ( resp_6862d752b11c819a9fd7ffddb1d8eb860ae883b7f8718bd3 )

Thank you for sharing the solution @jlvanhulst!

There is still the issue of being charged for all those redundant \n and \t\t output tokens being generated, correct?

Also, when using the Chat Completions API, would setting frequency_penalty to some positive number (the default is 0) help address the cost problem by truncating the response early, since in theory every additional \n or \t\t would be penalized? What do you think?

An alternative for the Responses API could be setting max_output_tokens to a conservative number well below the model's maximum limit, depending on the use case. Maybe that could work as well?
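That could look something like this (a sketch with the openai Python SDK; the structured-output schema configuration is omitted, and 2000 is an arbitrary cap):

from openai import OpenAI

client = OpenAI()
resp = client.responses.create(
    model="gpt-4.1",
    input="Summarize the article as a JSON object.",
    max_output_tokens=2000,  # well below the model maximum, so a whitespace loop gets cut off early
)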

Yes, it is not 'THE' solution, since even with the extras removed the JSON is not always valid. Also, the bad queries take forever to produce. But this 'fix' makes the problem about 50% less problematic by my estimate.

This should be counterable with logit_bias on Chat Completions. You could find all these tab combos that OpenAI trained the AI models on as token numbers (and NOBODY wants tab-indented multi-line JSON…), and harshly demote them - making only single-line JSON possible, without whitespace.

This in-string tuning would apply at the same point where OpenAI enforces a context-free grammar and then releases the AI model into a string where it can write these bad characters. Tabs are possible in a JSON string, but highly unlikely to be desired in any use case, as JSON itself is the data structure, not table data in a string.

Then, after coming up with a long list of things the AI tries to write (JSON structure, but within the JSON data) and killing them off in regular interactions and in json_object mode, try it on your over-specified non-strict (non-enforced) JSON schema…
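A rough sketch of that demotion (assuming the o200k_base tokenizer for the gpt-4.1-era models; the exact whitespace runs to ban are guesses, and note the caveat below about temperature and top_p):

import tiktoken
from openai import OpenAI

enc = tiktoken.get_encoding("o200k_base")  # tokenizer used by gpt-4o/gpt-4.1-era models

# Whitespace runs to ban; an escaped \n inside a JSON string is the two characters
# backslash-n, so demoting raw newline/tab tokens does not touch string content.
bias: dict[str, int] = {}
for run in ["\n\n", "\n\n\n\n", "\t\t", "\t\t\t\t", " \t\t"]:
    for token_id in enc.encode(run):
        bias[str(token_id)] = -100  # -100 effectively bans the token

client = OpenAI()
resp = client.chat.completions.create(
    model="gpt-4.1",
    messages=[{"role": "user", "content": "Return the result as a single-line JSON object."}],
    response_format={"type": "json_object"},
    logit_bias=bias,
)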

Unfortunately, OpenAI also messed up the way that logit_bias is supposed to work. It is completely broken and without effect if you use either temperature or top_p.

Then they messed with logprobs, which are not delivered for examining the precise production within functions or structured output, leaving you to infer token numbers and token strings yourself.

Even being able to promote special tokens (so the model is more likely to finish instead of going into a loop of multiple outputs) is blocked.

Doesn’t matter, Responses is completely feature-less. You can’t even add a crapload of tabs as a stop sequence.

So: bad models, broken API parameters violating the API reference, and then… a bad endpoint, "Responses", completely blocking any such self-service.

Hi @OpenAI_Support
Is there an expected timeline for the fix?

@OpenAI_Support I also observe this very frequently when using structured outputs with both o3 and o4-mini. These tabs and newlines get added for 5-10 minutes before resolving.

Might the fix you are talking about also help resolve this for these reasoning models, rather than just the GPT-4.1 you mentioned?