Using gpt-4.1 with structured outputs I occasionally (about 1 in 10 requests) get repeated \n\n\n\n appended to my structured output. The structured output BEFORE the \n run starts is valid, but a huge number of \n characters are added (around 100k characters!), neatly ordered in groups.
In some integrations I've seen it output the structured output multiple times, with or without differing data, as a way to (undesirably) provide JSONL instead of using a simple array as intended. So the main issue here appears to be that the model is allowed to continue outputting after it has completed its object, and being constrained to valid JSON symbols isn't enough to prevent these issues.
As a workaround, the Chat Completions API has parameters for presence and frequency penalties. Using either may encourage the model to terminate its output when it's finished.
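For example, a minimal sketch of that suggestion (the penalty value and prompt are illustrative, not a recommendation):

from openai import OpenAI

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4.1",
    messages=[{"role": "user", "content": "Return the result as a JSON object."}],
    response_format={"type": "json_object"},
    frequency_penalty=0.5,  # penalizes tokens that keep repeating, e.g. long runs of \n
)
print(response.choices[0].message.content)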
We're aware that GPT-4.1 (and occasionally GPT-4o) can add a long string of blank lines or extra array items after the correct JSON when you stream a structured-output response. Engineering is on it. Until the fix ships, you can avoid most cases by turning off streaming or by parsing only the first complete JSON object and discarding anything that follows.
Thanks for the note - much appreciated.
I'm not (ever) using streaming, so it is not related to streaming.
It also takes a very long time for the OpenAI API to come back with the response in those cases, and the responses are 100k bytes long (filling all the way up to the max 32k tokens I defined) - but I guess I can add some retry code in the exception handler. It's not that trivial since the sequences are not always the same - I guess I'll ask Codex.
*** humbled *** Actually it WAS easy to work around, so I will post the problem and the solution here for anyone who runs into this problem (in Python) and needs a workaround.
Turns out (thank you, o4-mini-high) that json.JSONDecoder has a raw_decode method, so you can do:
import json

dec = json.JSONDecoder()
obj, end = dec.raw_decode(self.response)  # obj is the parsed JSON, end is where the trailing garbage starts
HOWEVER this of course ONLY works when proper JSON is returned in the first place. Which is not always the case. So I am still VERY MUCH waiting on a fix for this.
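For completeness, a small self-contained version of that workaround (the function name and the retry decision are my own; it assumes the leading portion of the response is valid JSON):

import json

def parse_first_json(text: str):
    # raw_decode parses the first complete JSON value and returns
    # (object, index where parsing stopped); everything after that index -
    # e.g. the runs of \n and \t - is simply ignored.
    # It does not skip leading whitespace, hence the lstrip().
    decoder = json.JSONDecoder()
    obj, end = decoder.raw_decode(text.lstrip())
    return obj

# Example: a valid object followed by 100k junk newlines still parses cleanly.
print(parse_first_json('{"status": "ok"}' + "\n" * 100_000))
# If the leading JSON itself is malformed, json.JSONDecodeError is raised,
# which is the case where you still have to retry the request.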
We do have a repeatable (goes wrong every time) query BTW if you're interested.
There is still the issue of being charged for all those redundant \n and \t\t output tokens being generated, correct?
Also, when using the Chat Completions API, would setting frequency_penalty to some positive number (the default is 0) help address the cost problem by truncating the response early, since in theory every additional \n or \t\t would be penalized? What do you think?
The alternative for the Responses API could perhaps be setting the max_output_tokens value to a conservative number that's well below the maximum limit for the model you're using, depending on the use case. Maybe that could work as well?
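Something along these lines, as a sketch (the cap of 2000 is an arbitrary example value; size it to your schema rather than to the model maximum):

from openai import OpenAI

client = OpenAI()
response = client.responses.create(
    model="gpt-4.1",
    input="Return the result as JSON matching the schema.",
    max_output_tokens=2000,  # caps how many tokens a runaway \n/\t tail can burn
)
print(response.output_text)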
Yes, it is not "THE" solution since even with the extras removed the JSON is not always valid. Also, the bad queries take forever to produce. But this "fix" makes the problem about 50% less problematic by my estimate.
This should be able to be countered with logit_bias on Chat Completions. You could find all the tab/newline combos that OpenAI trained the AI models on as token numbers (and NOBODY wants tab-indented multi-line JSON...), and harshly demote them - making only single-line JSON possible, without whitespace.
This in-string tuning works even while OpenAI is enforcing a context-free grammar, because the grammar still releases the AI model into a string where it can write these bad characters. Tabs are possible in a JSON string, but highly unlikely to be desired in any use case, as JSON itself is the data structure, not table data in a string.
Then, after coming up with a long list of things the AI tries to write (JSON structure, but within the JSON data) and killing them off in regular interactions and in json_object mode, try it on your over-specified non-strict (non-enforced) JSON schema...
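A rough sketch of what that could look like (token IDs are tokenizer-specific, so this resolves them at runtime with tiktoken; the list of whitespace strings and the o200k_base choice are my assumptions):

import tiktoken
from openai import OpenAI

client = OpenAI()
enc = tiktoken.get_encoding("o200k_base")  # tokenizer family used by GPT-4o/4.1-era models

# Demote tokens that encode runs of tabs/newlines as hard as the API allows (-100).
bias = {}
for ws in ["\n\n", "\n\n\n\n", "\t", "\t\t", "\t\t\t"]:
    for token_id in enc.encode(ws):
        bias[str(token_id)] = -100

response = client.chat.completions.create(
    model="gpt-4.1",
    messages=[{"role": "user", "content": "Return single-line JSON only."}],
    response_format={"type": "json_object"},
    logit_bias=bias,
)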
Unfortunately, OpenAI also messed up the way that logit_bias is supposed to work. It is completely broken and without effect if you use either temperature or top_p.
They also messed with logprobs, no longer delivering them for examining exactly what is produced within function calls or structured outputs, leaving you to infer token numbers and token strings yourself.
Even being able to promote special tokens (so the model is more likely to finish instead of going into a loop of multiple outputs) is blocked.
Doesn't matter, Responses is completely feature-less. You can't even add a crapload of tabs as a stop sequence.
So: bad models, broken API parameters violating the API reference, and then... a bad endpoint, "Responses", completely blocking any such self-service.
@OpenAI_Support I also observe this very frequently when using structured outputs with both o3 and o4-mini. These tabs and newlines get added for 5-10 minutes before resolving.
Might the fix you are talking about also help resolve things with these reasoning models (rather than just GPT-4.1, as you mentioned)?