I’ve been using the new Responses API successfully for a few weeks now. Today I am seeing responses that are long sequences of \t\t\n\n \t\t\n \t\t\n \t\t\n with no words where there should be text from the assistant’s answer. It seems totally intermittent: identical requests sometimes work and sometimes produce that response.
Any ideas to resolve this? Anyone else with this problem? Using gpt-4o
This is a symptom most commonly seen when using { "type": "json_object" } as the text output format type, instead of providing a strict schema along with json_schema.
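A minimal sketch of that preferred setup, assuming the Python SDK and the Responses API structured-output text format; the schema, the "answer" name, and the prompt are illustrative, not from this thread:

```python
from openai import OpenAI

client = OpenAI()

# Illustrative strict schema: a single "answer" string, no extra keys allowed.
answer_schema = {
    "type": "object",
    "properties": {"answer": {"type": "string"}},
    "required": ["answer"],
    "additionalProperties": False,
}

response = client.responses.create(
    model="gpt-4o",
    input="Reply with your answer in the agreed JSON format.",
    text={
        "format": {
            "type": "json_schema",
            "name": "answer",
            "schema": answer_schema,
            "strict": True,  # grammar-enforced output, unlike bare json_object
        }
    },
)
print(response.output_text)
```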
If you do use json_object, or set "strict": false, then your system prompt must be very elaborate and specific in mandating JSON as the only allowed response type, and, if you are not sending a schema, it must lay out exactly the JSON format required.
Make it resilient enough to survive stealth changes by OpenAI in the AI model quality offered in the same model name…
If you don’t have a defined format but expect plain text, and might not even use functions, then this is simply very bad AI output and a model regression. The Responses API also doesn’t offer logit_bias to counter tabs (\t) you’d never want.
I am using a JSON schema for the response with strict set to true, validated in the API playground. The structure returned is correct even when this happens; it’s just that the chat portion of the response is sometimes just \n and \r at random. I have worked around it by catching responses like that and switching to 4o-mini when detected. I have so far only seen this behavior with 4o (currently gpt-4o-2024-08-06) but haven’t tested any other models. It also seems to be associated with an unusually slow response time.
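A minimal sketch of that kind of detect-and-retry workaround, assuming the Python SDK; the "chat" field name and the helper names are placeholders, not the poster’s actual code:

```python
import json
from openai import OpenAI

client = OpenAI()

def chat_text_is_blank(response) -> bool:
    """True when the free-text portion of the structured answer is only whitespace."""
    try:
        parsed = json.loads(response.output_text)
    except json.JSONDecodeError:
        return True  # output was not even valid JSON (e.g. pure \t\n runs)
    # "chat" is a placeholder for whatever free-text field the schema defines.
    return not str(parsed.get("chat", "")).strip()

def create_with_fallback(**request_kwargs):
    """Call the Responses API, retrying on gpt-4o-mini when the chat text comes back blank."""
    response = client.responses.create(model="gpt-4o-2024-08-06", **request_kwargs)
    if chat_text_is_blank(response):
        response = client.responses.create(model="gpt-4o-mini", **request_kwargs)
    return response
```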
Thanks for explaining more. It sounds like as soon as the AI is released from the JSON’s context-free grammar enforcement and is inside a string, it is more predisposed to fall back to the bad symptom of JSON mode without guidance, even more so because of a recent new fault.
You can reduce the top_p being employed, so that if these tokens are initially less likely than the text production, there is less random chance of a tab setting off the pattern.
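For example, a hedged sketch of passing a lower top_p on a Responses API call (the value 0.8 and the prompt are arbitrary illustrations, not recommendations from the thread):

```python
from openai import OpenAI

client = OpenAI()

# A lower top_p trims the low-probability tail, so stray \t / \n tokens are less
# likely to be sampled at the start of a string and set off the runaway pattern.
response = client.responses.create(
    model="gpt-4o",
    input="Answer in the agreed JSON format: what is the capital of France?",
    top_p=0.8,
)
print(response.output_text)
```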
gpt-4o-2024-11-20 is also a choice at the same cost. gpt-4o-2024-08-06, the destination of the “recommended model” pointer, has also seen other damaging issues, such as failing logprobs.
The message “OpenAI, stop messing with production models” has gone unheard for over a year. They won’t even acknowledge they broke your app, or that a “snapshot” is decidedly not treated as such.
If you can furnish a replication call, we can hope there is some OpenAI response here.
Switching to gpt-4o-2024-11-20 seems to have resolved the issue for now. I can’t get it to happen with that model, but it now happens every time when I set it back to 4o (which shows as 08-06 in the log). Thanks for that tip!
Out of curiosity, I tried gpt-4.5-preview-2025-02-27 and it behaved just like 4o. Instant fail. For a little more information, in case anyone wants to look into this further: it is not only responding without any text, it is also hitting the max_output_tokens threshold before giving up.
Response(id='resp_67e4a177d074819196555a37ef4c18c2085373af44f3e882', created_at=1743036791.0, error=None, incomplete_details=IncompleteDetails(reason='max_output_tokens'), …,
output=[ResponseOutputMessage(id='msg_67e4a1793a9481919b0cb86d68794383085373af44f3e882', content=[ResponseOutputText(annotations=[], text='\n \n \t \n \t \n \t \n …', type='output_text')]
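A small sketch of catching that state from a returned Response object, assuming the Python SDK fields shown in the dump above (incomplete_details.reason and output_text); the function name is a placeholder:

```python
def is_whitespace_runaway(response) -> bool:
    """Flag responses that hit the token cap while emitting only whitespace."""
    hit_cap = (
        response.incomplete_details is not None
        and response.incomplete_details.reason == "max_output_tokens"
    )
    only_whitespace = response.output_text.strip() == ""
    return hit_cap and only_whitespace
```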
Hi all! Looking into this one.
Here’s a hypothetical that may inform an investigation:
What if the special “model” of JSON mode (json_object) remained switched on in the endpoint even after upgrading the response format to json_schema?
Special weights for JSON that were trained in, and then poorly informed by context, could be waiting for their opportunity to be released into a string to express their symptom…