I reported this shortly after structured outputs were released, but ran into other problems and didn’t come back to it until now.
I use structured outputs, which constrain the LLM to emit only tokens that form valid JSON matching a provided schema. However, it’s still possible to get an unparseable response, because JSON is whitespace-insensitive: the grammar permits unlimited whitespace between tokens. The LLM can, and sometimes does, emit repeating \n tokens (as I’m seeing in a few responses tonight) until it fills the output context window, and parsing the resulting (roughly max-cost) response then fails.
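To illustrate the failure mode: whitespace padding around a complete JSON value is harmless, but a response that spends its whole token budget on whitespace never contains a complete value at all, so parsing raises. A minimal demonstration:

```python
import json

# A schema-valid response padded with whitespace still parses fine...
padded = "\n\n{\"answer\": 42}\n\n"
print(json.loads(padded))  # {'answer': 42}

# ...but a response that hit the output limit while emitting only
# newlines contains no complete JSON value, so parsing fails.
runaway = "\n" * 10
try:
    json.loads(runaway)
except json.JSONDecodeError:
    print("unparseable")  # this is the failure mode I'm hitting
```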
I’ve tried putting “Minified” in the JSON Schema name, noting in the prompt that the output should be minified, etc. Those things may help a little, but none has been a silver bullet.
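As a stopgap, I detect the mostly-whitespace case before parsing and treat it as a retry, rather than just letting json.loads blow up. A rough sketch (function names and the ratio threshold are mine, not anything official):

```python
import json

WS_RATIO_LIMIT = 0.5  # hypothetical threshold; tune for your payloads


def looks_like_runaway(text: str) -> bool:
    """Heuristic: a response that is mostly whitespace is the runaway case."""
    if not text.strip():
        return True
    ws = sum(c.isspace() for c in text)
    return ws / len(text) > WS_RATIO_LIMIT


def try_parse(text: str):
    """Parse a structured-output response; return None to signal a retry."""
    if looks_like_runaway(text):
        return None
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return None
```

Pairing this with a tight max output-token cap on the request at least bounds the cost of each runaway response before the retry.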
Does anyone have a solution, or would OpenAI consider making structured outputs respond with minified JSON? It feels like a relatively trivial adjustment.