Harmony-based GPT-5 models return malformed structured outputs (SDK ≥ 1.100.2)

Hi, the OpenAI Support bot recommended that I share my findings with this community.

I’m getting random failures with malformed JSON when calling any GPT-5 model via client.beta.chat.completions.parse with a given response_format. The errors are sporadic, but the linked gist always hits one before the demo finishes executing. It usually happens after the model has gone through a few reasoning steps on a task (and collected a few messages in the conversation), but with base gpt-5 I’ve seen it happen straight away as well.

Switching the model to gpt-4o eliminates the problem completely.
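For reference, the failing call is just the standard parse helper with a pydantic model as the response_format. A minimal sketch (the schema and prompt below are illustrative, not the ones from the gist):

from openai import OpenAI
from pydantic import BaseModel

class NextStep(BaseModel):
    # stand-in schema; the real one lives in the linked gist
    current_state: str
    email: str

client = OpenAI()

completion = client.beta.chat.completions.parse(
    model="gpt-5",
    messages=[{"role": "user", "content": "Decide the next step and fill the schema."}],
    response_format=NextStep,
)
print(completion.choices[0].message.parsed)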

The concrete exception message is: Invalid JSON: trailing characters at line 2 column 1 [type=json_invalid, input_value='{"current_state":"Need ``t...,"email":"elon@x.com``"}}', input_type=str]

Basically, the OpenAI side concatenates multiple JSON objects with a newline, resulting in malformed JSON that fails pydantic parsing.
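The pydantic side of the failure is easy to reproduce without calling the API at all; two JSON objects joined by a newline trip exactly this json_invalid error. A minimal sketch with a stand-in schema:

from pydantic import BaseModel, ValidationError

class State(BaseModel):
    current_state: str  # stand-in field, not the real schema

# two JSON objects joined by a newline, as seen in the failing completions
doubled = '{"current_state": "ok"}\n{"current_state": "ok"}'

try:
    State.model_validate_json(doubled)
except ValidationError as exc:
    print(exc)  # Invalid JSON: trailing characters at line 2 column 1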

I think the root cause is the interplay between the new Harmony response format in the gpt-5 models and Schema-Guided Reasoning (SGR), which relies on response formats to drive reasoning along predefined paths. SGR improves the cognitive capabilities of smaller models on predefined tasks (it is used mostly by teams developing with local models), but it seems to trigger an edge case in the GPT-5 series.

Gist to reproduce the problem (it also contains example console output with a stack trace) is here: https://gist.github.com/abdullin/332b03de6b86a134eedbc2e4b8379736#file-error_output-txt-L54-L85

The issue has been independently reproduced in our community via this SGR Demo gist and its modifications.

Has anybody encountered the same issue before? How do you work around it?

Best,
Rinat


Update: I tried prepending the following to the system prompt to disable reasoning and see if that helps. I’m still hitting malformed JSON.

Active channels: final
Disabled channels: analysis, commentary


Exact same issue here, with gpt-5-2025-08-07 on Azure OpenAI.

The LLM generates a JSON object twice, on two separate lines:

{"operation":{"updates":[truncated…]}}
{"operation":{"updates":[truncated…]}}

Which causes this error:

  File "/app/.venv/lib64/python3.12/site-packages/pydantic/main.py", line 746, in model_validate_json
    return cls.__pydantic_validator__.validate_json(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
pydantic_core._pydantic_core.ValidationError: 1 validation error for Operation
  Invalid JSON: trailing characters at line 2 column 1 [type=json_invalid, input_value='{"operation":{"updates":...t intensities; and"}]}}', input_type=str]
    For further information visit https://errors.pydantic.dev/2.11/v/json_invalid

Using the Python openai SDK, version 1.106.1.


@jm4875 Thanks for the report! This is the same behaviour that we get.

Is this caused by a plain SO prompt or are you using SGR/SO CoT or its equivalent to drive reasoning?

Basic structured output prompt, with some function calling.

Using chat completions API


Thanks a lot, @jm4875! I’m glad to hear that the case is reproducible by multiple parties.

Have you seen any patterns in what could be causing this problem? Or any ideas on how to fix it?
Patching the OpenAI SDK to detect and remove the duplicated JSON feels hacky.

It appears when using gpt-5 + function calling.

No idea how to fix it. I improved the prompting and I’ll see over the next few days if it’s better…

Yes, one way to fix that would be to post-process the OpenAI JSON text completion ourselves and call .model_validate_json() on the cleaned-up result.
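A rough sketch of that idea (the helper name is made up; it assumes the duplicated objects are identical, so keeping only the first line is safe):

from pydantic import BaseModel, ValidationError

def validate_dedup(raw: str, model_cls: type[BaseModel]) -> BaseModel:
    # try the completion text as-is, then fall back to the first line
    # when the model emitted two newline-separated JSON objects
    try:
        return model_cls.model_validate_json(raw)
    except ValidationError:
        first_line, _, _ = raw.partition("\n")
        return model_cls.model_validate_json(first_line)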


I just got a report that the issue still persists on Microsoft Azure OpenAI with gpt-5-mini.

It is caused by JSON duplication.

If this happens, the hack is to intercept OpenAI responses within the SDK before parsing (e.g. with an httpx interceptor) and remove the second, duplicated line.
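A sketch of that interception idea, using a custom httpx transport handed to the SDK via http_client (the transport name is made up, it only covers non-streaming chat completion responses, and it assumes the duplicated object appears in message.content):

import json
import httpx
from openai import OpenAI

class DedupTransport(httpx.HTTPTransport):
    def handle_request(self, request: httpx.Request) -> httpx.Response:
        response = super().handle_request(request)
        if "/chat/completions" not in str(request.url):
            return response
        body = response.read()
        try:
            payload = json.loads(body)
            changed = False
            for choice in payload.get("choices", []):
                content = (choice.get("message") or {}).get("content")
                if content and "\n{" in content:
                    first_line = content.split("\n", 1)[0]
                    json.loads(first_line)  # only rewrite if the first line parses on its own
                    choice["message"]["content"] = first_line
                    changed = True
        except ValueError:
            return response
        if not changed:
            return response
        response.close()  # original body is fully read; hand back a rewritten copy
        # content-length / content-encoding no longer match the rewritten body
        headers = {k: v for k, v in response.headers.items()
                   if k.lower() not in ("content-length", "content-encoding")}
        return httpx.Response(response.status_code, headers=headers,
                              content=json.dumps(payload).encode(), request=request)

client = OpenAI(http_client=httpx.Client(transport=DedupTransport()))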