Chat Completion responses suddenly returning malformed or inconsistent JSON

Starting this morning, the Chat Completion API has begun returning malformed or inconsistent JSON.

I am using gpt-4o-2024-11-20 with JSON-style responses. My prompts have not changed, but the model is now intermittently returning JSON objects with:

  • Missing dictionary keys

  • Keys placed at the wrong level

  • Keys spelled differently between requests

  • A different structure from the one it was reliably producing until now

The specific issue varies by request, and sometimes the response comes back perfectly fine.

Expected output format (what has always been returned):

{
    "object": {
        "key1": "REDACTED",
        "key2": "REDACTED",
        "key3": "REDACTED"
    }
}

Example of the malformed output (starting today):

{
    "key2": "REDACTED",
    "object": {
        "key1": "REDACTED"
    },
    "key3": "REDACTED"
}

In other cases, the keys are spelled differently or appear in unexpected positions.

Has something recently changed on the model or API side that could cause previously consistent JSON structures to become unstable? And is there a known workaround or path to resolution? Anyone else experiencing this issue?

This is a production workflow that has been stable for a long time (almost a year!), so any guidance would be appreciated.

THANK YOU!

3 Likes

Hi and welcome back!

From your description, I take it that you are using JSON mode rather than Structured Outputs?
You are probably aware that the former means setting response_format to { "type": "json_object" }, while the latter means supplying a JSON schema with strict: true.

If so, this would not explain why you are suddenly observing different behavior, but switching could be a straightforward solution.
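
To illustrate the difference, the two request shapes look roughly like this (a sketch only; the model, prompts, and key names are placeholders, not the poster's actual setup):

from openai import OpenAI

client = OpenAI()

# JSON mode: guarantees syntactically valid JSON, but not any particular shape
json_mode = client.chat.completions.create(
    model="gpt-4o-2024-11-20",
    messages=[{"role": "user", "content": "Return the result as JSON."}],
    response_format={"type": "json_object"},
)

# Structured Outputs: decoding is constrained to this exact schema
structured = client.chat.completions.create(
    model="gpt-4o-2024-11-20",
    messages=[{"role": "user", "content": "Return the result."}],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "result",
            "strict": True,
            "schema": {
                "type": "object",
                "properties": {
                    "key1": {"type": "string"},
                    "key2": {"type": "string"},
                    "key3": {"type": "string"},
                },
                "required": ["key1", "key2", "key3"],
                "additionalProperties": False,
            },
        },
    },
)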

2 Likes

You are correct - this would NOT be an explanation.

Running the same model with the same inputs for a year, and then the behavior changes == OpenAI messed with the model or the endpoint.

6 Likes

This is a real bug.
I have seen reliable {"type": "json_object"} responses with a 0% fail rate for more than a year, but around Nov 20th this stopped working reliably. It now gives truncated outputs and invalid JSON.

Using the Python openai.chat.completions.create via AzureOpenAI (or the async client, for that matter).
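
For reference, the call in question looks roughly like this (a sketch; the endpoint, key, API version and deployment name are placeholders, not the actual configuration):

from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://YOUR-RESOURCE.openai.azure.com",  # placeholder
    api_key="YOUR-KEY",                                       # placeholder
    api_version="2024-10-21",                                 # placeholder
)

completion = client.chat.completions.create(
    model="gpt-4o-deployment",  # Azure deployment name, placeholder
    messages=[{"role": "user", "content": "Return the result as JSON."}],
    response_format={"type": "json_object"},
)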

1 Like

Hi all, was this resolved?

Thanks

If the cause were “we at OpenAI changed even more parameters on running AI models to make them cheaper”, or “we continued with more post-training on the model to achieve more of our internal goals”, or “we have a new model and are damaging the old ones to create the appearance of progress”, then it is unlikely ever to be “resolved”.

OpenAI ‘stealths’ the “snapshot” models - they like to break live applications repeatedly with behavior changes, and then deny and say they never do that.

However, it may be that all reports are actually coming from Microsoft Azure developers.

Do you have a similar issue to report? Are you on Azure like the previous poster, where OpenAI can’t directly help? Can you document your input in more detail? Currently, gpt-4o-2024-11-20 with json_object output on the OpenAI API does not obviously break on a simple non-production schema, so one would need to see what has actually been run for a year in order to replicate the faults and to judge whether the input can be improved.

Test run

system message

You are an OpenAI support consultant.

# Responses

## final_response

### strict validation: true
output must always be enclosed in a compliant JSON following this schema;
output is sent to an API for display, not directly rendered as JSON;
the output format is invisible to the user - don't discuss it. 
"schema": {
    "type": "object",
    "properties": {
      "output_text": {
        "type": "string",
        "description": "The main output text generated to be seen."
      },
      "mood": {
        "type": "string",
        "description": "Briefly, sentiment and tone of user input."
      }
    },
    "required": [
      "output_text",
      "mood"
    ],
    "additionalProperties": false
  }

Input: (check your own schema, AI…)

Output (top_p: 0.8):

{
"output_text": "The response schema provided in the previous message appears to be valid JSON. However, I will address potential issues and quality concerns:\n\nValidation Issues:\n1. No Errors in JSON Validity: The schema adheres to proper JSON formatting rules, so there are no outright errors that would make it invalid JSON.\n\nQuality Issues for Expected Use-Case:\n1. Description Clarity: The description of the output_text field could be more detailed to specify the context or constraints for generating the main output text. For example, should it avoid technical jargon? Should it focus on brevity or detail?\n2. mood Description: The description of the mood field mentions "sentiment and tone," but it could specify the expected range of values or provide examples (e.g., "happy," "neutral," "angry"). This would help ensure consistency in its usage.\n3. Lack of Constraints: There are no constraints or formats specified for the output_text or mood fields. For example, specifying a maxLength for output_text or enumerating possible values for mood (e.g., a predefined set of tones) would enhance the schema’s robustness.\n4. Error Handling: The schema does not account for cases where fields might need default values or where data might be missing (e.g., if mood is undefined, what should happen?).\n5. Additional Context: It might be helpful to include an additional property or field to capture metadata, such as a timestamp, to contextualize the response further.\n\nOverall, while the schema is valid JSON, improving descriptions and adding constraints would enhance its quality for practical use.",
"mood": "analytical"
}

I am getting degraded outputs right now too! I think there might be an issue

1 Like

Definitely not resolved.
I thought that having an Azure deployment pinned to a dated snapshot was some sort of promise, but it appears to be broken.
We switched about 60% of our flows to Gemini, and we are still seeing crazy logs and traces from the remaining 40% we haven’t yet tested and moved.

Maybe this has to do with Microsoft trying to move people to their new “foundry”?

I’m using a direct connection to OpenAI, not Azure, but this is really disappointing. The model has been fine for over a year.

And I’m using the same fine-tuned model (not some changed snapshot of a foundation model), so I’m not sure how it changed… it must be the software layer - I haven’t even changed my version!

Yes, I can bring this up with the team. We’ll have to wait and see what happens next. For now and going forward, the best recommendation is to switch to Structured Outputs.
JSON mode has had recurring issues, and that’s exactly why Structured Outputs were introduced.
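
In the Python SDK, the switch can look roughly like this (a sketch; the Pydantic model and prompts are placeholders, not anyone’s production schema):

from pydantic import BaseModel
from openai import OpenAI

client = OpenAI()

class Extraction(BaseModel):  # hypothetical schema for illustration only
    key1: str
    key2: str
    key3: str

completion = client.beta.chat.completions.parse(
    model="gpt-4o-2024-11-20",
    messages=[
        {"role": "system", "content": "Extract the fields."},
        {"role": "user", "content": "..."},
    ],
    response_format=Extraction,  # enforced via the generated JSON schema
)

print(completion.choices[0].message.parsed)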

1 Like

Structured Outputs also suffer from the same issue.
BTW, the internal mechanism for Structured Outputs and JSON mode is the same (zero out the log-probs of tokens that violate the schema).

To escalate this issue with JSON mode or Structured Outputs, it really helps the team if you can share a few response IDs. That allows them to inspect and reproduce the problem directly.

Also, keep in mind that structured outputs work differently from JSON mode. JSON mode relies on prompting plus post-validation, while structured outputs use constrained decoding based on the schema’s context-free grammar.

Since Microsoft hosts these models themselves, providing the response IDs from the OpenAI API will help the team investigate this more quickly.
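
Capturing those IDs on each call can look roughly like this (a sketch; it assumes a recent openai-python version, where the _request_id attribute is exposed on responses):

from openai import OpenAI

client = OpenAI()

completion = client.chat.completions.create(
    model="gpt-4o-2024-11-20",
    messages=[{"role": "user", "content": "Return the result as JSON."}],
    response_format={"type": "json_object"},
)

# chatcmpl-... ID of the exact generation, useful for the team to look up
print("response id:", completion.id)

# HTTP request ID, if the installed SDK version exposes it on the response
print("request id:", completion._request_id)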

I’m using .NET and the Responses API - any pointers to docs for setting this up? Happy to switch to Structured Outputs, as the system is being built at the moment. I switched the code over from the Assistants API, where everything was working great - this situation is not ideal :frowning: A bit of a moan, but the .NET NuGet package also seems to have some issues: the docs say it has certain features, yet on trying to use those features they turn out to be missing - less of an issue, as going directly with the SDK works, but again not ideal.

I appreciate you folks are likely run off your feet, but I really need a solution, even if it’s only in sight for the end of the year.

Thanks for assisting
JSON mode failure:
response id: chatcmpl-Cla2GyONCawwxYxXdzFyHId6Tauim

For the Structured output failure:
request_id: fa3168bc-496d-4b21-9652-d7fe09370d82

I couldn’t retrieve the response ID conveniently, since it is not returned by the Python openai package when the call fails. I have wasted enough tokens waiting for bad responses by now.
Here’s an example error stack for this case:

  File "/home/roy/code/anchor/apps/py-server/.venv/lib/python3.12/site-packages/openai/resources/chat/completions/completions.py", line 1621, in parse
    return await self._post(
           ^^^^^^^^^^^^^^^^^
  File "/home/roy/code/anchor/apps/py-server/.venv/lib/python3.12/site-packages/openai/_base_client.py", line 1794, in post
    return await self.request(cast_to, opts, stream=stream, stream_cls=stream_cls)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/roy/code/anchor/apps/py-server/.venv/lib/python3.12/site-packages/openai/_base_client.py", line 1599, in request
    return await self._process_response(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/roy/code/anchor/apps/py-server/.venv/lib/python3.12/site-packages/openai/_base_client.py", line 1688, in _process_response
    return await api_response.parse()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/roy/code/anchor/apps/py-server/.venv/lib/python3.12/site-packages/openai/_response.py", line 432, in parse
    parsed = self._options.post_parser(parsed)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/roy/code/anchor/apps/py-server/.venv/lib/python3.12/site-packages/openai/resources/chat/completions/completions.py", line 1615, in parser
    return _parse_chat_completion(
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/roy/code/anchor/apps/py-server/.venv/lib/python3.12/site-packages/openai/lib/_parsing/_completions.py", line 146, in parse_chat_completion
    "parsed": maybe_parse_content(
              ^^^^^^^^^^^^^^^^^^^^
  File "/home/roy/code/anchor/apps/py-server/.venv/lib/python3.12/site-packages/openai/lib/_parsing/_completions.py", line 199, in maybe_parse_content
    return _parse_content(response_format, message.content)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/roy/code/anchor/apps/py-server/.venv/lib/python3.12/site-packages/openai/lib/_parsing/_completions.py", line 262, in _parse_content
    return cast(ResponseFormatT, model_parse_json(response_format, content))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/roy/code/anchor/apps/py-server/.venv/lib/python3.12/site-packages/openai/_compat.py", line 171, in model_parse_json
    return model.model_validate_json(data)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/roy/code/anchor/apps/py-server/.venv/lib/python3.12/site-packages/pydantic/main.py", line 766, in model_validate_json
    return cls.__pydantic_validator__.validate_json(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
pydantic_core._pydantic_core.ValidationError: 1 validation error for EvalResponse
  Invalid JSON: EOF while parsing an object at line 1 column 59 [type=json_invalid, input_value='{"is_valid": true, "key
…ed": 3, "similarity": 9', input_type=str]
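
One workaround for keeping the completion ID even when parsing fails is to skip the parse helper, call create directly, and validate the JSON separately (a sketch; the model and prompt are placeholders, and this trades the SDK’s automatic parsing for manual validation):

import json
from openai import OpenAI

client = OpenAI()

completion = client.chat.completions.create(
    model="gpt-4o-2024-11-20",
    messages=[{"role": "user", "content": "Return the evaluation as JSON."}],
    response_format={"type": "json_object"},
)

raw = completion.choices[0].message.content
try:
    data = json.loads(raw)
except json.JSONDecodeError as err:
    # The completion object still exists here, so the chatcmpl-... ID and the
    # raw (possibly truncated) text can be logged and reported.
    print("bad JSON from", completion.id, ":", err)
    raise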

1 Like

I will send some IDs as well. I’m getting issues with structured outputs not json mode.

1 Like

Please do so @toastywaffles888 - this is not an issue I can reproduce easily.

@roy.amit thanks for sharing your request ids!