response_format=json_object returns invalid JSON with finish_reason=stop

Using the Python client, we call AsyncOpenAI.chat.completions.create with response_format={"type": "json_object"}. We are not specifying any stop tokens. Since December 19th, we’ve been seeing gpt-4o-mini-2024-07-18 sporadically return invalid JSON objects, which appear to be the result of an abrupt stop mid-completion (the JSON is valid up to the point where the response stops). In these cases we see finish_reason == "stop", which seems to be a bug.

    if response.choices[0].finish_reason == "stop":
        # In this case the model has either successfully finished generating the JSON object
        # according to your schema, or it generated one of the tokens you provided as a "stop token"

        if we_did_not_specify_stop_tokens:
            # If you didn't specify any stop tokens, then the generation is complete and the
            # content key will contain the serialized JSON object
            # This will parse successfully and should now contain '{"winner": "Los Angeles Dodgers"}'
            print(response.choices[0].message.content)

platform[dot]openai[dot]com/docs/guides/structured-outputs#json-mode
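
For context, the call is roughly shaped like this (a stripped-down sketch; the real prompts and settings are omitted and replaced with placeholders):

    import asyncio
    from openai import AsyncOpenAI

    client = AsyncOpenAI()

    async def main():
        response = await client.chat.completions.create(
            model="gpt-4o-mini-2024-07-18",
            messages=[
                # Placeholder messages; the real prompts contain sensitive data
                {"role": "system", "content": "Respond with a JSON object."},
                {"role": "user", "content": "Who won the World Series in 2020?"},
            ],
            response_format={"type": "json_object"},
            # Note: no stop sequences are passed
        )
        print(response.choices[0].finish_reason)
        print(response.choices[0].message.content)

    asyncio.run(main())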

The responses contain up to about 9k output characters, well under the 16k max completion tokens.

We observed the same issue in our Azure deployment, but it’s less noticeable due to the lower traffic in that environment.

Of note, we found a report of a similar bug that was marked as resolved on the 19th (the same day we started to observe the issue).
community[dot]openai[dot]com/t/json-mode-is-suddenly-messed-up/1062658

I can’t use links yet. :upside_down_face:

The OpenAI AI is simply not going to write 9,000 tokens of output. The models have been trained to produce far less, and there are more and more layers of “you must wrap up” and “quit now, dammit” in the AI’s mind and post-training that will interrupt the output prematurely if you prompt it with something that could go on forever.

In the chat format, the end of the message is a built-in stop sequence: it’s what the AI produces once it has given a satisfying answer. The trained chat format uses a special token that you can’t observe or instruct against, and the chance of it appearing isn’t even included in the fake version of logprobs you get back from the API.

You’ll need to break your output into multiple runs. That will also help with attention and quality, since a JSON generation that keeps growing in the context window also consumes the limited attention available.
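
Roughly something like this, for example (a hypothetical sketch; the batching and prompts would depend entirely on your data):

    import asyncio
    import json
    from openai import AsyncOpenAI

    client = AsyncOpenAI()

    async def json_for_batch(items: list[str]) -> dict:
        # One smaller JSON-mode request per batch, so no single completion grows too long
        response = await client.chat.completions.create(
            model="gpt-4o-mini-2024-07-18",
            messages=[
                {"role": "system", "content": "Return a JSON object mapping each input item to a result."},
                {"role": "user", "content": json.dumps(items)},
            ],
            response_format={"type": "json_object"},
        )
        return json.loads(response.choices[0].message.content)

    async def json_for_all(items: list[str], batch_size: int = 10) -> dict:
        # Split the work, run the batches concurrently, and merge the partial objects
        batches = [items[i:i + batch_size] for i in range(0, len(items), batch_size)]
        partials = await asyncio.gather(*(json_for_batch(b) for b in batches))
        merged: dict = {}
        for partial in partials:
            merged.update(partial)
        return merged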

With the API now, you can also define a strict schema for your JSON output, enforced by a logit grammar mechanism. You then use "json_schema" as your response_format, which includes the schema. That output has a much higher chance of being valid, as the stop token should be prohibited while still inside the JSON.
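
A minimal sketch of that request shape (the schema, its name, and the fields here are just illustrative):

    from openai import OpenAI

    client = OpenAI()

    response = client.chat.completions.create(
        model="gpt-4o-mini-2024-07-18",
        messages=[{"role": "user", "content": "Who won the World Series in 2020?"}],
        response_format={
            "type": "json_schema",
            "json_schema": {
                "name": "winner_result",
                "strict": True,
                "schema": {
                    "type": "object",
                    "properties": {"winner": {"type": "string"}},
                    "required": ["winner"],
                    "additionalProperties": False,
                },
            },
        },
    )
    print(response.choices[0].message.content)  # e.g. {"winner": "Los Angeles Dodgers"}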

Welcome to the dev forum @daniel.nowak1

As @_j pointed out, you can simply switch to Structured Outputs and have the generated JSON conform to the schema.

If you choose to stick with JSON mode, however, here’s a topic for you:

Thank you for the responses!

It will, depending on your use case, of course. We’ve been running it like this in PROD for months at this point.

There’s a bit of a misunderstanding regarding what I meant by invalid JSON. I meant string output that does not conform to the JSON format, i.e. something like {"a": 1 instead of {"a": 1, "b": 2}. We had not seen this behaviour before the 19th.

I am aware that the beta API can follow a strict JSON schema; however, the issue is that the output ends abruptly, resulting in a string that is not parseable as JSON.
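
For illustration, the failure looks roughly like this when we parse (a sketch; the helper name is made up):

    import json

    def parse_json_response(response) -> dict:
        choice = response.choices[0]
        try:
            return json.loads(choice.message.content)
        except json.JSONDecodeError as exc:
            # The behaviour seen since the 19th: parsing fails even though
            # finish_reason is "stop" rather than "length"
            raise RuntimeError(
                f"Unparseable JSON with finish_reason={choice.finish_reason!r}"
            ) from exc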

Can you share the code for me to reproduce this?

If you are not using strict structured output, you can revert to gpt-4o-2024-05-13, which may be less affected by these continuing stealth changes to the models.

In JSON output you may not see the usual effects of a conversational AI getting antsy about the growing length. Elsewhere it will start writing “additional text here” in story outlines, or “# further code here” after a certain point, before it quits on you or plainly interrupts the output mid-sentence with a curt message. That may be why in JSON it just terminates.

Unfortunately, just as the special token numbers and their strings are further obfuscated in the o200k token encoding that gpt-4o uses, those high token numbers also cannot be sent with logit_bias to discourage their use. That means you cannot avoid a stop (and you would still get the poor output before it anyway). OpenAI saying “it works OUR way”.
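
You can at least see where those special tokens sit with tiktoken (assuming it is installed):

    import tiktoken

    # o200k_base is the public encoding used by gpt-4o / gpt-4o-mini
    enc = tiktoken.get_encoding("o200k_base")

    # The registered special tokens sit at the very top of the ~200k vocabulary;
    # the chat-format control tokens are not exposed here at all
    print(enc.special_tokens_set)                    # e.g. {'<|endoftext|>', '<|endofprompt|>'}
    print(enc.encode_single_token("<|endoftext|>"))  # 199999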

Unfortunately I can’t. The best I could do is send LangSmith traces, but the cases I’m looking at contain sensitive data. I will chat with support and post the conclusion in this thread.

We have previously received a similar report. While it’s apparently unclear what’s causing the issue, they were able to work around it by using a different model snapshot.

Hope this helps somewhat.
