API response is not JSON parsable despite specified response format

Hey everyone,

I’m experiencing an issue where, despite specifying a JSON response format in an API call, the returned content isn’t always a single, parsable JSON object. Instead, I sometimes get multiple JSON objects separated by \n, or extra spaces and newlines after the JSON. Here’s a sample to illustrate:

Curl Request Body:

{
    "model": "gpt-4o-mini-2024-07-18",
    "temperature": 0.3,
    "messages": [
        {
            "role": "system",
            "content": "When user says 'GO', send a 'A' message to the user, and immediately following that, send a 'B' message to the user in a separate message, at the same time."
        },
        {
            "role": "user",
            "content": "GO"
        }
    ],
    "response_format": {
        "type": "json_schema",
        "json_schema": {
            "name": "output_format",
            "schema": {
                "type": "object",
                "properties": {
                    "message": {
                        "title": "Response To User",
                        "type": "string"
                    }
                },
                "required": [
                    "message"
                ],
                "additionalProperties": false
            },
            "strict": true
        }
    }
}

Choices in the API Response:

"choices": [
    {
        "index": 0,
        "message": {
            "role": "assistant",
            "content": "{\"message\":\"A\"}  \n  ",
            "refusal": null
        },
        "logprobs": null,
        "finish_reason": "stop"
    }
]

While the system prompt is kind of meaningless, this setup reproduces the issue about one in every three tries. When experimenting with another prompt, I also sometimes receive multiple JSONs, such as {\"message\":\"A\"}\n{\"message\":\"B\"}.

Why is this happening? Doesn’t specifying the response format limit the LLM grammar to only produce valid responses? I am trying to ensure that the response is reliably parsable as JSON, but it doesn’t seem to work, even with strict set to true. Am I missing something?
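In the meantime, a defensive parser can recover from both symptoms (trailing whitespace and concatenated objects). A minimal sketch, assuming only the standard library's `json.JSONDecoder.raw_decode` (the function name here is illustrative):

```python
import json

def parse_json_objects(text: str) -> list:
    """Parse a model response that may carry trailing whitespace
    or several concatenated JSON objects separated by newlines."""
    decoder = json.JSONDecoder()
    objects, idx = [], 0
    text = text.strip()
    while idx < len(text):
        # raw_decode returns the parsed object and the index just past it
        obj, end = decoder.raw_decode(text, idx)
        objects.append(obj)
        # skip whitespace between concatenated objects
        while end < len(text) and text[end].isspace():
            end += 1
        idx = end
    return objects

# Both problematic responses from above become parsable:
parse_json_objects('{"message":"A"}  \n  ')            # → [{'message': 'A'}]
parse_json_objects('{"message":"A"}\n{"message":"B"}')  # → two objects
```

This sidesteps the symptom rather than explaining it, but it makes the output reliably consumable.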

Any insights would be greatly appreciated!


It is easy to stimulate the AI into producing keys that are not specified, multiple JSONs, and so on, even within a structured response format. The AI needs a system instruction that tells it what not to produce, because it can just as easily predict a newline and start more output, adding fields of its own to what it receives as the response format. The AI has to determine, from its chat-format training, when a “stop” special token should be generated.

Your schema in action (spot the \n):

Don’t give it such instructions?

What I don’t understand is that I believe this shouldn’t depend on the prompt. As far as I know, specifying a response format and setting strict to true alters the LLM’s grammar such that the tokens that can be predicted are restricted. So, even if the most-probable prediction is a non-allowed token, the system returns the most-probable token among those allowed by the grammar. Isn’t that how it works?

Yes, one could imagine that a grammar artifact should keep track of the nesting levels and schema keys when enforcing a logit-dictionary subset. However, here it doesn’t control what the AI can put in strings (even as keys), it doesn’t suppress the excess whitespace generated (not shown in the playground, including linefeeds), and it doesn’t make “I’m done” the only thing the AI can write after closing the final JSON bracket.

At least open-ended JSONL is a product that is available to you as a feature…

I do note in my own case, that everything is being placed within your “message” as the only way the AI can respond - and then it goes freeform making more JSON within a string. So that should parse as the single desired property output.

But in my case, the LLM adds newlines, spaces (and additional JSONs) after the first encoded JSON is already appended to the response, so it is not a part of the response I specified in the response format. I understand that the response format doesn’t enforce what can be generated in a string-typed property in the JSON schema, but the extra characters aren’t appearing within the specified JSON property; they’re appended afterward.

Have you tried the exact system prompt I shared above?

Now I have. n=10. First, a run with no special input, demonstrating that varied output works:

Then, the same prompt, just over and over, with the temperature increased from yours.

I did this for 50 outputs. All the same.

The AI simply can’t break out for me when properly using
response_format={"type": "json_schema", "json_schema": json_schema} with a strict: True schema. The response shown is unvalidated.

Try defining your response schema to be an array of objects instead of an object and saying that you want 2 objects. You can specify in your prompt that you want exactly 2 items in the array. It should comply with that request.
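A sketch of what that array-of-objects schema might look like (the property names here are illustrative, not from the original post; strict mode also requires additionalProperties: false at every level and every key listed in required, which this follows):

```python
# Hypothetical response_format: an array of message objects instead of
# a single object; the prompt would then ask for exactly 2 items.
array_response_format = {
    "type": "json_schema",
    "json_schema": {
        "name": "output_format",
        "schema": {
            "type": "object",
            "properties": {
                "messages": {
                    "type": "array",
                    "items": {
                        "type": "object",
                        "properties": {
                            "message": {"type": "string"}
                        },
                        "required": ["message"],
                        "additionalProperties": False,
                    },
                }
            },
            "required": ["messages"],
            "additionalProperties": False,
        },
        "strict": True,
    },
}
```

With this shape, both “A” and “B” land inside one parsable object instead of two concatenated ones.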

Have you tried to set your temperature to 0?

I don’t think I explained my issue as clearly as I thought. The prompt I provided is just an example; my main concern is that the response doesn’t match the grammar I specified in the response format. I thought this should never happen, considering that the strict parameter is set to true and the refusal field in the API response is null. I wouldn’t expect grammar enforcement to depend on any hyperparameter such as temperature. Is there a way to make the grammar completely strict?

No, I think I understand your problem. It’s just because I have the same issue and I’m also investigating why it is so 🙂

I would expect the same as you. I have a big problem in that my schema indicates to only return “entityId”, “name” and “description” properties… but sometimes it randomly gives me “id” instead of “entityId”, or even gives me new properties. That’s a huge problem, especially because I’m doing event-based JSON parsing, so I branch on getting e.g. “entityId” as the key. If I could suddenly get either “entityId” or some other key, it would be problematic.
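When branching on keys like that, it may help to validate each object before dispatching, so drift such as “id” vs “entityId” fails fast instead of silently taking the wrong branch. A small sketch (key names from the post above, helper name hypothetical):

```python
EXPECTED_KEYS = {"entityId", "name", "description"}

def check_entity(obj: dict) -> dict:
    """Fail fast if the model drifted from the schema's keys,
    rather than branching on whatever key happens to arrive."""
    unexpected = set(obj) - EXPECTED_KEYS
    missing = EXPECTED_KEYS - set(obj)
    if unexpected or missing:
        raise ValueError(
            f"schema drift: unexpected={unexpected}, missing={missing}"
        )
    return obj
```

A full JSON Schema validator (e.g. the `jsonschema` package) would do the same job more generally; this is just the minimal guard.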

Oh, got it. Seems like a very similar issue.

I was simply unable to trigger the bad behavior in any of my attempts.

Let’s go over how to make a “strict” request.

    # -- Examples 1: string input

    # Example schema JSON string, like from the Playground "generate" tool
    example_response_format_str = '''
{
  "name": "assistant_response_format",
  "schema": {
    "type": "object",
    "properties": {
      "response_to_user": {
        "type": "string",
        "description": "The assistant's response to the user"
      },
      "response_topic": {
        "type": "string",
        "description": "subject of this chat turn, 2-5 words"
      }
    },
    "required": ["response_to_user", "response_topic"],
    "additionalProperties": false
  },
  "strict": true
}
'''.strip()

This is a JSON request in its basic form. Scroll down to the bottom, and observe the placement of the “strict” parameter - in the first level of the schema itself.

That schema will validate, and is the fundamental understanding that the AI will have.

You can’t send that to the API though - JSON schemas need to be enclosed in an indicating container, which might otherwise be confused as being part of the JSON itself:

    # Then, the request format follows this format for placing a JSON schema
    # str to API request string, if you were making raw API requests in JSON
    # ( comment out to see function detect when this container is missing )
    example_response_format_str = f'''{{
"type": "json_schema",
"json_schema": {example_response_format_str}
}}'''

So being wrapped, the keys for strict and the entry level of the schema are correct.

You may, however, be using Python, where the destination is a parameter meant for the OpenAI Python SDK library, and you pass a JSON-like object (a string is, strangely, not processed correctly). Let’s show that same representation:

    # -- Examples 2: Python JSON-like object input

    # Example schema JSON-streamable object, like from your code or Pydantic
    example_response_format_obj = {
            "name": "assistant_response_format",
            "schema": {
                "type": "object",
                "properties": {
                    "response_to_user": {
                        "type": "string",
                        "description": "The assistant's response to the user"
                    },
                    "response_topic": {
                        "type": "string",
                        "description": "subject of this chat turn, 2-5 words"
                    }
                },
                "required": ["response_to_user", "response_topic"],
                "additionalProperties": False
            },
            "strict": True
        }

Pretty similar, except that JSON and Python booleans differ: the latter need capitalization, while the former requires lowercase (and double quotes exclusively).
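The boolean difference is easy to see with a round-trip through the standard json module:

```python
import json

py_obj = {"strict": True, "additionalProperties": False}

# Serializing lowercases the booleans into the JSON wire format...
wire = json.dumps(py_obj)
print(wire)  # {"strict": true, "additionalProperties": false}

# ...and parsing restores the capitalized Python booleans.
assert json.loads(wire) == py_obj
```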

Then let’s put that into a sendable form also.

    # Again, the API parameter follows this format for placing a JSON schema
    # Python streamable dict response_format can be input to openai library
    # or for easily manipulating contents like is done here later
    # ( comment out to see the function's detection of missing container )
    example_response_format_obj = {
        "type": "json_schema",
        "json_schema": example_response_format_obj
        }

You can thus observe the schema, as you might collect it from a user-interface window, with strict: true, “additionalProperties”: false at all nesting levels, and every key field listed in “required”.

strict: false, or unspecified, or put in the wrong place, gets you a schema placed but no enforcement. Then you can also remove items from “required”, and the AI can consider them optional, or write more free-form output that cannot be validated.

It would be handy in a testing application if we could try out either the string or the Python-object example on demand to verify our handling, or even use a real schema instead of an example if we have one at this point.

    # -- End Examples, now let's pick one (or use schema parameter) and convert to dict

    # Determine the source of the schema
    schema_source = schema if schema else (
        example_response_format_str if example_format == "string" else example_response_format_obj
    )

A string is hard to observe and manipulate, and we have a json library to convert and validate strings, so let’s detect whether the object we got is string- or dictionary-based.

    # Parse the schema based on its type
    if isinstance(schema_source, str):
        try:
            work_response_format = json.loads(schema_source)
        except Exception as e:
            error_message = (
                f"JSON string input parse error: {e}" if schema
                else f"Internal JSON string example corrupted: {e}"
            )
            raise ValueError(error_message)
    elif isinstance(schema_source, dict):
        work_response_format = schema_source
    else:
        raise ValueError("Schema input must be JSON string or dict stream only")

and convert either to Python dict object.

Good to barf on bad schemas here. The code would have already failed if you couldn’t nest a Python dict as JSON schema correctly.

How can we make sure that what was passed as an input, besides our examples, is actually ready to be a network request? We can do some checks of the content - see if it has the keys needed either to be a network request or to be a valid schema. Of course, the API seems happy with extra keys in the wrong places, so this can’t outsmart the ingenious fool.

    # Check if the outer level contains the required "type": "json_schema"
    if ("type" in work_response_format
        and work_response_format["type"] == "json_schema"
        and "json_schema" in work_response_format):
        # Assume the structure is correct, but verify that "name" exists inside "json_schema"
        if "name" not in work_response_format["json_schema"]:
            # Attempt to conservatively add a placeholder if missing, could error here
            work_response_format["json_schema"]["name"] = "response_format"
        response_format_obj = work_response_format
    else:
        # Assume that this is an inner object that lacks proper wrapping
        # and needs to be embedded within a "json_schema" structure.
        # We check for required "name" and the "schema" at the inner level
        if "schema" not in work_response_format:
            raise ValueError("type: `json_schema` not detected, nor inner `schema`") 
        if "name" not in work_response_format:
            #work_response_format["name"] = "response_format"  # if we want to let more fly
            raise ValueError("type: `json_schema` not detected, nor inner `name`") 
        warnings.warn("Schema not in API json_schema format, enclosing for you.",
                          UserWarning)
        # Wrap the provided schema since it's detected as the inner object
        response_format_obj = {
            "type": "json_schema",
            "json_schema": work_response_format
        }

I don’t just produce an error above; I also do the wrapping into the response format automatically, and even give the schema the required name that you forgot (while, oddly, other fields that are required for the output to function are marked optional in the API reference), along with a warning that you were supposed to place the parameter container yourself…

Now about that strict part… I’m not going to trust that you did it right. How about if you set strict=True or strict=False, and let the code fix whatever schema was input to comply with the wishes of that variable (or function parameter, as you will see later)?

    if strict is False and "strict" in response_format_obj["json_schema"]:
        del response_format_obj["json_schema"]["strict"]
    elif strict is True:
        response_format_obj["json_schema"].update({"strict": True})

You might have thought you were making the schema strict, but placed the strict keyword in the wrong place, because of the less-than-stellar documentation of how to form a response_format parameter and what goes in it versus in the schema itself. How about I complain about your misuse so you know to go back and fix what isn’t going to do the job?

    # List of commonly misused keys
    unnecessary_root_keys = {'name', 'strict', 'schema'}
    unnecessary_json_schema_keys = {'type'}

    # Check for unnecessary keys at the root
    for key in unnecessary_root_keys:
        if key in response_format_obj:
            warnings.warn(f"Unnecessary or misplaced key '{key}' "
                          "found at the root of JSON schema.", UserWarning)

    # Check for unnecessary keys within 'json_schema', if 'json_schema' is present
    if 'json_schema' in response_format_obj:
        for key in unnecessary_json_schema_keys:
            if key in response_format_obj['json_schema']:
                warnings.warn(f"Unnecessary or misplaced key '{key}'"
                              " found within 'json_schema'.", UserWarning)

So we’ve taken the chosen example, used its method of wrapping (or not wrapping), let the code do the wrapping into a request parameter if it was missing, barfed if it just plain wasn’t good JSON, and then made sure your strict structured-response intention was in the right place.

What are you going to do with that, send it as Python? Only a library would accept that; requests to the API itself are strings, and you might not be using httpx or requests with a serializer that sends ‘Content-Type’: ‘application/json’ and handles the internal parameters for you.

I’d better let you choose by using another variable.

    if string_output:
        return json.dumps(response_format_obj)
    return response_format_obj

Yes, I describe a whole Python function for processing arbitrary schemas: strings or objects, a bare schema or a full parameter, rewriting desired properties within the schema, and so on.

And here it is.

`from json_schema_library import process_response_schema` if you like.

With parameters documented for running the built-in example schemas or your own input, and options for any kind of response_format, even just setting the older JSON mode for you.

import json
import warnings
from typing import Any

def process_response_schema(
    schema: str | dict[str, Any] | None = None,
    strict: bool | None = None,
    structured: bool | None = None,
    enabled: bool | None = None,
    string_output: bool | None = None,
    *,
    example_format: str = "string"  # if "string", will not use the Python JSON-like example
) -> dict[str, Any] | str:

    """
    A response_format library for OpenAI, with switches, avoiding code flow change
    
    Produce or validate the schema for a structured output response.
    A preprocessor that corrects or warns about certain usage mistakes.
    Note: The schema is not validated for proper "additionalProperties" or "required".
    Note: Python 3.10+ for built in types besides Any, and union | 
    Args:
        schema (Optional[str | dict[str, Any]]): The schema definition, as a JSON string or dict. Defaults to an internal example if None.
        strict (Optional[bool]): Whether the schema should be strictly adhered to. Defaults to None (no change).
        structured (Optional[bool]): Whether the response should be enforced structured.
            Defaults to None (no change). If False, returns `json_object` with no schema passed
            (you must instruct the AI yourself).
        enabled (Optional[bool]): If False, reverts the response to normal text. Defaults to None.
        string_output (Optional[bool]): If True, outputs the schema as a string. Defaults to None.
        example_format (str): Internal example data used. Either "string" or anything else. Defaults to "string".

    Returns:
        Union[str, dict[str, Any]]: The processed schema output as `response_format`, potentially modified
        based on structuring and strictness parameters, and whether the schema included the base response type.
    
    """
    if enabled is False:
        return {"type": "text"}
    elif structured is False:
        # not structured => JSON mode w no schema:
        # You must instruct JSON in system message
        return {"type": "json_object"}
    else:
        # processing as {"type": "json_schema"} continues
        pass
    
    # -- Examples 1: string input

    # Example schema JSON string, like from the Playground "generate" tool
    example_response_format_str = '''
{
  "name": "assistant_response_format",
  "schema": {
    "type": "object",
    "properties": {
      "response_to_user": {
        "type": "string",
        "description": "The assistant's response to the user"
      },
      "response_topic": {
        "type": "string",
        "description": "subject of this chat turn, 2-5 words"
      }
    },
    "required": ["response_to_user", "response_topic"],
    "additionalProperties": false
  },
  "strict": true
}
'''.strip()

    # Then, the request format follows this format for placing a JSON schema
    # str to API request string, if you were making raw API requests in JSON
    # ( comment out to see function detect when this container is missing )
    example_response_format_str_commented = f'''{{
"type": "json_schema",
"json_schema": {example_response_format_str}
}}'''

    # -- Examples 2: Python JSON-like object input

    # Example schema JSON-streamable object, like from your code or Pydantic
    example_response_format_obj = {
            "name": "assistant_response_format",
            "schema": {
                "type": "object",
                "properties": {
                    "response_to_user": {
                        "type": "string",
                        "description": "The assistant's response to the user"
                    },
                    "response_topic": {
                        "type": "string",
                        "description": "subject of this chat turn, 2-5 words"
                    }
                },
                "required": ["response_to_user", "response_topic"],
                "additionalProperties": False
            },
            "strict": True
        }

    # Again, the API parameter follows this format for placing a JSON schema
    # Python streamable dict response_format can be input to openai library
    # or for easily manipulating contents like is done here later
    # ( comment out to see the function's detection of missing container )
    example_response_format_obj = {
        "type": "json_schema",
        "json_schema": example_response_format_obj
        }

    # -- End Examples, now let's pick one (or use schema parameter) and convert to dict

    # Determine the source of the schema
    schema_source = schema if schema else (
        example_response_format_str if example_format == "string" else example_response_format_obj
    )

    # Parse the schema based on its type
    if isinstance(schema_source, str):
        try:
            work_response_format = json.loads(schema_source)
        except Exception as e:
            error_message = (
                f"JSON string input parse error: {e}" if schema
                else f"Internal JSON string example corrupted: {e}"
            )
            raise ValueError(error_message)
    elif isinstance(schema_source, dict):
        work_response_format = schema_source
    else:
        raise ValueError("Schema input must be JSON string or dict stream only")

    # Check if the outer level contains the required "type": "json_schema"
    if ("type" in work_response_format
        and work_response_format["type"] == "json_schema"
        and "json_schema" in work_response_format):
        # Assume the structure is correct, but verify that "name" exists inside "json_schema"
        if "name" not in work_response_format["json_schema"]:
            # Attempt to conservatively add a placeholder if missing, could error here
            work_response_format["json_schema"]["name"] = "response_format"
        response_format_obj = work_response_format
    else:
        # Assume that this is an inner object that lacks proper wrapping
        # and needs to be embedded within a "json_schema" structure.
        # We check for required "name" and the "schema" at the inner level
        if "schema" not in work_response_format:
            raise ValueError("type: `json_schema` not detected, nor inner `schema`") 
        if "name" not in work_response_format:
            #work_response_format["name"] = "response_format"  # if we want to let more fly
            raise ValueError("type: `json_schema` not detected, nor inner `name`") 
        warnings.warn("Schema not in API json_schema format, enclosing for you.",
                          UserWarning)
        # Wrap the provided schema since it's detected as the inner object
        response_format_obj = {
            "type": "json_schema",
            "json_schema": work_response_format
        }

    if strict is False and "strict" in response_format_obj["json_schema"]:
        del response_format_obj["json_schema"]["strict"]
    elif strict is True:
        response_format_obj["json_schema"].update({"strict": True})


    # List of commonly misused keys
    unnecessary_root_keys = {'name', 'strict', 'schema'}
    unnecessary_json_schema_keys = {'type'}

    # Check for unnecessary keys at the root
    for key in unnecessary_root_keys:
        if key in response_format_obj:
            warnings.warn(f"Unnecessary or misplaced key '{key}' "
                          "found at the root of JSON schema.", UserWarning)

    # Check for unnecessary keys within 'json_schema', if 'json_schema' is present
    if 'json_schema' in response_format_obj:
        for key in unnecessary_json_schema_keys:
            if key in response_format_obj['json_schema']:
                warnings.warn(f"Unnecessary or misplaced key '{key}'"
                              " found within 'json_schema'.", UserWarning)
    if string_output:
        return json.dumps(response_format_obj)
    return response_format_obj

Now: if you are STILL able to break structured inputs, which I kind of doubt, throw a stop sequence in to end the output on an AI that tries to write another JSON.

stop="\n{"
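For Chat Completions, that just means adding the `stop` parameter alongside the rest of the request; a sketch of the request body (model name and temperature taken from the original post):

```python
# The stop sequence ends generation the moment the model tries to
# open a second JSON object on a new line after the first one.
request_body = {
    "model": "gpt-4o-mini-2024-07-18",
    "temperature": 0.3,
    "stop": "\n{",
    # "messages" and "response_format" as shown earlier in the thread
}
```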

👌

I found this invaluable for forcing the Assistants API (not Chat Completions) to respond with JSON.

Chat Completions doesn’t require this since it supports JSON responses natively, but the Assistants API does not.