In all my attempts at sending requests, I was simply unable to provoke bad behavior.
Let’s go over how to make a “strict” request.
# -- Examples 1: string input
# Example schema JSON string, like from the Playground "generate" tool
example_response_format_str = '''
{
  "name": "assistant_response_format",
  "schema": {
    "type": "object",
    "properties": {
      "response_to_user": {
        "type": "string",
        "description": "The assistant's response to the user"
      },
      "response_topic": {
        "type": "string",
        "description": "subject of this chat turn, 2-5 words"
      }
    },
    "required": ["response_to_user", "response_topic"],
    "additionalProperties": false
  },
  "strict": true
}
'''.strip()
This is the JSON schema in its basic form. Scroll down to the bottom and observe the placement of the "strict" parameter: at the first level of the named object, alongside the schema itself.
That schema will validate, and it is the fundamental understanding of your output format that the AI will have.
You can't send that to the API as-is, though - a JSON schema needs to be enclosed in an identifying container, which is easily mistaken for being part of the JSON itself:
# Then, the request format follows this format for placing a JSON schema
# str to API request string, if you were making raw API requests in JSON
# ( comment out to see function detect when this container is missing )
example_response_format_str = f'''{{
  "type": "json_schema",
  "json_schema": {example_response_format_str}
}}'''
With that wrapping applied, the "strict" key and the entry level of the schema are in the correct places.
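Before trusting the wrapped string, you can round-trip it through Python's standard json module as a quick self-check. A minimal sketch, with the schema abbreviated for brevity:

```python
import json

# Abbreviated wrapped response_format string, for illustration only
wrapped = '''{
  "type": "json_schema",
  "json_schema": {
    "name": "assistant_response_format",
    "schema": {"type": "object", "properties": {}, "additionalProperties": false},
    "strict": true
  }
}'''

parsed = json.loads(wrapped)  # raises ValueError if the JSON is malformed
print(parsed["type"], parsed["json_schema"]["strict"])
```

If the container and the schema are both well-formed, the parse succeeds and you can inspect the keys directly.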
You may, however, be using Python, where the destination is a parameter of the OpenAI Python SDK library, and you pass a JSON-like object (a string is, strangely, not processed correctly there). Let's show that same representation:
# -- Examples 2: Python JSON-like object input
# Example schema JSON-streamable object, like from your code or Pydantic
example_response_format_obj = {
    "name": "assistant_response_format",
    "schema": {
        "type": "object",
        "properties": {
            "response_to_user": {
                "type": "string",
                "description": "The assistant's response to the user"
            },
            "response_topic": {
                "type": "string",
                "description": "subject of this chat turn, 2-5 words"
            }
        },
        "required": ["response_to_user", "response_topic"],
        "additionalProperties": False
    },
    "strict": True
}
Pretty similar, except that JSON and Python booleans differ: Python's need capitalization, while JSON's must be lowercase, and JSON requires double quotes exclusively.
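You can watch that translation happen by serializing the Python booleans back into JSON text:

```python
import json

# Python booleans become lowercase JSON literals on serialization
print(json.dumps({"strict": True, "additionalProperties": False}))
# -> {"strict": true, "additionalProperties": false}
```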
Then let’s put that into a sendable form also.
# Again, the API parameter follows this format for placing a JSON schema
# Python streamable dict response_format can be input to openai library
# or for easily manipulating contents like is done here later
# ( comment out to see the function's detection of missing container )
example_response_format_obj = {
    "type": "json_schema",
    "json_schema": example_response_format_obj
}
You can thus observe the schema as you might collect it from a user interface window: the "strict": true, the "additionalProperties": false at all nesting levels, and every key field being set in "required".
A "strict": false, or one unspecified or put in the wrong place, gets you a schema that is placed but not enforced. You can then also remove items from "required", and the AI can consider them optional, or write more free-form output that cannot be validated.
It would be handy in a testing application if we could try out either the string or the Python object example on demand, to verify our handling - or even use a real schema at this point, if we have one, instead of an example.
# -- End Examples, now let's pick one (or use schema parameter) and convert to dict
# Determine the source of the schema
schema_source = schema if schema else (
    example_response_format_str if example_format == "string" else example_response_format_obj
)
A string is hard to observe and manipulate, and we have the json library to convert and validate strings, so let's detect whether the object we got is string- or dictionary-based,
# Parse the schema based on its type
if isinstance(schema_source, str):
    try:
        work_response_format = json.loads(schema_source)
    except Exception as e:
        error_message = (
            f"JSON string input parse error: {e}" if schema
            else f"Internal JSON string example corrupted: {e}"
        )
        raise ValueError(error_message)
elif isinstance(schema_source, dict):
    work_response_format = schema_source
else:
    raise ValueError("Schema input must be a JSON string or dict only")
and convert either one to a Python dict object.
It's good to barf on bad schemas here. The code would already have failed earlier if you couldn't nest a Python dict as a JSON schema correctly.
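As a quick illustration of that failure path, here is what the parse error looks like on a hypothetical truncated input:

```python
import json

bad_input = '{"name": "assistant_response_format", "schema": {'  # truncated mid-schema
try:
    json.loads(bad_input)
except json.JSONDecodeError as e:  # a subclass of ValueError
    print(f"JSON string input parse error: {e}")
```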
How can we make sure that an input passed in, besides our examples, is actually ready to be a network request, though? We can do some checks of the content: see if it has the keys that are needed, either to be a network request or to be a valid schema. Of course, the API seems to be happy with extra keys in the wrong places, so this can't outsmart the ingenious fool.
# Check if the outer level contains the required "type": "json_schema"
if ("type" in work_response_format
        and work_response_format["type"] == "json_schema"
        and "json_schema" in work_response_format):
    # Assume the structure is correct, but verify that "name" exists inside "json_schema"
    if "name" not in work_response_format["json_schema"]:
        # Conservatively add a placeholder name if missing; could error here instead
        work_response_format["json_schema"]["name"] = "response_format"
    response_format_obj = work_response_format
else:
    # Assume that this is an inner object that lacks proper wrapping
    # and needs to be embedded within a "json_schema" structure.
    # We check for the required "name" and "schema" at the inner level
    if "schema" not in work_response_format:
        raise ValueError("type: `json_schema` not detected, nor inner `schema`")
    if "name" not in work_response_format:
        #work_response_format["name"] = "response_format"  # if we want to let more fly
        raise ValueError("type: `json_schema` not detected, nor inner `name`")
    warnings.warn("Schema not in API json_schema format, enclosing for you.",
                  UserWarning)
    # Wrap the provided schema since it's detected as the inner object
    response_format_obj = {
        "type": "json_schema",
        "json_schema": work_response_format
    }
The code above doesn't just produce an error: it also does that wrapping into the response format automatically, and even gives the schema a name if you forgot the required one (while, oddly, other fields that would be required for the output to function are marked optional in the API reference). It gives a warning that you were supposed to place the parameter container yourself.
Now, about that strict part... I'm not going to trust that you did it right. How about if you set strict=True or strict=False, and let the code fix whatever schema was input to comply with the wishes of that variable (or function parameter, as you will see later)?
if strict is False and "strict" in response_format_obj["json_schema"]:
    del response_format_obj["json_schema"]["strict"]
elif strict is True:
    response_format_obj["json_schema"].update({"strict": True})
You might have thought you were making the schema strict while placing the strict keyword in the wrong spot, thanks to the less-than-stellar documentation of how to form a response_format parameter and what goes in it versus in the schema itself. How about if I complain about your misuse, so you know to go back and fix what isn't going to do the job?
# List of commonly misused keys
unnecessary_root_keys = {'name', 'strict', 'schema'}
unnecessary_json_schema_keys = {'type'}
# Check for unnecessary keys at the root
for key in unnecessary_root_keys:
    if key in response_format_obj:
        warnings.warn(f"Unnecessary or misplaced key '{key}' "
                      "found at the root of JSON schema.", UserWarning)
# Check for unnecessary keys within 'json_schema', if 'json_schema' is present
if 'json_schema' in response_format_obj:
    for key in unnecessary_json_schema_keys:
        if key in response_format_obj['json_schema']:
            warnings.warn(f"Unnecessary or misplaced key '{key}'"
                          " found within 'json_schema'.", UserWarning)
So we've taken the chosen example, used its method to wrap it (or not), let the code do the wrapping into a request parameter if it was missing, barfed if it just plain wasn't good JSON, and then made sure your strict structured-response intention was in the right place.
What are you going to do with that: send it as Python? Only a library would accept that. Requests to the API itself are strings, and you might not be using httpx or requests with a serializer that sends 'Content-Type': 'application/json' and encodes nested parameters for you.
I'd better let you choose, using another variable.
if string_output:
    return json.dumps(response_format_obj)  # json is already imported for the parsing above
return response_format_obj
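The two output forms differ only in serialization. A quick sketch of the equivalence, using a stand-in dict:

```python
import json

# Stand-in response_format, shaped like what the function returns
response_format_obj = {
    "type": "json_schema",
    "json_schema": {"name": "demo", "schema": {"type": "object"}, "strict": True},
}

as_string = json.dumps(response_format_obj)  # what string_output=True would give
print(type(as_string).__name__)              # a str, ready for a raw HTTP body
```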
Yes, I describe a whole Python function for processing arbitrary schemas: strings or objects, a bare schema or a full parameter, rewriting desired properties within the schema, and so on.
And here it is - from json_schema_library import process_response_schema if you like - with parameters documented for running the coded example schemas or an input of your own, and options for any kind of response_format, even just setting the older JSON mode for you.
from typing import Any
import json
import warnings


def process_response_schema(
    schema: str | dict[str, Any] | None = None,
    strict: bool | None = None,
    structured: bool | None = None,
    enabled: bool | None = None,
    string_output: bool | None = None,
    *,
    example_format: str = "string"  # if "string", uses the string example, not the Python dict example
) -> dict[str, Any] | str:
    """
    A response_format library for OpenAI, with switches, avoiding code flow change
    Produce or validate the schema for a structured output response.
    A preprocessor that corrects or warns about certain usage mistakes.
    Note: The schema is not validated for proper "additionalProperties" or "required".
    Note: Python 3.10+ for built-in generic types and the | union operator
    Args:
        schema (Optional[str | dict[str, Any]]): The schema definition. Defaults to an internal example if None.
        strict (Optional[bool]): Whether the schema should be strictly adhered to. Defaults to None (no change).
        structured (Optional[bool]): Whether the response should be enforced structured.
            Defaults to None (no change). If False, returns `json_object` with no schema passed
            (you must instruct the AI yourself).
        enabled (Optional[bool]): If False, reverts the response to normal text. Defaults to None.
        string_output (Optional[bool]): If True, outputs the schema as a string. Defaults to None.
        example_format (str): Internal example data used. Either "string" or anything else. Defaults to "string".
    Returns:
        Union[str, dict[str, Any]]: The processed schema output as `response_format`, potentially modified
        based on structuring and strictness parameters, and whether the schema included the base response type.
    """
    if enabled is False:
        return {"type": "text"}
    elif structured is False:
        # not structured => JSON mode w no schema:
        # You must instruct JSON in system message
        return {"type": "json_object"}
    else:
        # processing as {"type": "json_schema"} continues
        pass

    # -- Examples 1: string input
    # Example schema JSON string, like from the Playground "generate" tool
    example_response_format_str = '''
    {
      "name": "assistant_response_format",
      "schema": {
        "type": "object",
        "properties": {
          "response_to_user": {
            "type": "string",
            "description": "The assistant's response to the user"
          },
          "response_topic": {
            "type": "string",
            "description": "subject of this chat turn, 2-5 words"
          }
        },
        "required": ["response_to_user", "response_topic"],
        "additionalProperties": false
      },
      "strict": true
    }
    '''.strip()

    # Then, the request format follows this format for placing a JSON schema
    # str to API request string, if you were making raw API requests in JSON
    # ( comment out to see function detect when this container is missing )
    example_response_format_str = f'''{{
      "type": "json_schema",
      "json_schema": {example_response_format_str}
    }}'''

    # -- Examples 2: Python JSON-like object input
    # Example schema JSON-streamable object, like from your code or Pydantic
    example_response_format_obj = {
        "name": "assistant_response_format",
        "schema": {
            "type": "object",
            "properties": {
                "response_to_user": {
                    "type": "string",
                    "description": "The assistant's response to the user"
                },
                "response_topic": {
                    "type": "string",
                    "description": "subject of this chat turn, 2-5 words"
                }
            },
            "required": ["response_to_user", "response_topic"],
            "additionalProperties": False
        },
        "strict": True
    }

    # Again, the API parameter follows this format for placing a JSON schema
    # Python streamable dict response_format can be input to openai library
    # or for easily manipulating contents like is done here later
    # ( comment out to see the function's detection of missing container )
    example_response_format_obj = {
        "type": "json_schema",
        "json_schema": example_response_format_obj
    }

    # -- End Examples, now let's pick one (or use schema parameter) and convert to dict
    # Determine the source of the schema
    schema_source = schema if schema else (
        example_response_format_str if example_format == "string" else example_response_format_obj
    )

    # Parse the schema based on its type
    if isinstance(schema_source, str):
        try:
            work_response_format = json.loads(schema_source)
        except Exception as e:
            error_message = (
                f"JSON string input parse error: {e}" if schema
                else f"Internal JSON string example corrupted: {e}"
            )
            raise ValueError(error_message)
    elif isinstance(schema_source, dict):
        work_response_format = schema_source
    else:
        raise ValueError("Schema input must be a JSON string or dict only")

    # Check if the outer level contains the required "type": "json_schema"
    if ("type" in work_response_format
            and work_response_format["type"] == "json_schema"
            and "json_schema" in work_response_format):
        # Assume the structure is correct, but verify that "name" exists inside "json_schema"
        if "name" not in work_response_format["json_schema"]:
            # Conservatively add a placeholder name if missing; could error here instead
            work_response_format["json_schema"]["name"] = "response_format"
        response_format_obj = work_response_format
    else:
        # Assume that this is an inner object that lacks proper wrapping
        # and needs to be embedded within a "json_schema" structure.
        # We check for the required "name" and "schema" at the inner level
        if "schema" not in work_response_format:
            raise ValueError("type: `json_schema` not detected, nor inner `schema`")
        if "name" not in work_response_format:
            #work_response_format["name"] = "response_format"  # if we want to let more fly
            raise ValueError("type: `json_schema` not detected, nor inner `name`")
        warnings.warn("Schema not in API json_schema format, enclosing for you.",
                      UserWarning)
        # Wrap the provided schema since it's detected as the inner object
        response_format_obj = {
            "type": "json_schema",
            "json_schema": work_response_format
        }

    if strict is False and "strict" in response_format_obj["json_schema"]:
        del response_format_obj["json_schema"]["strict"]
    elif strict is True:
        response_format_obj["json_schema"].update({"strict": True})

    # List of commonly misused keys
    unnecessary_root_keys = {'name', 'strict', 'schema'}
    unnecessary_json_schema_keys = {'type'}
    # Check for unnecessary keys at the root
    for key in unnecessary_root_keys:
        if key in response_format_obj:
            warnings.warn(f"Unnecessary or misplaced key '{key}' "
                          "found at the root of JSON schema.", UserWarning)
    # Check for unnecessary keys within 'json_schema', if 'json_schema' is present
    if 'json_schema' in response_format_obj:
        for key in unnecessary_json_schema_keys:
            if key in response_format_obj['json_schema']:
                warnings.warn(f"Unnecessary or misplaced key '{key}'"
                              " found within 'json_schema'.", UserWarning)

    if string_output:
        return json.dumps(response_format_obj)
    return response_format_obj
Now: if you are STILL able to break structured outputs, which I rather doubt, throw in a stop sequence to end the output of an AI that tries to write a second JSON.
stop="\n{"
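In a raw request body, that stop sequence rides alongside the response_format. A minimal sketch; the model name and messages are placeholders, not recommendations:

```python
import json

payload = {
    "model": "gpt-4o-mini",  # placeholder model name
    "messages": [{"role": "user", "content": "Produce the JSON."}],
    "response_format": {"type": "json_object"},
    "stop": "\n{",  # halts an AI that starts writing another JSON after the first
}
body = json.dumps(payload)  # the string actually sent over the wire
```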