An AI model can simply be instructed, in natural language, about the type of output it must produce. Here's the updated `message_list`:
```python
message_list = [
    {
        "role": "system",
        "content": """You are an AI in an automated environment. Your responses must be in JSON format,
strictly following this validated schema named 'user_response_schema':
{
    "name": "user_response_schema",
    "type": "object",
    "properties": {
        "response_to_user": {
            "type": "string"
        }
    },
    "required": ["response_to_user"],
    "additionalProperties": false
}.
Every response should include the key 'response_to_user' with a string value as the response to the user."""
    },
    {
        "role": "user",
        "content": "hi. Do my workbook task."
    }
]
```
In this updated message, I've included a `name` property for the schema, which helps with clarity and reference. The instruction to the AI remains clear: it must strictly follow the schema provided.
You can use this `message_list` in your Python API call as follows:
```python
response = client.chat.completions.create(
    messages=message_list,
    model="gpt-3.5-turbo",
    top_p=0.5,
    max_tokens=1500,
)
```
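The JSON the model produces comes back as a plain string in the message content, so you still have to parse it yourself. A minimal sketch, assuming the `client` above is an `OpenAI()` instance from the v1 Python SDK:

```python
import json

# The model's reply is a JSON-formatted string, not a Python object.
reply_text = response.choices[0].message.content

try:
    reply = json.loads(reply_text)
    print(reply["response_to_user"])
except (json.JSONDecodeError, KeyError) as err:
    # The model may still deviate from the requested format.
    print(f"Model did not return the expected JSON: {err}")
```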
Remember, the AI will do its best to adhere to the schema based on its understanding of the instructions, but there is no enforcement mechanism within the `gpt-3.5-turbo` model that actually validates against the schema or forces the correct key names.
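Because nothing server-side enforces the schema with this model, you can validate the parsed reply client-side, for example with the third-party `jsonschema` package. A sketch that reuses `reply` from the snippet above (`pip install jsonschema`):

```python
from jsonschema import ValidationError, validate

# Client-side mirror of the schema given to the model in the system message.
user_response_schema = {
    "type": "object",
    "properties": {"response_to_user": {"type": "string"}},
    "required": ["response_to_user"],
    "additionalProperties": False,
}

try:
    validate(instance=reply, schema=user_response_schema)
except ValidationError as err:
    # The model ignored the schema; re-prompt or fall back as needed.
    print(f"Reply violated the schema: {err.message}")
```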
What the API does offer is a model behavior specifically trained on producing JSON. This request parameter can be enabled:
`response_format`: object

An object specifying the format that the model must output. Compatible with GPT-4o, GPT-4o mini, GPT-4 Turbo and all GPT-3.5 Turbo models newer than `gpt-3.5-turbo-1106`.

Setting to `{ "type": "json_object" }` enables JSON mode, which ensures the message the model generates is valid JSON.

Important: when using JSON mode, you must also instruct the model to produce JSON yourself via a system or user message. Without this, the model may generate an unending stream of whitespace until the generation reaches the token limit, resulting in a long-running and seemingly "stuck" request. Also note that the message content may be partially cut off if `finish_reason="length"`, which indicates the generation exceeded `max_tokens` or the conversation exceeded the max context length.
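Putting that together, JSON mode is enabled per request like this (a sketch using the same `client` and `message_list` as above; note the system message already instructs the model to produce JSON, which JSON mode requires):

```python
response = client.chat.completions.create(
    model="gpt-3.5-turbo",  # must be newer than gpt-3.5-turbo-1106
    messages=message_list,
    response_format={"type": "json_object"},  # enable JSON mode
    top_p=0.5,
    max_tokens=1500,
)

if response.choices[0].finish_reason == "length":
    # Output hit max_tokens; the JSON is likely truncated and unparseable.
    print("Warning: response was cut off; do not trust this JSON.")
```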
Remember the restrictions of these post-DevDay models: they still have a 4k-token output limit, and if you set `max_tokens`, it can only be at or below that cap.
The AI will also tend to wrap up its answer early on its own, so you may need to include something like "your ability to output JSON is extended to 4000 words in length" in the prompt to counter that built-in behavior.
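For example, you could append such a line to the system message; the exact wording here is only an illustration, and how well it works varies by model:

```python
# Counter the model's tendency to end JSON answers early (wording is illustrative).
message_list[0]["content"] += (
    "\nYour ability to output JSON is extended to 4000 words in length; "
    "do not end or summarize the response early."
)
```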