Structured Outputs & Functions: a Schema-Writer Playground AI Preset to Make Them

I loaded up an API playground preset with specially prepared documentation I’ve been working on, giving the AI skills in:

  • writing JSON schemas for function specifications (as tools),
  • writing JSON schemas for structured outputs,
  • using the openai Python library (>1.0) with “client” methods,
  • using Pydantic classes to specify and send response_format.

https://platform.openai.com/playground/chat?preset=preset-lVP3yGuaZ7VGJLWCVkyEt9PL

Run it on your choice of AI; you will get a high-quality, expensive output on GPT-4, as the input is 5500+ tokens.

Note: The preset comes set to OUTPUT JSON, so you can see a schema in action, and so I can pass you another example in the playground; it makes for a clever presentation from an AI that writes code and schemas. However, turn that response_format off for productive use, and off on any model other than gpt-4o-2024-08-06.


The preset has this one focus (although it is still better at general-purpose API coding than ChatGPT). Sorry, retraining an AI on everything from parsing streams to asyncio workers to writing Assistants helper code is near impossible.

This AI can also tell you what it is:

Who Am I?

I am a specialized AI assistant, trained to guide you through the intricacies of the latest OpenAI API features. My expertise lies in helping developers effectively utilize Function Calling and Structured Outputs, ensuring seamless integration and interaction with AI models.

What Makes Me Unique?

Expertise in Function Calling

  • Function Integration: I can assist you in defining and integrating functions using the tools parameter, enabling AI models to trigger specific actions based on conversation context.

  • Dynamic Tool Selection: I provide guidance on using the tool_choice parameter, allowing models to autonomously select the most appropriate tool for a given task.

  • Real-World Applications: My training enables me to help you automate tasks, retrieve data, and perform computations through AI-driven function calls.

Mastery of Structured Outputs

  • Schema Adherence: I ensure that AI responses adhere to your specified JSON Schema, providing structured and reliable outputs.

  • Pydantic Support: I can guide you in using Pydantic for defining object schemas, facilitating structured data interactions. (Zod is not trained.)

  • Strict Mode Implementation: I help you enforce strict adherence to schema requirements, ensuring complete and accurate AI-generated responses.

Why Am I Useful?

  • Enhanced Reliability: My specialized training ensures that AI interactions are consistent, reliable, and tailored to your specific needs.

  • Developer Support: I provide clear, concise guidance on implementing the latest API features, reducing development time and effort.

Conclusion

As a newly instructed AI assistant, my focus on Function Calling and Structured Outputs empowers you to create more interactive, reliable, and structured AI applications. Let’s embark on this journey together and explore the full potential of AI in your projects!
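
To make the function-calling bullets above concrete, here is a minimal sketch of the tools and tool_choice parameters in use (the function name and fields are invented for illustration):

from openai import OpenAI

client = OpenAI()

# Hypothetical tool definition; "strict": True enables constrained generation
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up current weather for a city.",
        "strict": True,
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
            "additionalProperties": False,
        },
    },
}]

response = client.chat.completions.create(
    model="gpt-4o-2024-08-06",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,
    tool_choice="auto",  # the model decides whether to call the tool
)
print(response.choices[0].message.tool_calls)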


A playground bug! A preset’s JSON schema gets re-ordered. Here is the intended chain-of-thought sequence the response should demonstrate:
{
  "name": "response_with_code_examples",
  "strict": false,
  "schema": {
    "type": "object",
    "required": [
      "plain_text_response"
    ],
    "properties": {
      "key_concepts": {
        "type": "array",
        "items": {
          "type": "string",
          "description": "Topics that will be discussed or instructed in the AI response."
        }
      },
      "plain_text_response": {
        "type": "string",
        "description": "A response to the user from the AI, providing a typical verbose response fulfilling the input."
      },
      "schemas": {
        "type": "array",
        "items": {
          "type": "object",
          "required": [
            "schema_text",
            "destination_type",
            "schema_format"
          ],
          "properties": {
            "schema_text": {
              "type": "string",
              "description": "Any schemas requested alone outside of code generation, each in the form desired."
            },
            "schema_format": {
              "enum": [
                "json",
                "python",
                "pydantic",
                "yaml",
                "xml"
              ],
              "type": "string",
              "description": "The format of the schema, indicating how the schema is represented."
            },
            "destination_type": {
              "enum": [
                "tool_function",
                "response_format"
              ],
              "type": "string",
              "description": "Specifies whether the schema is intended for a tool function or a response format."
            }
          },
          "additionalProperties": false
        }
      },
      "code_examples": {
        "type": "array",
        "items": {
          "type": "string",
          "description": "A complete executable code example string. Provide all the code examples previously discussed or requested."
        }
      }
    },
    "additionalProperties": false
  }
}
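
For reference, a response_format schema like the one above is sent like this (a minimal sketch; the user message is a placeholder, and the schema body is elided rather than repeated):

from openai import OpenAI

client = OpenAI()

schema = {...}  # paste the full "response_with_code_examples" object shown above

response = client.chat.completions.create(
    model="gpt-4o-2024-08-06",
    messages=[{"role": "user", "content": "Write me a schema for a to-do list."}],
    response_format={"type": "json_schema", "json_schema": schema},
)
print(response.choices[0].message.content)  # a JSON string matching the schema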

Conceptually this is awesome!

I was hitting a roadblock with generating a Pydantic schema that I could use for structured outputs, using Dicts.

I did ask your tool for an edge case. It did generate the code, but the code really didn’t work (it hit the same roadblock that I did).

It essentially generated the code below:

from openai import OpenAI
from pydantic import BaseModel
from typing import Dict

# Define the schema using Pydantic
class ResponseSchema(BaseModel):
    ai_response: Dict[str, int]

client = OpenAI()
messages = [{"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Hello, AI! Can you give me a response with some numbers?"}]

completion = client.beta.chat.completions.parse(
    model="gpt-4o-2024-08-06",
    messages=messages,
    response_format=ResponseSchema,
)
structured_response = completion.choices[0].message.parsed
print(structured_response)

The output was:

openai.BadRequestError: Error code: 400 - {'error': {'message': "Invalid schema for response_format 'ResponseSchema': In context=(), 'required' is required to be supplied and to be an array including every key in properties. Extra required key 'ai_response' supplied.", 'type': 'invalid_request_error', 'param': 'response_format', 'code': None}}

The answer in this case is to switch to GPT-4, where these instructions were refined for performance, and re-ask. The preset is set to the newest gpt-4o-2024-08-06 simply because it will also run the schema that is included.

Simply pasting your message in gives a working result:


The error message indicates that the JSON schema defined in the Pydantic model ResponseSchema is missing the required field. In the JSON schema, the required field is an array that lists all properties that must be present in the response.

In your case, the ai_response property is required, so it should be included in the required field. However, Pydantic does not directly support the required field in its models. Instead, all fields in a Pydantic model are required by default unless they are marked as optional using the Optional keyword from the typing module.

If you want to use a Pydantic model with the OpenAI API and need to specify required fields, you can use the Field class from Pydantic to add metadata to the model fields. Here’s how you can modify your code:

from openai import OpenAI
from pydantic import BaseModel, Field
from typing import Dict, Optional

# Define the schema using Pydantic
class ResponseSchema(BaseModel):
    ai_response: Optional[Dict[str, int]] = Field(..., description="AI response with some numbers")

client = OpenAI()
messages = [{"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Hello, AI! Can you give me a response with some numbers?"}]

completion = client.beta.chat.completions.parse(
    model="gpt-4o-2024-08-06",
    messages=messages,
    response_format=ResponseSchema
)

structured_response = completion.choices[0].message.parsed
print(structured_response)

In this code, the Field class is used to add a description to the ai_response field and to mark it as required by passing ... as the default value. If the ai_response field is not present in the response, Pydantic will raise a validation error.


Running locally:

ai_response={'number_list_1_to_5_with_squared_even_numbers': 1, 'number_list_1_to_5_with_squared_odd_numbers': 1}

Conclusion:
gpt-4o is inattentive to long input (and o1, which is powered by it, is worse, trashing working API code).

OpenAI pins pydantic<3,>=1.9.0, which should give you 2.5, a particular flavor. That’s not the problem here, though; the model is just not paying attention to reinforced instructions: that “strict” mandates “required” for all parameters. Perhaps I can make it clearer that strict is a starting point, and that turning it off carries extra considerations.
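
For anyone following along, the rule the model ignored is simple to state. A minimal sketch of a strict-compliant schema (names invented for illustration):

# Under strict mode, every key in "properties" must also appear in
# "required", and "additionalProperties" must be False at every level.
strict_schema = {
    "name": "example_response",  # hypothetical schema name
    "strict": True,
    "schema": {
        "type": "object",
        "properties": {
            "answer": {"type": "string"},
            "confidence": {"type": "number"},
        },
        "required": ["answer", "confidence"],  # all keys, no omissions
        "additionalProperties": False,
    },
}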

To improve focus, the input system instructions are made of two mostly independent sections; one can be removed if you don’t want the info about functions to be a distraction.

GPT-4 can also verify the instructions and see where an oversight may arise, when told that another model following them produced the bad code:

The instructions provided earlier are correct and should work as expected. However, there’s a key point to note: the strict field in the JSON schema is not directly supported in Pydantic. In Pydantic, all fields are required by default unless marked as optional using the Optional keyword from the typing module.

In the examples provided, the strict field is set to true or false in the JSON schema. This is not directly translatable to Pydantic. Instead, you would use the Optional keyword to make a field optional in Pydantic.

Here’s an example from the instructions:

class MathReasoning(BaseModel):
    steps: List[Step]
    final_answer: str

In this Pydantic model, both steps and final_answer are required fields. If you wanted to make final_answer optional, you would modify the model like this:

class MathReasoning(BaseModel):
    steps: List[Step]
    final_answer: Optional[str]

Now, final_answer is an optional field and can be omitted in the response without causing a validation error.

So, while the instructions are correct, they may not fully explain how to translate the strict field from the JSON schema to a Pydantic model. The key is to use the Optional keyword for optional fields in Pydantic.
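
A quick way to check what any given Pydantic model actually converts to is model_json_schema(), standard in Pydantic v2. Note one subtlety the explanation above glosses over: with no default supplied, Optional[str] is still a required field; it only adds null to the allowed types (a minimal sketch):

from typing import List, Optional
from pydantic import BaseModel

class Step(BaseModel):
    explanation: str
    output: str

class MathReasoning(BaseModel):
    steps: List[Step]
    final_answer: Optional[str]  # no default, so still listed in "required"

print(MathReasoning.model_json_schema())
# "final_answer" becomes {"anyOf": [{"type": "string"}, {"type": "null"}]}
# yet remains in the "required" array, because no default was supplied.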


Awesome! I can totally see how this will help devs!


Hi @icdev2dev,

I understand the difficulties you’re encountering with generating Pydantic schemas for handling arbitrary key-value pairs in OpenAI’s Structured Outputs. Handling such dynamic data structures can be tricky, especially when strict schema adherence is required. I’d like to share an alternative solution that involves using a list of key-value pairs, which we then convert back into a dictionary during validation. This method provides a clear schema for the model to follow and allows for effective validation and error handling.

Alternative Solution: Using a List of Key-Value Pairs

Instead of using a generic Dict[str, Any] annotation—which leads to an ambiguous schema (type: object)—we can define a list of key-value pair objects. This approach gives us a precise schema that OpenAI’s models can reliably adhere to.

Here’s how you can implement it using tooldantic:

import tooldantic as td
import pydantic
from typing import List, Dict

class OpenAiStrictModel(td.ToolBaseModel):
    # Generates a strict schema for OpenAI tool calling
    _schema_generator = td.OpenAiStrictSchemaGenerator

class KeyValuePair(OpenAiStrictModel):
    key: str
    value: str

class StructuredStuff(OpenAiStrictModel):
    stuff: list[KeyValuePair] = pydantic.Field(
        description=(
            "A List of key-value pairs that will be converted to a dictionary. "
            "Each key must be unique. If a key is repeated in the target data, sum the items for that key."
        )
    )

    @pydantic.field_validator("stuff")
    @classmethod
    def validate_stuff(cls, nums: List[KeyValuePair]) -> Dict[str, int]:
        unique_kvps = {}
        for kvp in nums:
            if kvp.key in unique_kvps:
                raise ValueError(
                    f"Duplicate key found: {kvp.key}. All keys must be unique."
                )
            unique_kvps[kvp.key] = int(kvp.value)
        return unique_kvps

Explanation:

  • KeyValuePair Class: Defines each key-value pair with explicit key and value fields, providing a clear structure for the model.

  • StructuredStuff Class:

    • stuff Field: Annotated as a list of KeyValuePair objects with a detailed description. This explicit schema helps the model understand the expected input format.
    • Validator validate_stuff: Ensures that all keys are unique and converts the list of key-value pairs into a dictionary after validation. If a duplicate key is found, it raises a ValueError.

Setting Up the Scenario

structured_dict.py (github.com)
In our scenario, we’re simulating a case where the language model (LLM) generates invalid data—specifically, duplicate keys in the list of key-value pairs. We want to detect this issue, provide feedback to the LLM, and have it correct its output.

Here’s how we simulate the invalid LLM response and validate it:

# Simulated tool call with invalid input
user_message = {"role": "user", "content": "I have 3 apples, 2 bananas, and 4 apples"}

try:
    StructuredStuff(
        stuff=[
            {"key": "apple", "value": "3"},
            {"key": "banana", "value": "2"},
            {"key": "apple", "value": "4"},  # Duplicate key 'apple'
        ]
    )
except pydantic.ValidationError as e:
    feedback_message = td.validation_error_to_llm_feedback(e)

print(f"feedback: {feedback_message}")

Output:

feedback: {"success": false, "SYSTEM": "Pay (close) attention to the following pydantic validation errors and use them to correct your tool inputs and call the tool again.", "errors": [{"type": "value_error", "loc": "('stuff',)", "msg": "Value error, Duplicate key found: apple. All keys must be unique.", "input": [...], "ctx": {"error": "Duplicate key found: apple. All keys must be unique."}}]}

What’s Happening:

  • Validation Error: The validate_stuff method detects the duplicate key "apple" and raises a ValueError.

  • Feedback Message: We capture the exception and convert it into a feedback message using td.validation_error_to_llm_feedback(e). This message is designed to be sent back to the LLM to inform it of the validation error.

Why Tool Calling is Preferred Over response_format in This Situation

In this scenario, we need a way to provide feedback to the LLM when it generates invalid data so that it can correct its output. This capability is essential for maintaining data integrity and ensuring that the model’s outputs adhere strictly to the expected schema.

Advantages of Tool Calling:

  • Feedback Loops: Tool calling allows us to send validation errors back to the LLM, enabling it to adjust its output accordingly. This self-healing mechanism is crucial for scenarios where strict schema compliance is necessary.

  • Iterative Corrections: The LLM can iteratively refine its output based on the feedback until it produces valid data.

  • Enhanced Control: Tool calling provides greater control over the interaction between your application and the LLM, allowing for sophisticated error handling and validation strategies.

In contrast, using response_format lacks this interactive capability. Once the LLM produces an output, there’s no built-in mechanism to provide feedback and request corrections. This limitation makes it less suitable for scenarios where dynamic validation and correction are required.
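
To make the loop concrete, here is a minimal sketch of that feedback mechanism, assuming StructuredStuff and the tooldantic import (td) from the code above, and that tools already holds its strict tool definition; the retry count is arbitrary:

import json

import pydantic
from openai import OpenAI

client = OpenAI()
messages = [{"role": "user", "content": "I have 3 apples, 2 bananas, and 4 apples"}]

for attempt in range(3):
    response = client.chat.completions.create(
        model="gpt-4o-2024-08-06", messages=messages, tools=tools
    )
    call = response.choices[0].message.tool_calls[0]
    try:
        result = StructuredStuff(**json.loads(call.function.arguments))
        break  # validation passed; use `result`
    except pydantic.ValidationError as e:
        # Append the failed call plus a tool message carrying the error,
        # so the model sees exactly what to fix on the next attempt
        messages.append(response.choices[0].message)
        messages.append({
            "role": "tool",
            "tool_call_id": call.id,
            "content": td.validation_error_to_llm_feedback(e),
        })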

Challenges with @_j’s Solution

First, I’d like to acknowledge that @_j provided a valuable contribution, and that approach offers a starting point for handling key-value pairs. However, there are some challenges that might affect its reliability in strict production environments.

The Generated JSON Schema:

{
  "properties": {
    "ai_response": {
      "anyOf": [
        {
          "additionalProperties": {"type": "integer"},
          "type": "object"
        },
        {"type": "null"}
      ],
      "description": "AI response with some numbers",
      "title": "Ai Response"
    }
  },
  "required": ["ai_response"],
  "title": "ResponseSchema",
  "type": "object"
}

Challenges:

  1. Ambiguity with anyOf and null:

    • Issue: The schema allows ai_response to be either an object or null, even though it’s marked as required. This contradiction can lead to ambiguity and potential validation errors when using OpenAI’s strict mode.
  2. Use of additionalProperties Without Constraints:

    • Issue: Allowing arbitrary key-value pairs with integer values (additionalProperties: {"type": "integer"}) lacks specificity. The model may generate unexpected keys, leading to inconsistent outputs.
  3. Lack of Defined Structure:

    • Issue: The schema doesn’t enforce specific keys or a clear structure for ai_response. This can result in unreliable outputs, especially when strict adherence to a schema is necessary.

By defining a list of key-value pairs and using validators, we address these challenges by providing a precise schema and ensuring that the data conforms to expected patterns.

TL;DR

Using a list of key-value pairs with validation offers a robust solution for handling arbitrary key-value pairs in OpenAI’s Structured Outputs. It provides a clear schema for the model to follow and allows for effective error detection and feedback through tool calling. This approach enhances reliability and is better suited for production environments where strict data integrity is crucial.

My bot detector went off. This reply doesn’t even know whom it’s addressing, or the composer or intent of anything it read…

The allegation is also false; the AI simply won’t be trained into producing duplicates by some different sending method. A non-strict schema is still understood.

It takes incredibly hard work to get duplicate keys out of the model, even when prompting for exactly that, with justification.

Functions would likely return a 500 error, instead of a response you can validate and retry, if you were able to break their output. That is not an improvement.

(The playground is also “broken” in this regard: it only shows the most recent key produced, without a validation error, as far as I can tell. The playground also produces Python code with a non-Python schema placed in it as a response_format when you “get code”.)

Yes, I used GPT to structure my ideas and craft the message because I didn’t have all day to write it. That’s what we are all here for, using AI to enhance our productivity, right?

There was no “allegation” regarding duplicate keys. It is a fact that when you specify to the model that its output should be a list of key-value pairs, it can and will duplicate keys, because it is not outputting a dictionary. I’m not sure what your recent example is attempting to demonstrate, as it deviates significantly from the task at hand, which is how to get the model to output dict[str, Any] (in a strict way), which it obviously cannot. As stated in the previous message, using Optional[dict[str, Any]] is not a viable replacement because it doesn’t convert to a stable JSON schema for the LLM. I included a full working gist which you can run in a notebook, and I would suggest leaving the playground for the toy stuff and rapid prototyping, and using notebooks for the heavier stuff, like this.

EDIT: This is based on my understanding that you are attempting to use the playground to try my method. If that is not the case, then I apologize, because I found it difficult to understand what you were attempting to say or prove in your last message.

There are two factors in structured outputs:

  • New schema placement for response where the AI has been trained on it.
  • Structured artifact imposed on logit production out of the model.

With functions you still have:

  • Same schema placement as before in tool section of context
  • Structured artifact imposed when “strict” is in the spec.

Either schema can be extracted back out of the AI with fancy words, or you can just look at what’s being sent over the wire by tapping the HTTP transport underneath httpx. You’d see that no matter the code implementation, you’re basically getting the same thing sent to the AI, which either activates an enforcement grammar artifact or not.
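
If you want to do that inspection yourself, here is one way (a sketch using httpx event hooks, which the openai client accepts via its http_client argument):

import httpx
from openai import OpenAI

def log_request(request: httpx.Request) -> None:
    # Print the raw JSON body being sent to the API endpoint
    print(request.method, request.url)
    print(request.content.decode())

client = OpenAI(http_client=httpx.Client(event_hooks={"request": [log_request]}))
# Any call made through `client` now prints its wire payload first.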

The bot preset isn’t trained on anything more advanced than interpretation of OpenAI’s own examples. You can have the AI produce your schema as JSON, and if you’re trying to use a complex anyOf (for applications I don’t have the imagination for), you can instruct that also and see the results, without using Pydantic to obfuscate what’s being sent.

This is not correct. According to the docs, the use of strict=True in the schema when passed to tool calling invokes the same model mechanics for the constrained generation of structured outputs. The difference is that the model expects a role: tool message in response to the structured output when using tool calling.

In other words, you have two options to invoke the new “structured outputs”:

  1. tool calling where the schema is set to strict
  2. response_format where the schema is also set to strict in the same way

The distinction between these two methods is that tool calling expects a tool response while response_format does not. Since tool calling expects a response, you can use it as a feedback mechanism (if necessary) to inform the model of its mistakes, provide errors as cues it can use to fix them, and prompt the model to retry the structured output.

I can assure you that this does work as I am explaining it, and it is also a technique demonstrated in the python instructor package.

The AI is outputting JSON, which also has a prohibition on duplicate keys; they will fail JSON validation if you don’t want to use any frameworks, and you can take whatever action.
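
One nuance worth noting: Python’s own json.loads silently keeps the last duplicate by default, but its object_pairs_hook parameter lets you reject duplicates with no framework at all (a minimal sketch):

import json

def reject_duplicate_keys(pairs):
    # json.loads calls this with the raw (key, value) pairs of each object
    result = {}
    for key, value in pairs:
        if key in result:
            raise ValueError(f"Duplicate key in model output: {key}")
        result[key] = value
    return result

raw = '{"apple": 3, "banana": 2, "apple": 4}'
data = json.loads(raw, object_pairs_hook=reject_duplicate_keys)  # raises ValueError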

Your preference for sending the bot back “you suck at writing functions, try again” isn’t much different from just a retry of a normal output, where you could do something like appending “[FAILED, try again]” to the last assistant message.

A response, instead of a function, isn’t burdened by the post-training assumption that a function interacts with a utility.

Anyway, I’m not sure what this has to do with a topic about a preset whose goal is “type language, get out a schema, with the bot matching up the curly braces for you”. You can write more docs on top of what I supplied if you want, and hope any AI understands them.

Great conversation. @_j @nicholishen

  • I did try tooldantic last night before reaching for the schema-writer playground.

pip install -U git+https://github.com/nicholishen/tooldantic.git

I ran into an issue at tooldantic/builder.py:

message = f"Parameter `{name}` in function `{
                                             ^
SyntaxError: unterminated string literal (detected at line 97)

I admit I was too tired last night to debug or file a git issue.

  • I think that collectively we are building some awesome stuff on top of what OpenAI already has.

I think that deepening that knowledge (function calling versus response schema) will be greatly beneficial to all.

  • Lastly thanks for all the tips in this discussion.

I see where the confusion is coming from: a misunderstanding of what we are trying to accomplish. Let’s backtrack to ensure we’re on the same page.

  1. icdev2dev mentioned an issue with the annotation Dict[str, int] and how OpenAI will not accept the resulting schema.
  2. You suggested using Optional[Dict[str, int]] instead.
  3. I offered an alternative solution, list[KeyValuePair], because Optional[Dict[str, int]] could cause issues with the LLM. For example, ai_response=None is a valid output in that case, which can lead to undesired results.

When I mentioned that the LLM can output multiple keys, I was not referring to the JSON object as literal duplicate keys. In the context of my solution, I clearly outlined and demonstrated an example of a model output that we need to validate against. Let’s review it:

{
    "role": "assistant",
    "tool_calls": [
        {
            "type": "function",
            "function": {
                "name": "StructuredStuff",
                "arguments": json.dumps(
                    {
                        "stuff": [
                            {"key": "apple", "value": "3"},
                            {"key": "banana", "value": "2"},
                            {"key": "apple", "value": "4"}  # duplicate keys!!!
                        ]
                    }
                )
            },
            "id": "1"
        }
    ]
}

In this example, the model has used two items with the same key name (“apple”) in the list of key-value pairs. Therefore, we need to validate and give feedback if this throws a pydantic.ValidationError from our custom validator.

Now, regarding your point about “sending the bot back ‘you suck at writing functions, try again,’” I believe this is where the misunderstanding lies. The tool feedback mechanism is far more targeted than simply appending [FAILED, try again] to the assistant’s output.

Tool calling allows for precise, context-specific feedback based on the specific error detected. This differs from a blind retry, where the model would be asked to regenerate the output without knowing what went wrong in the first place. Instead of just retrying randomly, the feedback mechanism via tool calling informs the model exactly why its output was invalid—e.g., “duplicate key found”—allowing it to make a more informed correction on the next attempt. This leads to a more reliable and robust output, as the model adjusts based on the specific error it made.

This level of targeted feedback and self-healing is a key advantage of using tool calling, as it enables the model to improve iteratively rather than just retrying blindly. In my experience, this approach has proven much more effective for maintaining schema compliance and ensuring consistent outputs.

Damn, but does gpt-4o-2024-08-06 ever stink as a code-writing, instruction-following, or context-using model (or insert other tasks here).

Schema enforcement operating at the algorithm level only helps it avoid exposing its failings in that department, too.


“Add a test schema” to a single Python console line should not produce mistakes over and over, when just switching the model to gpt-3.5-turbo (which was just shut off for everybody) doesn’t make dumb mistakes in following these same instructions.