Does the LLM have access to my Pydantic model when using Structured Output?

I understand that adherence to the supplied Pydantic BaseModel is achieved through constrained decoding, but I would like to know whether the prompt is also augmented in some way with the supplied schema. Can I say things like “use the supplied schema” in the prompt? I have seen that a description can be added to Pydantic model fields to help the LLM. How exactly is this fed to the LLM? I’m asking even though I’m aware this may be something OpenAI hasn’t revealed at all.


Yes. Like the tools section, the response format schema is placed into the system message.

In case you have trouble reading a schema that is itself delivered inside schema-constrained output, here is how it appears in the system message:

# Response Formats

## response_format

{"type":"object","properties":{"verbatim_response_requested":{"type":"string","description":"The verbatim response that was requested."},"was_output_complete":{"type":"boolean","description":"Indicates whether the output was complete."},"was_output_accurate":{"type":"boolean","description":"Indicates whether the output was accurate."}}}

You are trained on data up to October 2023.
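That injected schema maps directly onto a Pydantic model. As a sketch, a model along these lines would produce it (the `ResponseAudit` class name is made up; the field names and descriptions are taken from the schema above):

```python
import json

import pydantic


class ResponseAudit(pydantic.BaseModel):
    """Hypothetical model reconstructing the schema shown above."""

    verbatim_response_requested: str = pydantic.Field(
        description="The verbatim response that was requested."
    )
    was_output_complete: bool = pydantic.Field(
        description="Indicates whether the output was complete."
    )
    was_output_accurate: bool = pydantic.Field(
        description="Indicates whether the output was accurate."
    )


# The field descriptions end up in the JSON schema the model sees.
schema = ResponseAudit.model_json_schema()
print(json.dumps(schema["properties"], indent=2))
```

The `description` arguments are the only channel by which per-field guidance reaches the model, which is why they are worth writing carefully.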

You can say “as your response, only JSON following response_format is allowed” for those cases where strict mode is not enabled and the model would otherwise dump endless garbage newlines or tabs.
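For the strict case, what gets sent is a `json_schema` response format built from your model. A minimal sketch of that payload, assuming a made-up `Answer` model (the Python SDK can also build this for you when you pass the model to `client.beta.chat.completions.parse(..., response_format=Answer)`):

```python
import pydantic


class Answer(pydantic.BaseModel):
    """Hypothetical model for illustration."""

    summary: str = pydantic.Field(description="One-sentence summary.")
    confidence: float


# Roughly what the request body's response_format looks like in strict mode;
# strict schemas also require additionalProperties to be false.
response_format = {
    "type": "json_schema",
    "json_schema": {
        "name": "Answer",
        "strict": True,
        "schema": {
            **Answer.model_json_schema(),
            "additionalProperties": False,
        },
    },
}
```

With `strict: True` the output is constrained at decode time, so the “only JSON is allowed” nudge in the prompt becomes unnecessary.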


@_j has given a good technical answer above, and I would like to expand on it from the perspective of prompt engineering and the implications of including instructions within the schema. As you can see, the schemas are included in the LLM’s instructions.

FYI, the system prompt gets formatted differently from the example above when using tool calling instead of response_format.

Typically, you would not refer to the schema as “the supplied schema”. In the system prompt you would refer to schemas explicitly by name. The docstring of your Pydantic model becomes the tool’s description in the JSON schema, and that is where you define the tool’s intent and use, written as a directive statement. To demonstrate the significance of including schemas in your prompt payload, consider the following example:

import openai
import pydantic
import tooldantic
from IPython.display import Markdown, display

client = openai.OpenAI()


class CognitiveArchitecture(tooldantic.OpenAiBaseModel):
    """Use this tool to guide your cognition."""
    chain_of_thoughts: str = pydantic.Field(description="Reflect on the query and discuss your thoughts.")
    first_draft: str
    reflect_on_first_draft: str
    recommendations_to_improve: str
    final_draft: str


system_message = """\
Always process all messages as if you were using your `CognitiveArchitecture` tool, \
but instead of outputting the results in JSON you will output in natural language.
Never call the tool directly, instead use it as a guide for your outputs in the chat. \
Use the tool parameters as labels. For example if the tool has a `chain_of_thoughts` param, \
your output might look like this:
'''
Thoughts: <your thoughts here>
'''

Your outputs should be long and verbose.
"""

r = client.chat.completions.create(
    model='gpt-4o-2024-05-13',
    messages=[
        {'role': 'system', 'content': system_message},
        {'role': 'user', 'content': 'write an essay about the history of the internet.'},
    ],
    # tooldantic's OpenAiBaseModel emits the schema already wrapped in OpenAI's tool format
    tools=[CognitiveArchitecture.model_json_schema()],
    tool_choice="none",  # the model sees the schema but never calls the tool
)

content = r.choices[0].message.content
display(Markdown(content))

In this instance, a tool is incorporated strictly as a schema-driven cognitive architecture. Upon execution, the LLM adheres to the schema as though it were crafting a structured output, yet it conveys the content through chat. This enables the structuring of system instructions in novel ways, fostering more predictable outcomes. Although this represents a more radical approach to LLM prompt engineering, it exemplifies the LLM’s capacity to assess and comply with instructions presented in various forms, such as system and tool descriptions, parameter names, and their descriptions.
