Canonical way of turning pydantic schemas into function/tools schemas?

The JSON schema that Pydantic produces (JSON Schema - Pydantic) is very close to what appears in the functions and tools examples here, but not exactly the same. For example, the Pydantic schemas contain additional title fields. It is also useful to add descriptions taken from docstrings.
Is there any canonical way of turning pydantic schemas into functions/tools definitions?

Maybe I’m missing something, but projects handle this differently:

- some just remove all title fields from the schema and treat title as a reserved word (sketch below): simpleaichat/simpleaichat/chatgpt.py at ea46c1f28e8dcdeda5935fcbab3d2ed07dc870de · minimaxir/simpleaichat · GitHub
- others don’t change the schema in any way: taifun/taifun/taifun.py at main · mxab/taifun · GitHub
- yet others add descriptions but don’t remove title fields: instructor/instructor/function_calls.py at main · jxnl/instructor · GitHub
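
For concreteness, here is a minimal sketch of the first (title-stripping) approach. It assumes Pydantic v2’s model_json_schema(); to_tool and _strip_titles are just illustrative names, not anything canonical:

import inspect
from typing import Any, Type
from pydantic import BaseModel


def _strip_titles(schema: Any) -> Any:
    """Recursively drop the auto-generated 'title' keys.

    Caveat: this also drops any real field named 'title', which is
    presumably why simpleaichat treats 'title' as a reserved word.
    """
    if isinstance(schema, dict):
        return {k: _strip_titles(v) for k, v in schema.items() if k != "title"}
    if isinstance(schema, list):
        return [_strip_titles(v) for v in schema]
    return schema


def to_tool(model: Type[BaseModel]) -> dict:
    """Build one entry for the tools parameter from a Pydantic model."""
    return {
        "type": "function",
        "function": {
            "name": model.__name__,
            "description": inspect.getdoc(model) or "",
            "parameters": _strip_titles(model.model_json_schema()),
        },
    }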


Your question is probably part of a larger application goal.

If you are talking about a schema for a function that can be validated, then no, it can’t handle checking or creating something like:

"string_w_enum": {"type": "string", "enum": ["Happy", "Sad"]},

I don’t know where you’d get a schema “for free” to then want to process it. An object that came from another function or API?
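
For what it’s worth, a Literal-typed Pydantic field does render to roughly that shape (plus a title key). A sketch, assuming Pydantic v2:

from typing import Literal
from pydantic import BaseModel


class Example(BaseModel):
    string_w_enum: Literal["Happy", "Sad"]


print(Example.model_json_schema()["properties"]["string_w_enum"])
# roughly: {'enum': ['Happy', 'Sad'], 'title': 'String W Enum', 'type': 'string'}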

I just happen to have an environment with Pydantic models at hand, so I can show what they produce…


print(json.dumps(file_obj.model_json_schema(), indent=2))
                    
{
  "additionalProperties": true,
  "properties": {
    "id": {
      "title": "Id",
      "type": "string"
    },
    "bytes": {
      "title": "Bytes",
      "type": "integer"
    },
    "created_at": {
      "title": "Created At",
      "type": "integer"
    },
    "filename": {
      "title": "Filename",
      "type": "string"
    },
    "object": {
      "const": "file",
      "title": "Object"
    },
    "purpose": {
      "enum": [
        "fine-tune",
        "fine-tune-results",
        "assistants",
        "assistants_output"
      ],
      "title": "Purpose",
      "type": "string"
    },
    "status": {
      "enum": [
        "uploaded",
        "processed",
        "error"
      ],
      "title": "Status",
      "type": "string"
    },
    "status_details": {
      "anyOf": [
        {
          "type": "string"
        },
        {
          "type": "null"
        }
      ],
      "default": null,
      "title": "Status Details"
    }
  },
  "required": [
    "id",
    "bytes",
    "created_at",
    "filename",
    "object",
    "purpose",
    "status"
  ],
  "title": "FileObject",
  "type": "object"
}

I don’t quite understand what you are asking me about. I don’t validate the schemas; I want something that will be accepted by the completions call. The plan is to have functions that consume a Pydantic object, then generate the Pydantic object’s schema and put it into the functions/tools parameter of the completions call.
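
To make that plan concrete, a sketch (the model name is illustrative, and to_tool is the helper sketched earlier in the thread, not an official API):

from openai import OpenAI
from pydantic import BaseModel


class GetWeather(BaseModel):
    """Get the current weather for a city."""
    city: str
    unit: str = "celsius"


client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4-1106-preview",  # illustrative
    messages=[{"role": "user", "content": "Weather in Paris?"}],
    tools=[to_tool(GetWeather)],
)

# the function then consumes the Pydantic object parsed back from the tool call
call = response.choices[0].message.tool_calls[0]  # assuming the model called the tool
args = GetWeather.model_validate_json(call.function.arguments)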

I have written a library that generates this tools definition from type annotations, but nobody uses type annotations, so Pydantic is probably a better choice and I want to add support for it. My code also does not handle deeper, nested structures. Here is my current code: GitHub - zby/ToolDefGenerator: Generates a structure suitable for the tools argument in a client.chat.completions.create call

Have a look at this code in Langroid: we allow specifying a tool/function as a Pydantic class along with associated helper methods, and convert it into the appropriate format to send to the OpenAI API as a function specification. We also do the reverse: match the LLM’s JSON output to a Pydantic class and trigger the corresponding handler.

(Langroid is an Agent-oriented LLM programming Python framework from ex-CMU/UW-Madison researchers).

You can see an example of a Pydantic-based function/tool in action in this Colab quick start, which builds up to a 2-agent system where one agent gathers structured info from a document by sending questions to another RAG-enabled agent.

Thanks!

That was interesting; unfortunately it does not cover the recursive cases.

Here is an example. I’ve added the plain Pydantic classes (with a 1 suffix); you can test that they do produce the needed schemas.

import json
from langroid.agent.tool_message import ToolMessage
from typing import List, Optional
from pydantic import BaseModel
from pprint import pprint



class Foo(ToolMessage):
    request = ''
    purpose = ''
    count: int
    size: Optional[float] = None


class Bar(ToolMessage):
    request = ''
    purpose = ''
    apple: str = 'x'
    banana: str = 'y'


class Spam(ToolMessage):
    request = ''
    purpose = ''
    foo: Foo
    bars: List[Bar]

class Foo1(BaseModel):
    count: int
    size: Optional[float] = None


class Bar1(BaseModel):
    apple: str = 'x'
    banana: str = 'y'


class Spam1(BaseModel):
    foo: Foo1
    bars: List[Bar1]


llmschema = Spam.llm_function_schema(request=True)
# inspect the definition generated for the nested Foo class
definitions = llmschema.parameters.get('definitions').get('Foo')
pprint(definitions)
# dump each top-level key of the Foo definition as JSON
for i in definitions:
    print(i)
    print(definitions.get(i))
    print(json.dumps(definitions.get(i), indent=4))
    print('---------------------')

The output of that is:

/home/zby/gpt/langroid/venv/bin/python /home/zby/gpt/langroid/schema_example.py 
{'description': 'Abstract Class for a class that defines the structure of a '
                '"Tool" message from an\n'
                'LLM. Depending on context, "tools" are also referred to as '
                '"plugins",\n'
                'or "function calls" (in the context of OpenAI LLMs).\n'
                'Essentially, they are a way for the LLM to express its intent '
                'to run a special\n'
                'function or method. Currently these "tools" are handled by '
                'methods of the\n'
                'agent.\n'
                '\n'
                'Attributes:\n'
                '    request (str): name of agent method to map to.\n'
                '    purpose (str): purpose of agent method, expressed in '
                'general terms.\n'
                '        (This is used when auto-generating the tool '
                'instruction to the LLM)\n'
                '    result (str): example of result of agent method.',
 'exclude': {'result', 'purpose'},
 'properties': {'count': {'type': 'integer'},
                'purpose': {'default': '', 'type': 'string'},
                'request': {'default': '', 'type': 'string'},
                'result': {'default': '', 'type': 'string'},
                'size': {'type': 'number'}},
 'required': ['count'],
 'type': 'object'}
description
Abstract Class for a class that defines the structure of a "Tool" message from an
LLM. Depending on context, "tools" are also referred to as "plugins",
or "function calls" (in the context of OpenAI LLMs).
Essentially, they are a way for the LLM to express its intent to run a special
function or method. Currently these "tools" are handled by methods of the
agent.

Attributes:
    request (str): name of agent method to map to.
    purpose (str): purpose of agent method, expressed in general terms.
        (This is used when auto-generating the tool instruction to the LLM)
    result (str): example of result of agent method.
"Abstract Class for a class that defines the structure of a \"Tool\" message from an\nLLM. Depending on context, \"tools\" are also referred to as \"plugins\",\nor \"function calls\" (in the context of OpenAI LLMs).\nEssentially, they are a way for the LLM to express its intent to run a special\nfunction or method. Currently these \"tools\" are handled by methods of the\nagent.\n\nAttributes:\n    request (str): name of agent method to map to.\n    purpose (str): purpose of agent method, expressed in general terms.\n        (This is used when auto-generating the tool instruction to the LLM)\n    result (str): example of result of agent method."
---------------------
type
object
"object"
---------------------
properties
{'request': {'default': '', 'type': 'string'}, 'purpose': {'default': '', 'type': 'string'}, 'result': {'default': '', 'type': 'string'}, 'count': {'type': 'integer'}, 'size': {'type': 'number'}}
{
    "request": {
        "default": "",
        "type": "string"
    },
    "purpose": {
        "default": "",
        "type": "string"
    },
    "result": {
        "default": "",
        "type": "string"
    },
    "count": {
        "type": "integer"
    },
    "size": {
        "type": "number"
    }
}
---------------------
required
['count']
[
    "count"
]
---------------------
exclude
{'result', 'purpose'}
Traceback (most recent call last):
  File "/home/zby/gpt/langroid/schema_example.py", line 48, in <module>
    print(json.dumps(definitions.get(i), indent=4))
  File "/usr/lib/python3.10/json/__init__.py", line 238, in dumps
    **kw).encode(obj)
  File "/usr/lib/python3.10/json/encoder.py", line 201, in encode
    chunks = list(chunks)
  File "/usr/lib/python3.10/json/encoder.py", line 438, in _iterencode
    o = _default(o)
  File "/usr/lib/python3.10/json/encoder.py", line 179, in default
    raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type set is not JSON serializable

Process finished with exit code 1

The immediate problem is that exclude is a set and is not JSON serializable, but the underlying problem is that exclude is not removed from the definitions.

The recursive case is exactly the hard case that I would like to have covered by a library.

ToolMessage itself is not recursive. The way you work with a nested Pydantic structure is to define the nested structure purely in Pydantic via BaseModel, and then use the nested class as the type of one of the params in a ToolMessage. In your case you would have a ToolMessage named Spam with a field called "myspam" of type Spam1.

In fact the nested case is exactly what I illustrate in the Colab.
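
Concretely, the pattern is roughly this (a sketch; the request/purpose strings are placeholders):

from typing import List
from pydantic import BaseModel
from langroid.agent.tool_message import ToolMessage


class Foo1(BaseModel):
    count: int


class Bar1(BaseModel):
    apple: str = 'x'
    banana: str = 'y'


class Spam1(BaseModel):
    foo: Foo1
    bars: List[Bar1]


class Spam(ToolMessage):
    request = 'spam'
    purpose = 'To submit a nested <myspam> structure'
    myspam: Spam1  # the nesting lives purely in Pydantic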

Thanks for that clarification!
I still don’t like _recursive_purge_dict_keys - but I guess I need to live with this.
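
For reference, the idea behind such a helper boils down to something like this naive sketch (not Langroid’s actual implementation, which I haven’t checked in detail):

from typing import Any


def recursive_purge_dict_key(d: Any, key: str) -> None:
    """Remove every occurrence of key from a nested dict, in place.

    Naive: this would also delete a schema *property* that happens to
    be named like key (e.g. a field called 'title'), so a careful
    implementation has to special-case the 'properties' maps.
    """
    if isinstance(d, dict):
        d.pop(key, None)
        for v in d.values():
            recursive_purge_dict_key(v, key)
    elif isinstance(d, list):
        for item in d:
            recursive_purge_dict_key(item, key)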