Canonical way of turning pydantic schemas into function/tools schemas?

The JSON schema that Pydantic produces (JSON Schema - Pydantic) is very close to what appears in the functions and tools examples here, but not exactly the same. For example, the Pydantic schemas contain additional title fields. It is also useful to add descriptions taken from docstrings.
Is there any canonical way of turning pydantic schemas into functions/tools definitions?

Maybe I’m missing something, but projects handle this differently:

- some just remove all title fields from the schema and treat title as a reserved word (sketch below): simpleaichat/simpleaichat/chatgpt.py at ea46c1f28e8dcdeda5935fcbab3d2ed07dc870de · minimaxir/simpleaichat · GitHub
- others don’t change the schema in any way: taifun/taifun/taifun.py at main · mxab/taifun · GitHub
- yet others add descriptions but don’t remove title fields: instructor/instructor/function_calls.py at main · jxnl/instructor · GitHub
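
For concreteness, here is a minimal sketch of the first (title-stripping) approach. It assumes Pydantic v2’s model_json_schema(); to_tool and _strip_titles are just illustrative names, not anything canonical:

import inspect
from typing import Any, Type
from pydantic import BaseModel


def _strip_titles(schema: Any) -> Any:
    """Recursively drop the auto-generated 'title' keys.

    Caveat: this also drops any real field named 'title', which is
    presumably why simpleaichat treats 'title' as a reserved word.
    """
    if isinstance(schema, dict):
        return {k: _strip_titles(v) for k, v in schema.items() if k != "title"}
    if isinstance(schema, list):
        return [_strip_titles(v) for v in schema]
    return schema


def to_tool(model: Type[BaseModel]) -> dict:
    """Build one entry for the tools parameter from a Pydantic model."""
    return {
        "type": "function",
        "function": {
            "name": model.__name__,
            "description": inspect.getdoc(model) or "",
            "parameters": _strip_titles(model.model_json_schema()),
        },
    }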


Your question is probably part of a larger application goal.

If you are talking about a schema for a function that can be validated, then no, it can’t handle checking or creating something like:

"string_w_enum": {"type": "string", "enum": ["Happy", "Sad"]},

I don’t know where you’d get a schema “for free” to then want to process it. An object that came from another function or API?
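
For what it’s worth, a Literal-typed Pydantic field does render to roughly that shape (plus a title key). A sketch, assuming Pydantic v2:

from typing import Literal
from pydantic import BaseModel


class Example(BaseModel):
    string_w_enum: Literal["Happy", "Sad"]


print(Example.model_json_schema()["properties"]["string_w_enum"])
# roughly: {'enum': ['Happy', 'Sad'], 'title': 'String W Enum', 'type': 'string'}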

I just happen to have an environment with Pydantic models at hand, so I can show what they produce…


print(json.dumps(file_obj.model_json_schema(), indent=2))
                    
{
  "additionalProperties": true,
  "properties": {
    "id": {
      "title": "Id",
      "type": "string"
    },
    "bytes": {
      "title": "Bytes",
      "type": "integer"
    },
    "created_at": {
      "title": "Created At",
      "type": "integer"
    },
    "filename": {
      "title": "Filename",
      "type": "string"
    },
    "object": {
      "const": "file",
      "title": "Object"
    },
    "purpose": {
      "enum": [
        "fine-tune",
        "fine-tune-results",
        "assistants",
        "assistants_output"
      ],
      "title": "Purpose",
      "type": "string"
    },
    "status": {
      "enum": [
        "uploaded",
        "processed",
        "error"
      ],
      "title": "Status",
      "type": "string"
    },
    "status_details": {
      "anyOf": [
        {
          "type": "string"
        },
        {
          "type": "null"
        }
      ],
      "default": null,
      "title": "Status Details"
    }
  },
  "required": [
    "id",
    "bytes",
    "created_at",
    "filename",
    "object",
    "purpose",
    "status"
  ],
  "title": "FileObject",
  "type": "object"
}

I don’t quite understand what you are asking me about. I don’t validate the schemas; I want something that will be accepted by the completions call. The plan is to have functions that consume a Pydantic object, then generate the Pydantic object’s schema and put it into the functions/tools parameter of the completions call.
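
To make that plan concrete, a sketch (the model name is illustrative, and to_tool is the helper sketched earlier in the thread, not an official API):

from openai import OpenAI
from pydantic import BaseModel


class GetWeather(BaseModel):
    """Get the current weather for a city."""
    city: str
    unit: str = "celsius"


client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4-1106-preview",  # illustrative
    messages=[{"role": "user", "content": "Weather in Paris?"}],
    tools=[to_tool(GetWeather)],
)

# the function then consumes the Pydantic object parsed back from the tool call
call = response.choices[0].message.tool_calls[0]  # assuming the model called the tool
args = GetWeather.model_validate_json(call.function.arguments)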

I have written a library that generates this tools definition from type annotations, but nobody uses type annotations, so Pydantic is probably a better choice and I want to add support for it. My code also does not handle deeper, nested structures. Here is my current code: GitHub - zby/ToolDefGenerator: Generates a structure suitable for the tools argument in a client.chat.completions.create call

Have a look at this code in Langroid: we allow specifying a tool/function as a Pydantic class along with associated helper methods, and convert it into the appropriate format to send to the OpenAI API as a function specification. We also do the reverse: match the LLM’s JSON output to a Pydantic class and trigger the corresponding handler.

(Langroid is an Agent-oriented LLM programming Python framework from ex-CMU/UW-Madison researchers).

You can see an example of a Pydantic-based function/tool in action in this Colab quick start, which builds up to a 2-agent system where one agent gathers structured info from a document by sending questions to another RAG-enabled agent.

Thanks!

That was interesting; unfortunately it does not cover the recursive cases.

Here is an example. I’ve added the plain Pydantic classes (with a 1 suffix); you can test that they do produce the needed schemas.

import json
from langroid.agent.tool_message import ToolMessage
from typing import List, Optional
from pydantic import BaseModel
from pprint import pprint



class Foo(ToolMessage):
    request = ''
    purpose = ''
    count: int
    size: Optional[float] = None


class Bar(ToolMessage):
    request = ''
    purpose = ''
    apple: str = 'x'
    banana: str = 'y'


class Spam(ToolMessage):
    request = ''
    purpose = ''
    foo: Foo
    bars: List[Bar]

class Foo1(BaseModel):
    count: int
    size: Optional[float] = None


class Bar1(BaseModel):
    apple: str = 'x'
    banana: str = 'y'


class Spam1(BaseModel):
    foo: Foo1
    bars: List[Bar1]


llmschema = Spam.llm_function_schema(request=True)
# inspect the definition generated for the nested Foo class
definitions = llmschema.parameters.get('definitions').get('Foo')
pprint(definitions)
# dump each top-level key of the Foo definition as JSON
for i in definitions:
    print(i)
    print(definitions.get(i))
    print(json.dumps(definitions.get(i), indent=4))
    print('---------------------')

The output of that is:

/home/zby/gpt/langroid/venv/bin/python /home/zby/gpt/langroid/schema_example.py 
{'description': 'Abstract Class for a class that defines the structure of a '
                '"Tool" message from an\n'
                'LLM. Depending on context, "tools" are also referred to as '
                '"plugins",\n'
                'or "function calls" (in the context of OpenAI LLMs).\n'
                'Essentially, they are a way for the LLM to express its intent '
                'to run a special\n'
                'function or method. Currently these "tools" are handled by '
                'methods of the\n'
                'agent.\n'
                '\n'
                'Attributes:\n'
                '    request (str): name of agent method to map to.\n'
                '    purpose (str): purpose of agent method, expressed in '
                'general terms.\n'
                '        (This is used when auto-generating the tool '
                'instruction to the LLM)\n'
                '    result (str): example of result of agent method.',
 'exclude': {'result', 'purpose'},
 'properties': {'count': {'type': 'integer'},
                'purpose': {'default': '', 'type': 'string'},
                'request': {'default': '', 'type': 'string'},
                'result': {'default': '', 'type': 'string'},
                'size': {'type': 'number'}},
 'required': ['count'],
 'type': 'object'}
description
Abstract Class for a class that defines the structure of a "Tool" message from an
LLM. Depending on context, "tools" are also referred to as "plugins",
or "function calls" (in the context of OpenAI LLMs).
Essentially, they are a way for the LLM to express its intent to run a special
function or method. Currently these "tools" are handled by methods of the
agent.

Attributes:
    request (str): name of agent method to map to.
    purpose (str): purpose of agent method, expressed in general terms.
        (This is used when auto-generating the tool instruction to the LLM)
    result (str): example of result of agent method.
"Abstract Class for a class that defines the structure of a \"Tool\" message from an\nLLM. Depending on context, \"tools\" are also referred to as \"plugins\",\nor \"function calls\" (in the context of OpenAI LLMs).\nEssentially, they are a way for the LLM to express its intent to run a special\nfunction or method. Currently these \"tools\" are handled by methods of the\nagent.\n\nAttributes:\n    request (str): name of agent method to map to.\n    purpose (str): purpose of agent method, expressed in general terms.\n        (This is used when auto-generating the tool instruction to the LLM)\n    result (str): example of result of agent method."
---------------------
type
object
"object"
---------------------
properties
{'request': {'default': '', 'type': 'string'}, 'purpose': {'default': '', 'type': 'string'}, 'result': {'default': '', 'type': 'string'}, 'count': {'type': 'integer'}, 'size': {'type': 'number'}}
{
    "request": {
        "default": "",
        "type": "string"
    },
    "purpose": {
        "default": "",
        "type": "string"
    },
    "result": {
        "default": "",
        "type": "string"
    },
    "count": {
        "type": "integer"
    },
    "size": {
        "type": "number"
    }
}
---------------------
required
['count']
[
    "count"
]
---------------------
exclude
{'result', 'purpose'}
Traceback (most recent call last):
  File "/home/zby/gpt/langroid/schema_example.py", line 48, in <module>
    print(json.dumps(definitions.get(i), indent=4))
  File "/usr/lib/python3.10/json/__init__.py", line 238, in dumps
    **kw).encode(obj)
  File "/usr/lib/python3.10/json/encoder.py", line 201, in encode
    chunks = list(chunks)
  File "/usr/lib/python3.10/json/encoder.py", line 438, in _iterencode
    o = _default(o)
  File "/usr/lib/python3.10/json/encoder.py", line 179, in default
    raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type set is not JSON serializable

Process finished with exit code 1

The immediate problem is that exclude is a set and is not JSON serializable, but the underlying problem is that exclude is not removed from the definitions.

The recursive case is exactly the hard case that I would like to have covered by a library.

ToolMessage itself is not recursive. The way you work with a nested Pydantic structure is to define the nested structure purely in Pydantic via BaseModel, and then use the nested class as the type of one of the params in a ToolMessage. In your case you would have a ToolMessage named Spam with a field called "myspam" of type Spam1.

In fact the nested case is exactly what I illustrate in the Colab.
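
Concretely, the pattern is roughly this (a sketch; the request/purpose strings are placeholders):

from typing import List
from pydantic import BaseModel
from langroid.agent.tool_message import ToolMessage


class Foo1(BaseModel):
    count: int


class Bar1(BaseModel):
    apple: str = 'x'
    banana: str = 'y'


class Spam1(BaseModel):
    foo: Foo1
    bars: List[Bar1]


class Spam(ToolMessage):
    request = 'spam'
    purpose = 'To submit a nested <myspam> structure'
    myspam: Spam1  # the nesting lives purely in Pydantic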

Thanks for that clarification!
I still don’t like _recursive_purge_dict_keys - but I guess I need to live with this.
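
For reference, the idea behind such a helper boils down to something like this naive sketch (not Langroid’s actual implementation, which I haven’t checked in detail):

from typing import Any


def recursive_purge_dict_key(d: Any, key: str) -> None:
    """Remove every occurrence of key from a nested dict, in place.

    Naive: this would also delete a schema *property* that happens to
    be named like key (e.g. a field called 'title'), so a careful
    implementation has to special-case the 'properties' maps.
    """
    if isinstance(d, dict):
        d.pop(key, None)
        for v in d.values():
            recursive_purge_dict_key(v, key)
    elif isinstance(d, list):
        for item in d:
            recursive_purge_dict_key(item, key)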