openai.pydantic_function_tool() vs function_to_schema()

In the function-calling context, I use function_to_schema(func) to convert a Python function into the JSON schema needed for the API.
You can see the definition here: build-hours/2-assistants/demo_util.py at main · openai/build-hours · GitHub and here: openai-cookbook/examples/Orchestrating_agents.ipynb at main · openai/openai-cookbook · GitHub

function_tools = [function_to_schema(f) for f in functions] if functions else []

The problem is that it seems OpenAI now uses a different method: openai.pydantic_function_tool().
You can find its source here: openai-python/src/openai/lib/_tools.py at b95be16e7c8a76c3d63335df13ab0d55ba3d5c35 · openai/openai-python · GitHub.

For an example of OpenAI use: openai-python/helpers.md at main · openai/openai-python · GitHub.

Interestingly, this new approach takes a class as a parameter instead of a function. While it’s possible to describe a function using a class (with class members understood as parameters), this approach feels unusual.

Some developers also duplicate the function they want to call into a class so it can be used as a parameter for pydantic_function_tool(); see https://www.datacamp.com/tutorial/open-ai-structured-outputs.
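For concreteness, here is a minimal sketch of the class-as-signature idea (the GetWeather model is a made-up example, not from the linked posts): a BaseModel's fields play the role of a function's parameters, and Pydantic can already emit the JSON schema for them, which is what pydantic_function_tool() wraps into a tool definition.

```python
from pydantic import BaseModel, Field

class GetWeather(BaseModel):
    """Get the current weather for a city."""
    city: str = Field(description="Name of the city")
    unit: str = "celsius"

# Pydantic generates the parameter schema; pydantic_function_tool()
# wraps a schema like this into the tool definition sent to the API.
schema = GetWeather.model_json_schema()
print(sorted(schema["properties"]))  # ['city', 'unit']
print(schema["required"])            # ['city']
```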

Any thoughts on why OpenAI made this move?


I didn’t really look into it, but I imagine the first is applicable only to function calling, while a schema/class works for both function calling and JSON-schema response formats.

This is part of the beta SDK’s support for passing a Pydantic BaseModel class into the SDK, instead of a plain Python data object, and having it create a validation schema for you.

This does the work of adding everything required by “strict” for you. No forgetting a closing brace or an additionalProperties: false. Your class can be used to validate the AI generation. You can have just one definition used throughout code.

When a BaseModel is used with response_format and you call .parse(), the beta SDK described in helpers.md adds a parsed element to the chat completions response; it validates the AI generation and makes the values directly accessible as attributes.

I personally “just say no”, maximizing portability across vendors.

strict can be easily added:

    import inspect

    def function_to_schema(func) -> dict:
        # https://github.com/openai/build-hours/blob/main/2-assistants/demo_util.py
        # https://github.com/openai/openai-cookbook/blob/main/examples/Orchestrating_agents.ipynb
        type_map = {
            str: "string",
            int: "integer",
            float: "number",
            bool: "boolean",
            list: "array",
            dict: "object",
            type(None): "null",
        }

        try:
            signature = inspect.signature(func)
        except ValueError as e:
            raise ValueError(
                f"Failed to get signature for function {func.__name__}: {str(e)}"
            ) from e

        parameters = {}
        for param in signature.parameters.values():
            if param.name == "self":
                continue
            # Unknown annotations fall back to "string"; .get() with a
            # default never raises KeyError, so no try/except is needed here
            param_type = type_map.get(param.annotation, "string")
            # parameters[param.name] = {"type": param_type, "description": ""}
            parameters[param.name] = {"type": param_type}

        # Note: with "strict": True the API requires *every* property to be
        # listed in "required", so parameters with defaults must be included
        # too (express optionality via a union with "null" if needed)
        required = [
            param.name
            for param in signature.parameters.values()
            if param.name != "self"
        ]

        return {
            "type": "function",
            "function": {
                "name": func.__name__,
                "strict": True,
                "description": (func.__doc__ or "").strip(),
                "parameters": {
                    "type": "object",
                    "properties": parameters,
                    "required": required,
                    "additionalProperties": False,
                },
            },
        }
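To see what the introspection above is working with, here is a quick stdlib-only check of the signature handling (get_weather is just a made-up sample function):

```python
import inspect

def get_weather(city: str, unit: str = "celsius") -> str:
    """Get the current weather for a city."""
    return f"Sunny in {city}"

sig = inspect.signature(get_weather)

# Parameters without a default are the ones a non-strict schema
# would mark as required
no_default = [p.name for p in sig.parameters.values()
              if p.default is inspect.Parameter.empty]
print(no_default)  # ['city']

# The annotation objects are the keys looked up in type_map
print([p.annotation for p in sig.parameters.values()])
```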

But why did they choose to use a class instead of a function as the parameter of pydantic_function_tool()?

The input is not just any old class, where a class has attributes and can be built to fit a need. It is a Pydantic BaseModel, and Pydantic already has extensive support for schemas and validation. It is built for structured data, possibly composed of many models, while a general function or method is oriented toward doing. The SDK is specifically built to take Pydantic; internally it is basically a switch on whether the input is a dict or a BaseModel. This becomes more apparent when you look at how response_format is utilized.

Pydantic’s BaseModel provides several built-in methods that you can use to work with your data models. Here are a few key ones:

Parsing and Validation:

  • model_validate(): Parses and validates data from a dictionary or object.
  • model_validate_json(): Parses and validates data from a JSON string or bytes object. [1]
  • model_validate_strings(): Parses and validates data from a dictionary with string keys and values in JSON mode.

Serialization:

  • model_dump() (v1: dict()): Converts the model instance into a dictionary.
  • model_dump_json() (v1: json()): Serializes the model instance into a JSON string.

Other Useful Methods:

  • model_copy(): Creates a copy of the model instance (pass deep=True for a deep copy).
  • __eq__(): Checks if two model instances are equal.
  • __repr__(): Returns a string representation of the model instance.
  • model_fields: A class attribute mapping the model’s field names to their definitions.
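As a quick illustration of the validation and serialization methods listed above (the Item model is a made-up example):

```python
from pydantic import BaseModel

class Item(BaseModel):
    name: str
    price: float

# model_validate_json: parse and validate a JSON string in one step
# (this is the kind of work the SDK's .parse() helper leans on);
# note the numeric string is coerced to float in lax mode
item = Item.model_validate_json('{"name": "widget", "price": "9.5"}')
print(item.price)  # 9.5

# model_dump / model_dump_json: back to a dict / JSON string
print(item.model_dump())  # {'name': 'widget', 'price': 9.5}
```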

Adding Custom Methods:

You can also add your own custom methods to a BaseModel class just like any other Python class. This allows you to add business logic, data manipulation, or any other functionality that you need.

from pydantic import BaseModel

class User(BaseModel):
    name: str
    age: int

    def greet(self):
        return f"Hello, {self.name}!"

user = User(name="Alice", age=30)
print(user.greet())  # Output: Hello, Alice!

[1] pydantic/docs/concepts/models.md at main · pydantic/pydantic · GitHub

Maybe you can go on GitHub and start asking the “why” of the SDK at all. It’s becoming very heavy, when all a language model developer really wants is to send their JSON to the RESTful AI API (or to an API/SDK built on a more efficient transport standard, e.g. Google’s).