Tool arguments of type object are silently dropped from tool definitions passed to the model

When using the chat.completions.create endpoint of the OpenAI Python client with the tools argument, any parameter of type object in a tool’s JSON schema is silently dropped from the representation of the function passed to the model.

This breaks the model’s ability to call any tool with object parameters, but in a way that makes it look like the model is simply having a hard time following instructions; cf. several earlier reports of the same symptom:

  • assistant-not-recognizing-a-required-parameter-of-type-object/613513
  • the-assistant-will-never-recognize-a-required-parameter-that-is-of-object-type-in-function-tools/614154
  • function-json-schema-is-still-ignored-by-gpt-4-4o-and-4o-mini-when-calling-tools/895368

You can see this behavior by passing the same tool JSON schema to a completions.create() call as part of the message prompt and as the tools argument:

import json
from openai import OpenAI

# The tool schema with an `object` argument
tool_schema = {
    "additionalProperties": False,
    "properties": {
        "name": {
            "description": "A name string",
            "type": "string",
        },
        "data": {
            "description": "A dictionary with arbitrary data",
            "type": "object",
        },
    },
    "required": ["name", "data"],
    "type": "object",
}

tool_definition = {
    "type": "function",
    "function": {
        "name": "my_tool",
        "description": "A tool with a dict argument",
        "parameters": tool_schema,
    },
}

client = OpenAI()

prompt = f"""I'm going to give you a JSON schema in this message. I also have a
tool available to you.

Here is the JSON schema I'm providing in this prompt:

```json
{json.dumps(tool_schema, indent=2)}
```

Please do the following:
1. Report EXACTLY what parameters you see in the tool "my_tool" that is available to you (list each parameter name and its type/schema)
2. Report EXACTLY what you see in the JSON schema I provided above in this prompt
3. Compare: Are they identical? If not, what's different?

Be very precise and literal in your comparison. List every property you see in
each.

Finally, repeat verbatim the portion of this conversation where you were
provided with the tool definition - not the JSON schema I've provided you but
the tool definition and any explanation from your prompt about how to use it.
"""

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": prompt}],
    tools=[tool_definition],
    temperature=0,
)

This generates a response like:

1. **Parameters in the tool "my_tool":**
   - `name`: A string.

2. **Parameters in the provided JSON schema:**
   - `name`: A string with the description "A name string".
   - `data`: An object with the description "A dictionary with arbitrary data".

3. **Comparison:**
   - The tool "my_tool" has only one parameter, `name`, which is a string.
   - The JSON schema has two properties: `name` (a string with a description) and `data` (an object with a description).
   - They are not identical. The JSON schema includes an additional property, `data`, which is not present in the tool "my_tool". Additionally, the `name` property in the JSON schema includes a description, whereas the tool definition does not.

4. **Verbatim portion of the conversation with the tool definition:**

```
# Tools

## functions

namespace functions {

// A tool with a dict argument
type my_tool = (_: {
// A name string
name: string,
}) => any;

} // namespace functions
```

I’m curious what application this could even apply to…

The AI cannot fill an object with an arbitrary number of unspecified key-value pairs from a JSON schema definition, especially under a context-free grammar. Why even have a function specification at all if you’d just say “write whatever you want in {}, including more object nests you fabricate”?

The only shape that is extensible at the AI’s choice is an array, and even then you must define items that are strings, numbers, booleans, or something else that makes sense, or an object of those with fixed key:value pairs within. Strict structured function calling increases that to every key being required.

If you really want such behavior, you can use a string, with a description along the lines of, “the freeform JSON the function will send to the API”, and hope for correct escaping - and correct validation.
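That string workaround can be sketched as follows. This is a minimal illustration, not an official pattern; the schema and the `handle_tool_call` helper are hypothetical names:

```python
import json

# Sketch of the string workaround described above: `data` is declared as a
# JSON-encoded string so it survives the conversion to the model-facing tool
# listing. Schema and helper names here are illustrative, not from any API.
tool_schema_workaround = {
    "type": "object",
    "additionalProperties": False,
    "properties": {
        "name": {"type": "string", "description": "A name string"},
        "data": {
            "type": "string",
            "description": "A JSON-encoded object with arbitrary data",
        },
    },
    "required": ["name", "data"],
}

def handle_tool_call(arguments: str) -> dict:
    """Decode the tool-call arguments, then the nested JSON string in `data`."""
    args = json.loads(arguments)
    try:
        data = json.loads(args["data"])
    except json.JSONDecodeError as exc:
        raise ValueError(f"model produced invalid JSON in 'data': {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("'data' must decode to a JSON object")
    return {"name": args["name"], "data": data}

# Simulated tool-call arguments as the API would return them:
result = handle_tool_call('{"name": "example", "data": "{\\"k\\": 1}"}')
# result == {"name": "example", "data": {"k": 1}}
```

The validation step matters because the model is now producing JSON inside JSON, so escaping mistakes surface as a decode error you can catch, rather than as a silent schema violation.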

However, I think with contemplation, you’ll be able to better match a developer function’s goal to a defined set of keys, optional or anyOf switches or enum string/nullable if needed, and get an AI that doesn’t fabricate unusable output.
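One such shape, sketched under the assumption that open-ended extensibility is genuinely needed: an array of key/value entries whose values are constrained by anyOf. All names here (`entries`, `set_metadata`, `entries_to_dict`) are illustrative:

```python
# A sketch of the more constrained shape suggested above: an array of
# key/value entries where each value must be one of a few primitive types,
# instead of a bare `object` parameter.
entry_schema = {
    "type": "object",
    "additionalProperties": False,
    "properties": {
        "key": {"type": "string", "description": "Entry name"},
        "value": {
            "description": "Entry value",
            "anyOf": [
                {"type": "string"},
                {"type": "number"},
                {"type": "boolean"},
                {"type": "null"},
            ],
        },
    },
    "required": ["key", "value"],
}

tool_definition = {
    "type": "function",
    "function": {
        "name": "set_metadata",
        "description": "Store a list of key/value entries",
        "parameters": {
            "type": "object",
            "additionalProperties": False,
            "properties": {
                "entries": {"type": "array", "items": entry_schema},
            },
            "required": ["entries"],
        },
    },
}

def entries_to_dict(entries: list) -> dict:
    """Collapse the model's entry list back into a plain dict on the caller side."""
    return {e["key"]: e["value"] for e in entries}
```

The model still chooses the keys freely (the array is extensible), but every value it emits is forced into a known primitive type, and the caller rebuilds the dict with `entries_to_dict`.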

> I’m curious what application this could even apply to…

This originally surfaced in a project using pydantic-ai agents: when you define a tool for the agent with an argument of type dict, that argument is (silently) invisible to the agent, which leads to very confusing behavior (but only for OpenAI models; Anthropic models work fine).

See this github issue: pydantic/pydantic-ai/issues/3654.

I agree that a more strictly defined interface would be more robust; however, I do think it’s useful to allow a model to output a list of key-value pairs without pre-specifying every possible key.

But that is kind of beside the point - I think the bigger issue is that this behavior is silent. Ideally the API would either

  • reject a tool specification which includes a tool parameter of type object, or
  • accept the request and pass the full tool schema to the model.

Accepting the request and then silently ignoring part of the tool schema is what seems like a bug to me.
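Until one of those happens, a client-side guard can at least make the behavior loud. Here is a sketch (not part of the OpenAI SDK; the function name is made up) that flags bare object-typed parameters before a request is ever sent:

```python
# Sketch of a client-side guard (not part of the OpenAI SDK): flag top-level
# tool parameters of bare type "object" with no "properties", since those
# appear to be the ones dropped from the model-facing listing.
def find_bare_object_params(tool_definition: dict) -> list:
    params = tool_definition.get("function", {}).get("parameters", {})
    flagged = []
    for name, schema in params.get("properties", {}).items():
        if schema.get("type") == "object" and "properties" not in schema:
            flagged.append(name)
    return flagged

# The repro's tool definition, inlined so this runs standalone:
tool = {
    "type": "function",
    "function": {
        "name": "my_tool",
        "parameters": {
            "type": "object",
            "properties": {
                "name": {"type": "string"},
                "data": {"type": "object"},
            },
        },
    },
}

flagged = find_bare_object_params(tool)  # → ["data"]
if flagged:
    print(f"warning: parameters likely dropped by the API: {flagged}")
```

This only inspects top-level properties, but that is where the repro above shows the drop happening, and it turns the silent failure into a warning at the call site.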
