Pydantic with Dict not working

Trying to get a response like this:
{"1": "first", "2": "second", …}
but using a Dict somehow doesn't work. Am I doing something wrong?

from typing import Dict
from pydantic import BaseModel

class MyObject(BaseModel):
    my_values: Dict[str, str]

response = client.beta.chat.completions.parse(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ],
    response_format=MyObject,
)

Error code: 400 - {'error': {'message': "Invalid schema for response_format 'MyObject': In context=(), 'required' is required to be supplied and to be an array including every key in properties. Extra required key 'my_values' supplied.", 'type': 'invalid_request_error', 'param': 'response_format', 'code': None}}


You can't just make up your own keys for the AI to produce, or leave the schema open-ended so the AI invents an arbitrary number of keys/fields.

Strict means that only the named keys and structure you pass can be produced, with every key deliberately listed as "required". Pydantic usage can only produce a strict schema, where the keys the AI generates must match the keys you defined.

The closest approximation is an array, without the numbered indexing. There's little point in Pydantic parsing here, since the exact item count can't be mapped to fixed keys, so we just write up a schema ourselves:

import json
from openai import Client
client = Client()

response_format_object = {
    "type": "json_schema",
    "json_schema": {
        "name": "array_response",
        "strict": True,
        "schema": {
            "type": "object",
            "properties": {
              "items": {
                "type": "array",
                "description": "Produce a multi-item response in array",
                "items": { "type": "string" }
              }
            },
            "required": ["items"],
            "additionalProperties": False
        }
    }
}

response = client.beta.chat.completions.parse(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful assistant"},
        {"role": "user", "content": "List four good xmas presents for mom."}
    ],
    response_format=response_format_object
)
content = response.choices[0].message.content
print(json.dumps(json.loads(content), indent=2))

Yielding a pretty-printed response:

{
  "items": [
    "A luxurious cashmere scarf to keep her warm during the winter months.",
    "A personalized photo album filled with cherished family memories.",
    "A high-quality aromatic candle with her favorite scent.",
    "An indulgent spa gift set for a relaxing at-home spa experience."
  ]
}

If you don't pass a BaseModel, no "parsed" object is created on the message; the OpenAI library's beta parse method only builds one from a Pydantic response_format.
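If you do want a parsed object back, the same array approximation can be written as a Pydantic model instead of a raw schema. A minimal sketch, reusing the client defined above (the ArrayResponse name is just illustrative):

from typing import List
from pydantic import BaseModel

class ArrayResponse(BaseModel):
    # Mirrors the hand-written schema: an array of strings, no numbered keys
    items: List[str]

response = client.beta.chat.completions.parse(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful assistant"},
        {"role": "user", "content": "List four good xmas presents for mom."}
    ],
    response_format=ArrayResponse,
)
parsed = response.choices[0].message.parsed  # an ArrayResponse instance
print(parsed.items)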


I thought I’d show something interesting:

  • strict: false is barely better than just instructing the AI about the JSON you want, which you can do yourself, more verbosely;
  • a non-strict schema passed in response_format is placed into context for the AI's understanding the same way, as a mere injection of what you provided;
  • the way schemas are placed suggests that OpenAI has private or future plans for multiple response output types for their own model usage.

So: I replicate, trick, and expand on the system-message placement of a schema, then see how it performs based on the AI's understanding and any proprietary training done on following schemas.

import json
from openai import Client
client = Client()

system_message = """You are a helpful assistant.

Prefer indexed JSON object output responses.

# Responses

## multi_item_response

{
    "type": "object",
    "title": "Multi-item Responses"
    "description": "JSON responses with multiple indexed items for any lists as output",
    "patternProperties": {
        "^[0-9]+$": {
            "type": "string"
        }
    },
    "additionalProperties": false
}

## single_item_response

{
    "type": "object",
    "title": "Single-item text responses"
    "description": "Produce a single-item or direct response to user",
    "properties": {
      "item": {
        "type": "string",
        "description": "The content of the response item."
      }
    },
    "required": [
      "item"
    ],
    "additionalProperties": false
  }"""

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": system_message},
        {"role": "user", "content": "List four good xmas presents for mom."}
    ],
    response_format={"type": "json_object"}
)
content = response.choices[0].message.content
try:
    print(json.dumps(json.loads(content), indent=2))
except json.JSONDecodeError:
    print(f"The response failed JSON-parsing. Response:\n{content}")

In my system message, you can see the schemas are just plonked there with no guidance. That's essentially how the API injects them too, except it also minifies them.

Response:

{
  "1": "A personalized piece of jewelry, like a necklace with her initials or birthstones.",
  "2": "A spa day gift certificate for some relaxation and pampering.",
  "3": "A high-quality scented candle or a set of aromatherapy oils.",
  "4": "A custom photo album or framed family photo to cherish memories."
}

What have I done? I've not only placed optional schemas right into the system message instead of using convoluted, nested anyOf constructions, but I've also given the AI an open-ended schema with the unsupported keyword patternProperties and no named keys at all.

"JSON mode" is used so the AI doesn't try to wrap the output in markdown or produce other non-JSON text.

The output shown is pretty-printed by the json library, which would fail if the response were not valid JSON.

Thus, achievement unlocked.
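And since the original question was about Dict[str, str], you can still validate this open-ended output locally with Pydantic after the fact, even though the API's strict mode can't express it. A minimal sketch (the IndexedItems wrapper name is just illustrative):

from typing import Dict
from pydantic import BaseModel

class IndexedItems(BaseModel):
    # Validates the open-ended {"1": ..., "2": ...} object client-side
    items: Dict[str, str]

validated = IndexedItems(items=json.loads(content))
# Keys come back as strings; sort numerically if order matters
for key in sorted(validated.items, key=int):
    print(key, validated.items[key])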

The only difference is that when structured output is actually enforced via json_schema, the AI for some reason must first output the name of that second-level heading, even though you have no options and the "strict" AI cannot deviate from it. That heading is the "json_schema" -> "name", which is not actually part of the response schema being followed. A 10-token schema name is 10 wasted input and output tokens when emitting to the internal response recipient of the API backend. This output behavior is not trained or demonstrated in this simulation.