Structured outputs doesn't work

the.brainiac · January 17, 2025, 6:11am

I’m trying to generate a question:

from typing import Optional, Union, List, Dict, Any
from pydantic import BaseModel
from openai import OpenAI

class Table(BaseModel):
    title: str
    columns: List[str] 
    data: List[Dict[str,int]]


class Passage(BaseModel):
    text: str
    table: Table

class Question(BaseModel):
    passage: Passage
    question: str 

openai_client = OpenAI(api_key=userdata.get("OPENAI_API_KEY"))

max_tokens =  8000

response = openai_client.beta.chat.completions.parse(
    model="gpt-4o",
    messages=[{'role':'user','content': 'Output a question'}],
    max_completion_tokens=max_tokens,
    response_format= Question
)
response.choices[0].message.parsed

But running into the following error:

BadRequestError: Error code: 400 - {'error': {'message': "Invalid schema for response_format 'Question': In context=(), 'required' is required to be supplied and to be an array including every key in properties. Extra required key 'data' supplied.", 'type': 'invalid_request_error', 'param': 'response_format', 'code': None}}

I’m on python 3.11, why isn’t it working?

platypus · January 17, 2025, 1:55pm

Hi @the.brainiac !

It seems to have an issue with this:

data: List[Dict[str,int]].

Not sure if this is allowed, it’s not really adhering to a strict schema, since it would allow for ambiguous key-value pairs. Can you try removing this one from Table to see if it works?

the.brainiac · January 17, 2025, 4:26pm

Hi @platypus ,
Thanks for your response.

I could remove it, but I absolutely need the data field, the list of dictionaries in the the data field contain the data that will populate the table.
And the thing is the keys aren’t fixed for this list (for e.g. it can be [{‘year’:2002},{‘pop’:1000}] for a question containing a table showing the population graph).

I was wondering if there was a way to make it output a list of dicts for data.

_j · January 17, 2025, 4:37pm

And thus a strict grammar cannot be built and Pydantic cannot be used.

The whole point is that the API can enforce the next key to be output by the AI in an object (the AI writes JSON, thus not a dict). Every key of an object also needs to be set within a “required” array of a JSON schema.

If you make your own JSON schema, place it in the instructive container for the API of “strict:false” along with its name, you can send it as an unenforced schema response_format that the AI doesn’t have to strictly follow, as it is then just an AI instruction.

platypus · January 17, 2025, 6:41pm

Yes so then you have to use the “raw” JSON schema as Jay mentioned and not use the strict mode. You will then have to also do some additional validation on your response because there are no guarantees that the schema will be followed at all, but generally it will be .

redwagonagency · January 17, 2025, 6:48pm

Thanks for posting this I was having a similar issue

Topic		Replies	Views
Pydantic with Dict not working Bugs gpt-4 , api	2	1613	December 8, 2024
Extra required key error in response format for JSON schema? API fine-tuning	5	3308	October 14, 2024
Structured Response: enums not supported in with Pydantic schema generation Bugs	13	2725	September 20, 2024
How to define pydantic/JSON schema API gpt-4o-mini , structured-output	5	15397	October 25, 2024
Limitations of response_format in Azure OpenAI Bugs response_format	0	600	October 17, 2024

Structured outputs doesn't work

Related topics