Structured outputs doesn't work

I’m trying to generate a question:

from typing import Optional, Union, List, Dict, Any
from pydantic import BaseModel
from openai import OpenAI

class Table(BaseModel):
    title: str
    columns: List[str] 
    data: List[Dict[str,int]]


class Passage(BaseModel):
    text: str
    table: Table

class Question(BaseModel):
    passage: Passage
    question: str 

openai_client = OpenAI(api_key=userdata.get("OPENAI_API_KEY"))

max_tokens =  8000

response = openai_client.beta.chat.completions.parse(
    model="gpt-4o",
    messages=[{'role':'user','content': 'Output a question'}],
    max_completion_tokens=max_tokens,
    response_format= Question
)
response.choices[0].message.parsed

But running into the following error:

BadRequestError: Error code: 400 - {'error': {'message': "Invalid schema for response_format 'Question': In context=(), 'required' is required to be supplied and to be an array including every key in properties. Extra required key 'data' supplied.", 'type': 'invalid_request_error', 'param': 'response_format', 'code': None}}

I’m on python 3.11, why isn’t it working?

1 Like

Hi @the.brainiac !

It seems to have an issue with this:

data: List[Dict[str,int]].

Not sure if this is allowed, it’s not really adhering to a strict schema, since it would allow for ambiguous key-value pairs. Can you try removing this one from Table to see if it works?

1 Like

Hi @platypus ,
Thanks for your response.

I could remove it, but I absolutely need the data field, the list of dictionaries in the the data field contain the data that will populate the table.
And the thing is the keys aren’t fixed for this list (for e.g. it can be [{‘year’:2002},{‘pop’:1000}] for a question containing a table showing the population graph).

I was wondering if there was a way to make it output a list of dicts for data.

1 Like

And thus a strict grammar cannot be built and Pydantic cannot be used.

The whole point is that the API can enforce the next key to be output by the AI in an object (the AI writes JSON, thus not a dict). Every key of an object also needs to be set within a “required” array of a JSON schema.

If you make your own JSON schema, place it in the instructive container for the API of “strict:false” along with its name, you can send it as an unenforced schema response_format that the AI doesn’t have to strictly follow, as it is then just an AI instruction.

2 Likes

Yes so then you have to use the “raw” JSON schema as Jay mentioned and not use the strict mode. You will then have to also do some additional validation on your response because there are no guarantees that the schema will be followed at all, but generally it will be :sweat_smile:.

1 Like

Thanks for posting this I was having a similar issue

1 Like