Support top-level array in JSON schema

I often want the GPT API to return a JSON array instead of an object. There are many common use cases where I expect the LLM to extract a list of things.

I tried the latest Structured Output feature and was disappointed this is still not supported:

openai.BadRequestError: Error code: 400 - {'error': {'message': 'Invalid schema for response_format \'PolicyStatements\': schema must be a JSON Schema of \'type: "object"\', got \'type: "array"\'.', 'type': 'invalid_request_error', 'param': 'response_format', 'code': None}}

I know I can always put the array into a single-key object, but it's just so annoying that I also have to modify the prompts to accommodate this.
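For illustration, here is a minimal sketch of that wrapping workaround in plain JSON terms. The key name `items` and the sample response are hypothetical, not anything the API prescribes:

```python
# Sketch of the single-key wrapper workaround (key name "items" is arbitrary).
import json

# Rejected by the API: a top-level array schema
array_schema = {"type": "array", "items": {"type": "string"}}

# Accepted: the same array nested under one key of an object
wrapped_schema = {
    "type": "object",
    "properties": {"items": array_schema},
    "required": ["items"],
    "additionalProperties": False,
}

# After the model responds, the list has to be unwrapped again:
response_text = '{"items": ["a", "b", "c"]}'  # stand-in for a real API response
items = json.loads(response_text)["items"]
print(items)  # ['a', 'b', 'c']
```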

Is there any practical reason not to support this?


I opened a feature request in the Python SDK repository yesterday that deals with exactly the same issue: github[.]com/openai/openai-python/issues/2090#issuecomment-2636377784 (sorry, I cannot post links yet).
Since I was told that this is actually a limitation of the API, I was redirected to these forums and want to repost my experience using the Python SDK instead of the HTTP API directly:

I tried the two following approaches, and both raised a “TypeError: Unsupported response_format type” exception.
Using plain lists:

completion = await client.beta.chat.completions.parse(
    model=model,
    messages=messages,
    response_format=list[TestCase],
)

Using TypeAdapters, which Pydantic recommends for top level lists: docs.pydantic[.]dev/2.10/concepts/type_adapter/

completion = await client.beta.chat.completions.parse(
    model=model,
    messages=messages,
    response_format=TypeAdapter(list[TestCase]),
)

The only workaround I found was to wrap the list in a temporary model class and extract it again afterwards:

class TempModel(BaseModel):
    test_cases: list[TestCase]

completion = await client.beta.chat.completions.parse(
    model=model,
    messages=messages,
    response_format=TempModel,
)

test_case_list = completion.choices[0].message.parsed.test_cases

While this works, it seems overly complicated to me.
IMO a TypeAdapter should be accepted as well.
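To make the case concrete: a TypeAdapter already knows how to build a JSON schema for a top-level list and to validate a raw JSON array into model instances, so the SDK would have everything it needs. A small sketch (the `TestCase` model here is a stand-in with one made-up field):

```python
# Sketch showing that Pydantic's TypeAdapter already handles top-level lists.
from pydantic import BaseModel, TypeAdapter

class TestCase(BaseModel):
    name: str  # hypothetical field for illustration

adapter = TypeAdapter(list[TestCase])

# The adapter can emit a schema whose root is an array...
schema = adapter.json_schema()
print(schema["type"])  # "array"

# ...and can parse a raw JSON array straight into model instances:
cases = adapter.validate_json('[{"name": "first"}, {"name": "second"}]')
print([c.name for c in cases])  # ['first', 'second']
```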