Structured Outputs with Batch Processing

Hi,

Hopefully this is me doing something wrong which can be easily fixed and not a bug…

I’ve successfully run structured outputs using the client.beta.chat.completions.parse() method, but when I try to do the same with batch processing I get errors or missing keys.

{
  "id": "batch_req_xxx",
  "custom_id": "request-0",
  "response": {
    "status_code": 400,
    "request_id": "xxx",
    "body": {
      "error": {
        "message": "Invalid value: 'object'. Supported values are: 'json_object', 'json_schema', and 'text'.",
        "type": "invalid_request_error",
        "param": "response_format.type",
        "code": "invalid_value"
      }
    }
  },
  "error": null
}

This is the response_format I’m using in both the API call and the batch file:

{
  "properties": {
    "optimised_title": {
      "title": "Optimised Title",
      "type": "string"
    },
    "meta_description": {
      "title": "Meta Description",
      "type": "string"
    }
  },
  "required": [
    "optimised_title",
    "meta_description"
  ],
  "title": "DynamicSchema",
  "type": "json_schema"
}

The other errors I’m getting are “we expected an object but got a string”, followed by “we expected a string and got an object” after I change it.

I’ve been wrestling with this for the past 48 hours, can anyone help me?
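For context, the error above complains about `response_format.type` because the schema object is being passed directly as `response_format`. The API expects a wrapper whose `type` is `"json_schema"` and whose actual schema sits under a nested `json_schema.schema` key. A minimal sketch of that shape (the name `"dynamic_schema"` is illustrative):

```python
# Sketch of the wrapper shape the error message points at.
# The schema body (properties, required, ...) goes under
# "json_schema" -> "schema"; top-level "type" selects the format mode.
response_format = {
    "type": "json_schema",
    "json_schema": {
        "name": "dynamic_schema",  # illustrative name
        "strict": True,
        "schema": {
            "type": "object",
            "properties": {
                "optimised_title": {"type": "string"},
                "meta_description": {"type": "string"},
            },
            "required": ["optimised_title", "meta_description"],
            "additionalProperties": False,
        },
    },
}
```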

@slippy Hi! I think you have incorrect syntax in your JSON schema, i.e. you are using “title” as opposed to “description”? So it should actually look like this:

{
  "properties": {
    "optimised_title": {
      "description": "Optimised Title",
      "type": "string"
    },
    "meta_description": {
      "description": "Meta Description",
      "type": "string"
    }
  },
  "required": [
    "optimised_title",
    "meta_description"
  ],
  "description": "DynamicSchema",
  "type": "json_schema"
}

Hey @platypus - thanks for the reply.

I’ve tried that but it’s still not working. I’ve actually copied the example given here and it’s still failing:

{
  "custom_id": "request-1",
  "method": "POST",
  "url": "/v1/chat/completions",
  "body": {
    "model": "gpt-4o-mini",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful math tutor. Guide the user through the solution step by step."
      },
      {
        "role": "user",
        "content": "how can I solve 8x + 7 = -23"
      }
    ],
    "max_tokens": 4096,
    "response_format": {
      "type": "json_schema",
      "json_schema": {
        "name": "math_response",
        "schema": {
          "type": "object",
          "properties": {
            "steps": {
              "type": "array",
              "items": {
                "type": "object",
                "properties": {
                  "explanation": {
                    "type": "string"
                  },
                  "output": {
                    "type": "string"
                  }
                },
                "required": [
                  "explanation",
                  "output"
                ],
                "additionalProperties": false
              }
            },
            "final_answer": {
              "type": "string"
            }
          },
          "required": [
            "steps",
            "final_answer"
          ],
          "additionalProperties": false
        },
        "strict": true
      }
    }
  }
}

With this particular attempt I don’t get an error file, just a ‘failed’ status, and the batch reads the input file as empty. Going into the file storage (in the OpenAI dashboard) and downloading the uploaded file, I see the above - so the file is confirmed to be uploading correctly.

Appreciate any further guidance


@slippy I tried your example above using the Batch API and it worked fine. It might be a super silly question, but in your .jsonl you don’t have any indentation, right?

Anyway, here are the exact steps I took :blush:

Step 1: I created a batchinput.jsonl with the following contents (NOTE: a single line with no indentation; whitespace is fine by the JSON standard)

{"custom_id": "request-1", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gpt-4o-mini", "messages": [{"role": "system", "content": "You are a helpful math tutor. Guide the user through the solution step by step."}, {"role": "user", "content": "how can I solve 8x + 7 = -23"}], "max_tokens": 4096, "response_format": {"type": "json_schema", "json_schema": {"name": "math_response", "schema": {"type": "object", "properties": {"steps": {"type": "array", "items": {"type": "object", "properties": {"explanation": {"type": "string"}, "output": {"type": "string"}}, "required": ["explanation", "output"], "additionalProperties": false}}, "final_answer": {"type": "string"}}, "required": ["steps", "final_answer"], "additionalProperties": false}, "strict": true}}}}
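If you generate the file programmatically, `json.dumps` with default settings already emits each record on a single line, so you get valid JSONL without any manual newline handling. A stdlib sketch (the record body is trimmed down for brevity):

```python
import json

# Trimmed-down batch records; real ones carry the full request body
records = [
    {"custom_id": "request-1", "method": "POST", "url": "/v1/chat/completions",
     "body": {"model": "gpt-4o-mini", "messages": []}},
]

with open("batchinput.jsonl", "w") as f:
    for record in records:
        # json.dumps emits no newlines unless indent= is passed,
        # so each record stays on exactly one line
        f.write(json.dumps(record) + "\n")

with open("batchinput.jsonl") as f:
    lines = f.read().splitlines()
```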

Step 2: I uploaded batchinput.jsonl via Files API. I received a file ID in the response; in this case it is file-F2ieFWin68wvubNmBPOUvsDW.

curl https://api.openai.com/v1/files \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -F purpose="batch" \
  -F file="@batchinput.jsonl"

Step 3: I created a batch using the above file ID. I received a batch ID in the response; in this case it was batch_asVAZzeehZ4mf2QE9zi1krvE

curl https://api.openai.com/v1/batches \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "input_file_id": "file-F2ieFWin68wvubNmBPOUvsDW",
    "endpoint": "/v1/chat/completions",
    "completion_window": "24h"
  }'

Step 4: I queried the batch status using the above batch ID. When I saw that the status was set to completed, I noted the output_file_id, in this case it is file-8E8AlSiy5puk3vBkthQ8UNlw.

curl https://api.openai.com/v1/batches/batch_asVAZzeehZ4mf2QE9zi1krvE \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json"

Step 5: Retrieve the final output using the above output file ID.

curl https://api.openai.com/v1/files/file-8E8AlSiy5puk3vBkthQ8UNlw/content \
  -H "Authorization: Bearer $OPENAI_API_KEY" > batch_output.jsonl

The final output (as per batch_output.jsonl) is as follows:

{"id": "batch_req_3aNHqQt8UB1idP6RkXetGJgM", "custom_id": "request-1", "response": {"status_code": 200, "request_id": "a7ba2b98e25ff47189d7550eee2d8072", "body": {"id": "chatcmpl-9xc57s5a7ngga92riZU0wpcIdhEG3", "object": "chat.completion", "created": 1723994765, "model": "gpt-4o-mini-2024-07-18", "choices": [{"index": 0, "message": {"role": "assistant", "content": "{\"steps\":[{\"explanation\":\"First, we will isolate the term with 'x' by moving the constant term to the other side of the equation. We can do this by subtracting 7 from both sides.\",\"output\":\"8x + 7 - 7 = -23 - 7\"},{\"explanation\":\"This simplifies to 8x = -30.\",\"output\":\"8x = -30\"},{\"explanation\":\"Next, we will isolate 'x' by dividing both sides of the equation by 8.\",\"output\":\"x = -30 / 8\"},{\"explanation\":\"Now we can simplify -30 / 8. We can divide both the numerator and the denominator by 2.\",\"output\":\"x = -15 / 4\"},{\"explanation\":\"Finally, we can write -15 / 4 in decimal form if necessary. -15 / 4 = -3.75.\",\"output\":\"x = -3.75\"}],\"final_answer\":\"x = -15/4 or x = -3.75\"}", "refusal": null}, "logprobs": null, "finish_reason": "stop"}], "usage": {"prompt_tokens": 45, "completion_tokens": 207, "total_tokens": 252}, "system_fingerprint": "fp_507c9469a1"}}, "error": null}
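One detail worth noting from the sample above: even with structured outputs, the schema-conforming payload arrives as a JSON *string* inside `message.content`, so it needs a second `json.loads`. A stdlib sketch against a trimmed-down output line (field layout taken from the sample above):

```python
import json

# Trimmed-down version of one line of batch_output.jsonl
line = ('{"custom_id": "request-1", "response": {"status_code": 200, '
        '"body": {"choices": [{"message": {"content": '
        '"{\\"final_answer\\": \\"x = -3.75\\"}"}}]}}, "error": null}')

row = json.loads(line)                     # the batch envelope
content = row["response"]["body"]["choices"][0]["message"]["content"]
parsed = json.loads(content)               # the schema-conforming object
```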

Hmm. The only thing I’m doing is adding a \n character to put each request on a new line. When I don’t add the response_format section it works fine - unless you mean something else?


@slippy if you follow my steps above exactly, and copy-paste exactly as I printed them (including the jsonl file), without any additional formatting (but be careful to replace the file ID, batch ID and output file ID with your own) - does it work?

Because I didn’t do anything special - I just copy-pasted your schema, and followed the steps as per Batch API documentation, and it worked.

I tried yours and it worked. I then ran mine again for comparison and it also worked. So I’m not sure what happened, to be honest.

Regardless, I credit you for the fix so I just wanted to say I really appreciate you having a look and taking the time.


Glad it works, and always happy to help @slippy !


Solved my issue as well! Thank you platypus


If anyone else is facing issues using structured outputs with the Batch API, I have found this works the easiest.

Define your schema using Pydantic, then convert it to an OAI-compliant strict JSON schema using the to_strict_json_schema function available in the Python client. (Though it is a private method and a bit ugly, this was much easier and more reliable than third-party libraries that have not been tested as extensively.)

from openai.lib._pydantic import to_strict_json_schema

Nice, thanks for the tip!


Thank you so much it saved me a lot of time


Using response_format makes my batch fail; without it, it works. Is response_format still the name for structured outputs on the completions endpoint?

Here is the reason I’m given for the failure:
{"error": {"message": "Unknown parameter: 'response_format.name'.", "type": "invalid_request_error", "param": "response_format.name", "code": "unknown_parameter"}}
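That error usually means `name` ended up as a direct child of `response_format` instead of nested inside the `json_schema` object. A sketch of the two shapes (schema and name are illustrative):

```python
# A minimal strict schema body, just for illustration
schema = {"type": "object",
          "properties": {"answer": {"type": "string"}},
          "required": ["answer"],
          "additionalProperties": False}

# Rejected with "Unknown parameter: 'response_format.name'":
# name/schema are not parameters of response_format itself
wrong = {"type": "json_schema", "name": "my_schema", "schema": schema}

# Accepted: name, strict and schema all live under "json_schema"
right = {"type": "json_schema",
         "json_schema": {"name": "my_schema", "strict": True, "schema": schema}}
```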

For my reference, and for anyone having issues with structured outputs on the completions endpoint:

response_format is the right argument.

The following response_format finally worked:

response_format = {
    "type": "json_schema",
    "json_schema": {
        "name": "instruction_response_list",
        "strict": True,
        "schema": {  # ← wrap everything from here down
            "type": "object",
            "properties": {
                "items": {
                    "type": "array",
                    "description": "A list containing 15 JSON objects, each with an instruction and a response.",
                    "minItems": 15,
                    "maxItems": 15,
                    "items": {
                        "type": "object",
                        "properties": {
                            "instruction": {
                                "type": "string",
                                "description": "A prompt or instruction."
                            },
                            "response": {
                                "type": "string",
                                "description": "A response to the corresponding instruction."
                            }
                        },
                        "required": ["instruction", "response"],
                        "additionalProperties": False
                    }
                }
            },
            "required": ["items"],
            "additionalProperties": False
        }
    }
}

Hi, I’ve been working on this (using structured model outputs with the Batch API) and I think I’ve succeeded. I would like to share my findings in case they could help someone else. The following code is based on the official example.

from openai import OpenAI
from pydantic import BaseModel

client = OpenAI()

class Step(BaseModel):
    explanation: str
    output: str

class MathReasoning(BaseModel):
    steps: list[Step]
    final_answer: str

So, the trick to replicating the “parse” behavior depends on the API primitive you are using; I leave examples for both.

First, you can use the following snippet to manually imitate a call to the parse method.

# https://api.openai.com/v1/chat/completions

from openai.lib._parsing import type_to_response_format_param, parse_chat_completion

response = client.chat.completions.create(
    model="gpt-5-nano-2025-08-07",
    messages=[
        {"role": "system", "content": "You are a helpful math tutor. Guide the user through the solution step by step."},
        {"role": "user", "content": "how can I solve 8x + 7 = -23"}
    ],
    response_format=type_to_response_format_param(MathReasoning),
)

parsed_response = parse_chat_completion(
    chat_completion=response,
    response_format=MathReasoning,
    input_tools=[]
)

parsed_response.choices[0].message.parsed

# https://api.openai.com/v1/responses

from openai.lib._parsing._responses import type_to_text_format_param, parse_response

response = client.responses.create(
    model="gpt-5-nano-2025-08-07",
    input=[
        {"role": "system", "content": "You are a helpful math tutor. Guide the user through the solution step by step."},
        {"role": "user", "content": "how can I solve 8x + 7 = -23"}
    ],
    text={"format": type_to_text_format_param(MathReasoning)},
)

parsed_response = parse_response(
    response=response,
    text_format=MathReasoning,
    input_tools=[]
)

parsed_response.output_parsed

Now that we know this, making the call to the batch API is easy:

# https://api.openai.com/v1/chat/completions

import json

from openai.lib._parsing import type_to_response_format_param

records = [
    {
        "custom_id": "request-1",
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": "gpt-5-nano-2025-08-07",
            "messages": [
                {"role": "system", "content": "You are a helpful math tutor. Guide the user through the solution step by step."},
                {"role": "user", "content": "how can I solve 8x + 7 = -23"}
            ],
            "response_format": type_to_response_format_param(MathReasoning)
        }
    },
    {
        "custom_id": "request-2",
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": "gpt-5-nano-2025-08-07",
            "messages": [
                {"role": "system", "content": "You are a helpful math tutor. Guide the user through the solution step by step."},
                {"role": "user", "content": "how can I solve 8x = 8"}
            ],
            "response_format": type_to_response_format_param(MathReasoning)
        }
    }
]

with open("/tmp/completions_batch_input.jsonl", "w") as f:
    for record in records:
        f.write(json.dumps(record) + "\n")
....

client.batches.create(
    input_file_id="file-example-input-file-id",
    endpoint="/v1/chat/completions",
    completion_window="24h",
    metadata={
        "description": "example structured over completions"
    }
)

....

from openai.types.chat.chat_completion import ChatCompletion
from openai.lib._parsing import parse_chat_completion
import json

file_response = client.files.content("file-output-example")

completions = [
    parse_chat_completion(
        chat_completion=ChatCompletion\
            .model_validate(json.loads(line)['response']['body']),
        response_format=MathReasoning,
        input_tools=[]
    )\
    .choices[0]\
    .message\
    .parsed
    for line in file_response.read().splitlines() if line.strip()
]

completions

# https://api.openai.com/v1/responses

import json

from openai.lib._parsing._responses import type_to_text_format_param

records = [
    {
        "custom_id": "request-1",
        "method": "POST",
        "url": "/v1/responses",
        "body": {
            "model": "gpt-5-nano-2025-08-07",
            "input": [
                {"role": "system", "content": "You are a helpful math tutor. Guide the user through the solution step by step."},
                {"role": "user", "content": "how can I solve 8x + 7 = -23"}
            ],
            "text": {"format": type_to_text_format_param(MathReasoning)}
        }
    },
    {
        "custom_id": "request-2",
        "method": "POST",
        "url": "/v1/responses",
        "body": {
            "model": "gpt-5-nano-2025-08-07",
            "input": [
                {"role": "system", "content": "You are a helpful math tutor. Guide the user through the solution step by step."},
                {"role": "user", "content": "how can I solve 8x = 8"}
            ],
            "text": {"format": type_to_text_format_param(MathReasoning)}
        }
    }
]

with open("/tmp/responses_batch_input.jsonl", "w") as f:
    for record in records:
        f.write(json.dumps(record) + "\n")

....

client.batches.create(
    input_file_id="file-example-input-file-id",
    endpoint="/v1/responses",
    completion_window="24h",
    metadata={
        "description": "example structured over responses"
    }
)

....

from openai.types.responses import Response
from openai.lib._parsing._responses import parse_response
import json

file_response = client.files.content("file-output-example")

completions = [
    parse_response(
        response=Response\
            .model_validate(json.loads(line)['response']['body']),
        text_format=MathReasoning,
        input_tools=[]
    )\
    .output_parsed\
    for line in file_response.read().splitlines() if line.strip()
]

completions