Question about gpt-5 batch API

Does anyone know how to apply structured output when using the Batch API, posting to /v1/responses with gpt-5?

Welcome to the dev community @Hijkstuv

Here’s an example of a one-line JSONL request for Structured Outputs over the Responses API using the Batch API:

{"custom_id": "request001", "method": "POST", "url": "/v1/responses", "body": {  "model": "gpt-5",  "input": [  { "role": "system", "content": "You are an expert at structured data extraction. You will be given unstructured text from a research paper and should convert it into the given structure."  },  { "role": "user", "content": "..."  }  ],  "text": {  "format": { "type": "json_schema", "name": "research_paper_extraction", "schema": {"type": "object","properties": {"title": { "type": "string" },"authors": {  "type": "array",  "items": { "type": "string" }},"abstract": { "type": "string" },"keywords": {  "type": "array",  "items": { "type": "string" }}},"required": ["title", "authors", "abstract", "keywords"],"additionalProperties": false }, "strict": true  }  }}} 

This was put together simply by placing the body of the cURL request from the Structured Outputs examples in the docs into the body of a Batch API input request for the Responses API.
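If you’d rather not hand-edit nested quotes in a one-liner, the same request can be assembled as a plain Python dict and serialized with json.dumps. A minimal sketch (the schema is the research-paper example from the docs; custom_id is arbitrary):

```python
import json

# JSON Schema for the structured output (from the docs example).
schema = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "authors": {"type": "array", "items": {"type": "string"}},
        "abstract": {"type": "string"},
        "keywords": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["title", "authors", "abstract", "keywords"],
    "additionalProperties": False,
}

# Per-request envelope for the Batch API, targeting the Responses endpoint.
request = {
    "custom_id": "request001",
    "method": "POST",
    "url": "/v1/responses",
    "body": {
        "model": "gpt-5",
        "input": [
            {"role": "system", "content": "You are an expert at structured data extraction. You will be given unstructured text from a research paper and should convert it into the given structure."},
            {"role": "user", "content": "..."},
        ],
        "text": {
            "format": {
                "type": "json_schema",
                "name": "research_paper_extraction",
                "schema": schema,
                "strict": True,
            }
        },
    },
}

# One compact JSONL line per request.
jsonl_line = json.dumps(request, separators=(",", ":"))
```

The compact separators keep the line free of the extra spaces a hand-written one-liner tends to accumulate.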


Here is the above line prepared as data in Python:

import json

# Paste the JSONL line from above in place of the placeholder.
line = r'''that_line_copied'''
request_dict = json.loads(line)

# Pretty form for reading; compact form for writing to a .jsonl file.
pretty_json = json.dumps(request_dict, indent=2)
compact_line = json.dumps(request_dict, separators=(',', ':'))

print(f"{pretty_json}\n\n{compact_line}")

The request_dict["body"] is the same payload you might already pass in a normal request, where the SDK serializes Python objects into a JSON-formatted request body:
openai_client.responses.create(**request_dict["body"])

pretty_json is so we can see what is being communicated on the forum.
compact_line is a string to add to a JSONL, but without excess bytes.
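From there, a full batch is just many such compact lines in a .jsonl file, uploaded via the Files API and referenced by a batch job. A minimal sketch using the official openai Python SDK; the file name and custom_ids are placeholders, the request bodies here are trimmed for brevity, and the actual upload is gated behind a flag so nothing is sent by accident:

```python
import json

# Two placeholder requests; in practice, one line per extraction job.
requests = [
    {
        "custom_id": f"request{i:03d}",
        "method": "POST",
        "url": "/v1/responses",
        "body": {"model": "gpt-5", "input": [{"role": "user", "content": "..."}]},
    }
    for i in range(1, 3)
]

# Write one compact JSONL line per request.
with open("batch_input.jsonl", "w", encoding="utf-8") as f:
    for req in requests:
        f.write(json.dumps(req, separators=(",", ":")) + "\n")

SUBMIT = False  # flip to True (with OPENAI_API_KEY set) to actually upload
if SUBMIT:
    from openai import OpenAI

    client = OpenAI()
    # Upload the JSONL, then create the batch pointing at /v1/responses.
    batch_file = client.files.create(
        file=open("batch_input.jsonl", "rb"), purpose="batch"
    )
    batch = client.batches.create(
        input_file_id=batch_file.id,
        endpoint="/v1/responses",
        completion_window="24h",
    )
    print(batch.id, batch.status)
```

The batch's endpoint must match the `url` in every line of the input file.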

{
  "custom_id": "request001",
  "method": "POST",
  "url": "/v1/responses",
  "body": {
    "model": "gpt-5",
    "input": [
      {
        "role": "system",
        "content": "You are an expert at structured data extraction. You will be given unstructured text from a research paper and should convert it into the given structure."
      },
      {
        "role": "user",
        "content": "..."
      }
    ],
    "text": {
      "format": {
        "type": "json_schema",
        "name": "research_paper_extraction",
        "schema": {
          "type": "object",
          "properties": {
            "title": {
              "type": "string"
            },
            "authors": {
              "type": "array",
              "items": {
                "type": "string"
              }
            },
            "abstract": {
              "type": "string"
            },
            "keywords": {
              "type": "array",
              "items": {
                "type": "string"
              }
            }
          },
          "required": [
            "title",
            "authors",
            "abstract",
            "keywords"
          ],
          "additionalProperties": false
        },
        "strict": true
      }
    }
  }
}
{"custom_id":"request001","method":"POST","url":"/v1/responses","body":{"model":"gpt-5","input":[{"role":"system","content":"You are an expert at structured data extraction. You will be given unstructured text from a research paper and should convert it into the given structure."},{"role":"user","content":"..."}],"text":{"format":{"type":"json_schema","name":"research_paper_extraction","schema":{"type":"object","properties":{"title":{"type":"string"},"authors":{"type":"array","items":{"type":"string"}},"abstract":{"type":"string"},"keywords":{"type":"array","items":{"type":"string"}}},"required":["title","authors","abstract","keywords"],"additionalProperties":false},"strict":true}}}}

To confirm the line works, I’ll consume the compact_line meant for a batch directly, sending its body as a normal (non-batch) request:

import os, httpx, json

# Parse the batch line once; its url and body map directly onto a plain request.
request = json.loads(compact_line)
try:
    with httpx.Client(timeout=1000) as client:
        resp = client.post(
            "https://api.openai.com" + request["url"],  # url already starts with "/"
            headers={
                "Authorization": f"Bearer {os.getenv('OPENAI_API_KEY')}",
                "Content-Type": "application/json",
                "Accept": "application/json",
            },
            content=json.dumps(request["body"]),
        )
        resp.raise_for_status()
except httpx.HTTPStatusError as e:
    # Print any error message body from OpenAI before re-raising.
    print(f"Request failed: {e}")
    print("Error response body:\n", e.response.text)
    raise
except httpx.RequestError as e:
    print(f"Request error: {e}")
    raise

<Response [200 OK]>


Below, the large response object (which comes back pretty-formatted) echoes the text.format with "type": "json_schema" back at you, along with an AI forced to use that schema even though there is nothing to “say” about the user’s ellipsis input.

print(resp.text)

{
  "id": "resp_1234",
  "object": "response",
  "created_at": 1760179057,
  "status": "completed",
  "background": false,
  "billing": {
    "payer": "openai"
  },
  "error": null,
  "incomplete_details": null,
  "instructions": null,
  "max_output_tokens": null,
  "max_tool_calls": null,
  "model": "gpt-5-2025-08-07",
  "output": [
    {
      "id": "rs_1234",
      "type": "reasoning",
      "summary": []
    },
    {
      "id": "msg_1234",
      "type": "message",
      "status": "completed",
      "content": [
        {
          "type": "output_text",
          "annotations": [],
          "logprobs": [],
          "text": "{\"title\":\"\",\"authors\":[],\"abstract\":\"\",\"keywords\":[]}"
        }
      ],
      "role": "assistant"
    }
  ],
  "parallel_tool_calls": true,
  "previous_response_id": null,
  "prompt_cache_key": null,
  "reasoning": {
    "effort": "medium",
    "summary": null
  },
  "safety_identifier": null,
  "service_tier": "default",
  "store": true,
  "temperature": 1.0,
  "text": {
    "format": {
      "type": "json_schema",
      "description": null,
      "name": "research_paper_extraction",
      "schema": {
        "type": "object",
        "properties": {
          "title": {
            "type": "string"
          },
          "authors": {
            "type": "array",
            "items": {
              "type": "string"
            }
          },
          "abstract": {
            "type": "string"
          },
          "keywords": {
            "type": "array",
            "items": {
              "type": "string"
            }
          }
        },
        "required": [
          "title",
          "authors",
          "abstract",
          "keywords"
        ],
        "additionalProperties": false
      },
      "strict": true
    },
    "verbosity": "medium"
  },
  "tool_choice": "auto",
  "tools": [],
  "top_logprobs": 0,
  "top_p": 1.0,
  "truncation": "disabled",
  "usage": {
    "input_tokens": 98,
    "input_tokens_details": {
      "cached_tokens": 0
    },
    "output_tokens": 536,
    "output_tokens_details": {
      "reasoning_tokens": 512
    },
    "total_tokens": 634
  },
  "user": null,
  "metadata": {}
}
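To pull the schema-conforming JSON back out of a response object like the one above, you can walk the output list for the message item and parse its output_text. A sketch against the shape shown here, using a trimmed stand-in dict (batch output lines wrap this same object inside their own envelope):

```python
import json

# Trimmed stand-in for the response object printed above.
response = {
    "output": [
        {"id": "rs_1234", "type": "reasoning", "summary": []},
        {
            "id": "msg_1234",
            "type": "message",
            "status": "completed",
            "content": [
                {
                    "type": "output_text",
                    "annotations": [],
                    "text": "{\"title\":\"\",\"authors\":[],\"abstract\":\"\",\"keywords\":[]}",
                }
            ],
            "role": "assistant",
        },
    ]
}

def extract_structured_output(resp: dict) -> dict:
    """Find the assistant message item and parse its output_text as JSON."""
    for item in resp["output"]:
        if item.get("type") == "message":
            for part in item["content"]:
                if part.get("type") == "output_text":
                    return json.loads(part["text"])
    raise ValueError("no output_text found in response")

data = extract_structured_output(response)
```

Because "strict": true was set, the parsed dict is guaranteed to have exactly the keys the schema requires.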

This verifies that @sps gave us a working line (just with a few extra spaces). So the answer to the question is: “Yes, user sps knows”.
