Structured Outputs with Batch Processing

Hi,

Hopefully this is me doing something wrong which can be easily fixed and not a bug…

I’ve successfully run structured outputs using the client.beta.chat.completions.parse() method, but when I try to do the same with batch processing I get errors or missing keys.

{
  "id": "batch_req_xxx",
  "custom_id": "request-0",
  "response": {
    "status_code": 400,
    "request_id": "xxx",
    "body": {
      "error": {
        "message": "Invalid value: 'object'. Supported values are: 'json_object', 'json_schema', and 'text'.",
        "type": "invalid_request_error",
        "param": "response_format.type",
        "code": "invalid_value"
      }
    }
  },
  "error": null
}

This is the response_format I’m using in both the API call and the batch file:

{
  "properties": {
    "optimised_title": {
      "title": "Optimised Title",
      "type": "string"
    },
    "meta_description": {
      "title": "Meta Description",
      "type": "string"
    }
  },
  "required": [
    "optimised_title",
    "meta_description"
  ],
  "title": "DynamicSchema",
  "type": "json_schema"
}

The other errors I’m getting are “we expected an object but got a string”, followed by “we expected a string and got an object” after I change it.

I’ve been wrestling with this for the past 48 hours. Can anyone help me?

@slippy Hi! I think you have incorrect syntax in your JSON schema, i.e. you are using “title” as opposed to “description”. So it should actually look like this:

{
  "properties": {
    "optimised_title": {
      "description": "Optimised Title",
      "type": "string"
    },
    "meta_description": {
      "description": "Meta Description",
      "type": "string"
    }
  },
  "required": [
    "optimised_title",
    "meta_description"
  ],
  "description": "DynamicSchema",
  "type": "json_schema"
}
2 Likes

Hey @platypus - thanks for the reply.

I’ve tried that, but it’s still not working. I’ve actually copied the example given here and it still fails:

{
  "custom_id": "request-1",
  "method": "POST",
  "url": "/v1/chat/completions",
  "body": {
    "model": "gpt-4o-mini",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful math tutor. Guide the user through the solution step by step."
      },
      {
        "role": "user",
        "content": "how can I solve 8x + 7 = -23"
      }
    ],
    "max_tokens": 4096,
    "response_format": {
      "type": "json_schema",
      "json_schema": {
        "name": "math_response",
        "schema": {
          "type": "object",
          "properties": {
            "steps": {
              "type": "array",
              "items": {
                "type": "object",
                "properties": {
                  "explanation": {
                    "type": "string"
                  },
                  "output": {
                    "type": "string"
                  }
                },
                "required": [
                  "explanation",
                  "output"
                ],
                "additionalProperties": false
              }
            },
            "final_answer": {
              "type": "string"
            }
          },
          "required": [
            "steps",
            "final_answer"
          ],
          "additionalProperties": false
        },
        "strict": true
      }
    }
  }
}

With this particular attempt, I don’t get an error file, just a ‘failed’ status, and it reads the file as empty. Going into file storage in the OpenAI dashboard and downloading the uploaded file, I see the above, so the file is confirmed to be uploading.

Appreciate any further guidance

1 Like

@slippy I tried your example above using the Batch API and it worked fine. It might be a super silly question, but your .jsonl doesn’t have any indentation, right?

Anyway, here are the exact steps I took :blush:

Step 1: I created a batchinput.jsonl with the following contents (NOTE: single line per request, no indentation; whitespace is OK by the JSON standard)

{"custom_id": "request-1", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gpt-4o-mini", "messages": [{"role": "system", "content": "You are a helpful math tutor. Guide the user through the solution step by step."}, {"role": "user", "content": "how can I solve 8x + 7 = -23"}], "max_tokens": 4096, "response_format": {"type": "json_schema", "json_schema": {"name": "math_response", "schema": {"type": "object", "properties": {"steps": {"type": "array", "items": {"type": "object", "properties": {"explanation": {"type": "string"}, "output": {"type": "string"}}, "required": ["explanation", "output"], "additionalProperties": false}}, "final_answer": {"type": "string"}}, "required": ["steps", "final_answer"], "additionalProperties": false}, "strict": true}}}}
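By the way, if you’re generating the file from Python, json.dumps with its default settings always serializes to a single line, so you can’t accidentally introduce indentation. A rough sketch of building that line programmatically (stdlib only; the schema here is a minimal stand-in, not the full math_response one from above):

```python
import json

# Minimal strict schema as a stand-in; swap in your real one.
response_format = {
    "type": "json_schema",
    "json_schema": {
        "name": "math_response",
        "strict": True,
        "schema": {
            "type": "object",
            "properties": {"final_answer": {"type": "string"}},
            "required": ["final_answer"],
            "additionalProperties": False,
        },
    },
}

request = {
    "custom_id": "request-1",
    "method": "POST",
    "url": "/v1/chat/completions",
    "body": {
        "model": "gpt-4o-mini",
        "messages": [
            {"role": "system", "content": "You are a helpful math tutor."},
            {"role": "user", "content": "how can I solve 8x + 7 = -23"},
        ],
        "max_tokens": 4096,
        "response_format": response_format,
    },
}

# json.dumps with no indent argument emits exactly one line per request.
with open("batchinput.jsonl", "w") as f:
    f.write(json.dumps(request) + "\n")
```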

Step 2: I uploaded batchinput.jsonl via Files API. I received a file ID in the response; in this case it is file-F2ieFWin68wvubNmBPOUvsDW.

curl https://api.openai.com/v1/files \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -F purpose="batch" \
  -F file="@batchinput.jsonl"

Step 3: I created a batch using the above file ID. I received a batch ID in the response; in this case it was batch_asVAZzeehZ4mf2QE9zi1krvE

curl https://api.openai.com/v1/batches \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "input_file_id": "file-F2ieFWin68wvubNmBPOUvsDW",
    "endpoint": "/v1/chat/completions",
    "completion_window": "24h"
  }'

Step 4: I queried the batch status using the above batch ID. When I saw that the status was set to completed, I noted the output_file_id, in this case it is file-8E8AlSiy5puk3vBkthQ8UNlw.

curl https://api.openai.com/v1/batches/batch_asVAZzeehZ4mf2QE9zi1krvE \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json"

Step 5: Retrieve the final output using the above output file ID.

curl https://api.openai.com/v1/files/file-8E8AlSiy5puk3vBkthQ8UNlw/content \
  -H "Authorization: Bearer $OPENAI_API_KEY" > batch_output.jsonl

The final output (as per batch_output.jsonl) is as follows:

{"id": "batch_req_3aNHqQt8UB1idP6RkXetGJgM", "custom_id": "request-1", "response": {"status_code": 200, "request_id": "a7ba2b98e25ff47189d7550eee2d8072", "body": {"id": "chatcmpl-9xc57s5a7ngga92riZU0wpcIdhEG3", "object": "chat.completion", "created": 1723994765, "model": "gpt-4o-mini-2024-07-18", "choices": [{"index": 0, "message": {"role": "assistant", "content": "{\"steps\":[{\"explanation\":\"First, we will isolate the term with 'x' by moving the constant term to the other side of the equation. We can do this by subtracting 7 from both sides.\",\"output\":\"8x + 7 - 7 = -23 - 7\"},{\"explanation\":\"This simplifies to 8x = -30.\",\"output\":\"8x = -30\"},{\"explanation\":\"Next, we will isolate 'x' by dividing both sides of the equation by 8.\",\"output\":\"x = -30 / 8\"},{\"explanation\":\"Now we can simplify -30 / 8. We can divide both the numerator and the denominator by 2.\",\"output\":\"x = -15 / 4\"},{\"explanation\":\"Finally, we can write -15 / 4 in decimal form if necessary. -15 / 4 = -3.75.\",\"output\":\"x = -3.75\"}],\"final_answer\":\"x = -15/4 or x = -3.75\"}", "refusal": null}, "logprobs": null, "finish_reason": "stop"}], "usage": {"prompt_tokens": 45, "completion_tokens": 207, "total_tokens": 252}, "system_fingerprint": "fp_507c9469a1"}}, "error": null}
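One thing worth noting about that output: even with json_schema, the structured result arrives as a JSON string inside message.content, so it needs a second json.loads to get the actual object. A minimal sketch of extracting it from each output line (stdlib only; the sample below is an abbreviated stand-in with the same shape as a real batch_output.jsonl line):

```python
import json

def parse_batch_line(line: str) -> tuple[str, dict]:
    """Return (custom_id, parsed structured output) for one output line."""
    record = json.loads(line)
    content = record["response"]["body"]["choices"][0]["message"]["content"]
    # The structured output is itself a JSON string, so decode it again.
    return record["custom_id"], json.loads(content)

# Abbreviated sample line (same shape as the real batch_output.jsonl):
sample = json.dumps({
    "custom_id": "request-1",
    "response": {"status_code": 200, "body": {"choices": [{"message": {
        "content": "{\"final_answer\": \"x = -3.75\"}"}}]}},
})

custom_id, parsed = parse_batch_line(sample)
print(custom_id, parsed["final_answer"])  # request-1 x = -3.75
```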
6 Likes

Hmm. The only thing I’m doing is adding a \n character to put each request on a new line, and when I don’t include the response_format section it works fine. Unless you mean something else?

1 Like

@slippy if you follow my steps above exactly, and copy-paste exactly what I printed (including the jsonl file) without any additional formatting, does it work? (Be careful to replace the file ID, batch ID and output file ID with your own.)

Because I didn’t do anything special - I just copy-pasted your schema, and followed the steps as per Batch API documentation, and it worked.

I tried yours and it worked. I then ran mine again for comparison and it also worked. So I’m not sure what happened, to be honest.

Regardless, I’m crediting you with the fix, so I just wanted to say I really appreciate you taking a look and taking the time.

2 Likes

Glad it works, and always happy to help @slippy !

Solved my issue as well! Thank you Platypus

1 Like

If anyone else is facing issues using structured outputs with the Batch API, I have found this works the easiest.

Define your schema using Pydantic, then convert it to an OpenAI-compliant strict JSON schema using the to_strict_json_schema function available in the Python client. (Though it’s a private method and a bit ugly, this was much easier and more reliable than third-party libraries that haven’t been tested as extensively.)

from openai.lib._pydantic import to_strict_json_schema
2 Likes

Nice, thanks for the tip!

1 Like