Structured Outputs with Batch Processing

Hi,

Hopefully this is me doing something wrong which can be easily fixed and not a bug…

I’ve successfully run structured outputs using the client.beta.chat.completions.parse() method, but when I try to do the same with batch processing I get errors or missing keys.

{
  "id": "batch_req_xxx",
  "custom_id": "request-0",
  "response": {
    "status_code": 400,
    "request_id": "xxx",
    "body": {
      "error": {
        "message": "Invalid value: 'object'. Supported values are: 'json_object', 'json_schema', and 'text'.",
        "type": "invalid_request_error",
        "param": "response_format.type",
        "code": "invalid_value"
      }
    }
  },
  "error": null
}

This is the response_format I’m using in both the API call and the batch file:

{
  "properties": {
    "optimised_title": {
      "title": "Optimised Title",
      "type": "string"
    },
    "meta_description": {
      "title": "Meta Description",
      "type": "string"
    }
  },
  "required": [
    "optimised_title",
    "meta_description"
  ],
  "title": "DynamicSchema",
  "type": "json_schema"
}

The other errors I’m getting are “we expected an object but got a string”, followed by “we expected a string and got an object” after I change it.

I’ve been wrestling with this for the past 48 hours. Can anyone help me?

@slippy Hi! I think you have incorrect syntax in your JSON schema, i.e. you are using “title” as opposed to “description”. So it should actually look like this:

{
  "properties": {
    "optimised_title": {
      "description": "Optimised Title",
      "type": "string"
    },
    "meta_description": {
      "description": "Meta Description",
      "type": "string"
    }
  },
  "required": [
    "optimised_title",
    "meta_description"
  ],
  "description": "DynamicSchema",
  "type": "json_schema"
}
2 Likes

Hey @platypus - thanks for the reply.

I’ve tried that, but it’s still not working. I’ve actually copied the example given here and it still fails:

{
  "custom_id": "request-1",
  "method": "POST",
  "url": "/v1/chat/completions",
  "body": {
    "model": "gpt-4o-mini",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful math tutor. Guide the user through the solution step by step."
      },
      {
        "role": "user",
        "content": "how can I solve 8x + 7 = -23"
      }
    ],
    "max_tokens": 4096,
    "response_format": {
      "type": "json_schema",
      "json_schema": {
        "name": "math_response",
        "schema": {
          "type": "object",
          "properties": {
            "steps": {
              "type": "array",
              "items": {
                "type": "object",
                "properties": {
                  "explanation": {
                    "type": "string"
                  },
                  "output": {
                    "type": "string"
                  }
                },
                "required": [
                  "explanation",
                  "output"
                ],
                "additionalProperties": false
              }
            },
            "final_answer": {
              "type": "string"
            }
          },
          "required": [
            "steps",
            "final_answer"
          ],
          "additionalProperties": false
        },
        "strict": true
      }
    }
  }
}

With this particular attempt, I don’t get an error file, just a ‘failed’ status, and it reads the file as empty. Going into file storage in the OpenAI dashboard and downloading the uploaded file, I see the above, so the file is confirmed to be uploading.

Appreciate any further guidance

1 Like

@slippy I tried your example above using the Batch API and it worked fine. It might be a super silly question, but your .jsonl doesn’t have any indentation, right?

Anyway, here are the exact steps I took :blush:

Step 1: I created a batchinput.jsonl with the following contents (NOTE: single line per request, no indentation; whitespace is OK by the JSON standard)

{"custom_id": "request-1", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gpt-4o-mini", "messages": [{"role": "system", "content": "You are a helpful math tutor. Guide the user through the solution step by step."}, {"role": "user", "content": "how can I solve 8x + 7 = -23"}], "max_tokens": 4096, "response_format": {"type": "json_schema", "json_schema": {"name": "math_response", "schema": {"type": "object", "properties": {"steps": {"type": "array", "items": {"type": "object", "properties": {"explanation": {"type": "string"}, "output": {"type": "string"}}, "required": ["explanation", "output"], "additionalProperties": false}}, "final_answer": {"type": "string"}}, "required": ["steps", "final_answer"], "additionalProperties": false}, "strict": true}}}}
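By the way, if you’re generating the file from Python, json.dumps with its default settings always serializes to a single line, so you can’t accidentally introduce indentation. A rough sketch of building that line programmatically (stdlib only; the schema here is a minimal stand-in, not the full math_response one from above):

```python
import json

# Minimal strict schema as a stand-in; swap in your real one.
response_format = {
    "type": "json_schema",
    "json_schema": {
        "name": "math_response",
        "strict": True,
        "schema": {
            "type": "object",
            "properties": {"final_answer": {"type": "string"}},
            "required": ["final_answer"],
            "additionalProperties": False,
        },
    },
}

request = {
    "custom_id": "request-1",
    "method": "POST",
    "url": "/v1/chat/completions",
    "body": {
        "model": "gpt-4o-mini",
        "messages": [
            {"role": "system", "content": "You are a helpful math tutor."},
            {"role": "user", "content": "how can I solve 8x + 7 = -23"},
        ],
        "max_tokens": 4096,
        "response_format": response_format,
    },
}

# json.dumps with no indent argument emits exactly one line per request.
with open("batchinput.jsonl", "w") as f:
    f.write(json.dumps(request) + "\n")
```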

Step 2: I uploaded batchinput.jsonl via Files API. I received a file ID in the response; in this case it is file-F2ieFWin68wvubNmBPOUvsDW.

curl https://api.openai.com/v1/files \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -F purpose="batch" \
  -F file="@batchinput.jsonl"

Step 3: I created a batch using the above file ID. I received a batch ID in the response; in this case it was batch_asVAZzeehZ4mf2QE9zi1krvE

curl https://api.openai.com/v1/batches \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "input_file_id": "file-F2ieFWin68wvubNmBPOUvsDW",
    "endpoint": "/v1/chat/completions",
    "completion_window": "24h"
  }'

Step 4: I queried the batch status using the above batch ID. When I saw that the status was set to completed, I noted the output_file_id, in this case it is file-8E8AlSiy5puk3vBkthQ8UNlw.

curl https://api.openai.com/v1/batches/batch_asVAZzeehZ4mf2QE9zi1krvE \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json"

Step 5: Retrieve the final output using the above output file ID.

curl https://api.openai.com/v1/files/file-8E8AlSiy5puk3vBkthQ8UNlw/content \
  -H "Authorization: Bearer $OPENAI_API_KEY" > batch_output.jsonl

The final output (as per batch_output.jsonl) is as follows:

{"id": "batch_req_3aNHqQt8UB1idP6RkXetGJgM", "custom_id": "request-1", "response": {"status_code": 200, "request_id": "a7ba2b98e25ff47189d7550eee2d8072", "body": {"id": "chatcmpl-9xc57s5a7ngga92riZU0wpcIdhEG3", "object": "chat.completion", "created": 1723994765, "model": "gpt-4o-mini-2024-07-18", "choices": [{"index": 0, "message": {"role": "assistant", "content": "{\"steps\":[{\"explanation\":\"First, we will isolate the term with 'x' by moving the constant term to the other side of the equation. We can do this by subtracting 7 from both sides.\",\"output\":\"8x + 7 - 7 = -23 - 7\"},{\"explanation\":\"This simplifies to 8x = -30.\",\"output\":\"8x = -30\"},{\"explanation\":\"Next, we will isolate 'x' by dividing both sides of the equation by 8.\",\"output\":\"x = -30 / 8\"},{\"explanation\":\"Now we can simplify -30 / 8. We can divide both the numerator and the denominator by 2.\",\"output\":\"x = -15 / 4\"},{\"explanation\":\"Finally, we can write -15 / 4 in decimal form if necessary. -15 / 4 = -3.75.\",\"output\":\"x = -3.75\"}],\"final_answer\":\"x = -15/4 or x = -3.75\"}", "refusal": null}, "logprobs": null, "finish_reason": "stop"}], "usage": {"prompt_tokens": 45, "completion_tokens": 207, "total_tokens": 252}, "system_fingerprint": "fp_507c9469a1"}}, "error": null}
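One thing worth noting about that output: even with json_schema, the structured result arrives as a JSON string inside message.content, so it needs a second json.loads to get the actual object. A minimal sketch of extracting it from each output line (stdlib only; the sample below is an abbreviated stand-in with the same shape as a real batch_output.jsonl line):

```python
import json

def parse_batch_line(line: str) -> tuple[str, dict]:
    """Return (custom_id, parsed structured output) for one output line."""
    record = json.loads(line)
    content = record["response"]["body"]["choices"][0]["message"]["content"]
    # The structured output is itself a JSON string, so decode it again.
    return record["custom_id"], json.loads(content)

# Abbreviated sample line (same shape as the real batch_output.jsonl):
sample = json.dumps({
    "custom_id": "request-1",
    "response": {"status_code": 200, "body": {"choices": [{"message": {
        "content": "{\"final_answer\": \"x = -3.75\"}"}}]}},
})

custom_id, parsed = parse_batch_line(sample)
print(custom_id, parsed["final_answer"])  # request-1 x = -3.75
```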
6 Likes

Hmm. The only thing I’m doing is adding a \n character to put each request on a new line, and when I don’t include the response_format section it works fine. Unless you mean something else?

1 Like

@slippy if you follow my steps above exactly, and copy-paste exactly what I printed (including the jsonl file) without any additional formatting, does it work? (Be careful to replace the file ID, batch ID and output file ID with your own.)

Because I didn’t do anything special - I just copy-pasted your schema, and followed the steps as per Batch API documentation, and it worked.

I tried yours and it worked. I then ran mine again for comparison and it also worked. So I’m not sure what happened, to be honest.

Regardless, I’m crediting you with the fix, so I just wanted to say I really appreciate you taking a look and taking the time.

2 Likes

Glad it works, and always happy to help @slippy !

Solved my issue as well! Thank you Platypus

1 Like

If anyone else is facing issues using structured outputs with the Batch API, I have found this works the easiest.

Define your schema using Pydantic, then convert it to an OpenAI-compliant strict JSON schema using the to_strict_json_schema function available in the Python client. (Though it’s a private method and a bit ugly, this was much easier and more reliable than third-party libraries that haven’t been tested as extensively.)

from openai.lib._pydantic import to_strict_json_schema
2 Likes

Nice, thanks for the tip!

1 Like