Partially structured output? Free text output, but force correct tool call JSON

I was reading about the new structured output and how it can be used to force the model to output only correct JSON data. From what I have seen, this is only possible for the entire message and I have found no counterexamples…

I have a chatbot that sometimes generates non-valid JSON when calling a tool and wanted to use structured output to correct this, as prompting didn’t get me there 100%.
Is it possible to directly let the chatbot run as usual, generating normal text, but enable structured output for the tool call?

If push comes to shove, I could try to split the calls and ask another model to generate the JSON for the tool call strictly, but that would be quite a hassle.

Edit: I am currently letting the LLM generate JSON without restrictions, but when it generates invalid or incorrect JSON, I give it an error message and a reminder on how to call the tool. This seems to work alright. Prompting with examples also helps quite a bit.
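
The reject-and-remind loop described in the edit above could look roughly like the following. This is only a sketch: the names `check_tool_arguments`, `retry_message` and the `location`/`unit` keys are hypothetical and stand in for whatever tool the chatbot actually exposes.

```python
import json

# Hypothetical tool signature, used only for illustration.
REQUIRED_KEYS = {"location": str, "unit": str}

def check_tool_arguments(raw_arguments, required=REQUIRED_KEYS):
    """Return (parsed_args, None) on success or (None, error_text) on failure."""
    try:
        args = json.loads(raw_arguments)
    except json.JSONDecodeError as exc:
        return None, f"Invalid JSON in tool call: {exc.msg} at position {exc.pos}."
    if not isinstance(args, dict):
        return None, "Tool arguments must be a JSON object."
    for key, expected_type in required.items():
        if key not in args:
            return None, f"Missing required key '{key}'."
        if not isinstance(args[key], expected_type):
            return None, f"Key '{key}' must be of type {expected_type.__name__}."
    return args, None

def retry_message(error_text):
    """Build the corrective message that gets appended to the chat history."""
    reminder = 'Call the tool with JSON like {"location": "London", "unit": "celsius"}.'
    return {"role": "user", "content": f"{error_text} {reminder}"}
```

When `check_tool_arguments` returns an error, the result of `retry_message` is appended to the history and the model is asked again; a retry limit is advisable so a stubborn failure does not loop forever.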


Yes, the new Structured Outputs feature can help improve the reliability of the AI’s function calls. When you enable Structured Outputs for a function, the AI model will generate arguments that exactly match the JSON Schema provided in the function definition. This reduces the chance of errors and ensures that the AI doesn’t omit required keys or hallucinate invalid enum values.

To enable Structured Outputs for a function, you need to set "strict": true in your function definition. Here’s an example of how to define a function with Structured Outputs:

toolspec = []  # list of tool definitions, passed as the "tools" request parameter
toolspec.extend([{
    "type": "function",
    "function": {
        "name": "get_random_float",
        "description": "True random number floating point generator. Returns a float within the specified range.",
        "parameters": {
            "type": "object",
            "properties": {
                "range_start": {
                    "type": "number",
                    "description": "Minimum float value",
                },
                "range_end": {
                    "type": "number",
                    "description": "Maximum float value",
                },
            },
            "required": ["range_start", "range_end"],
            "additionalProperties": False  # Python literal; serialized as JSON false
        },
        "strict": True  # enables Structured Outputs for this function's arguments
    }
}])

In this example, the "strict": true setting ensures that the AI will always generate arguments that match the provided JSON Schema. This means that the AI will always include the range_start and range_end keys in the function call, and it won’t include any additional properties.

Please note that when "strict": true is set, every property in the schema must be listed in required, and additionalProperties must be false. If you need truly optional properties, set "strict": false and omit those properties from the required list.
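
If turning strict off is not acceptable, a commonly documented alternative is to keep every property in required and let the logically optional ones accept null. A minimal sketch, assuming a hypothetical `precision` property that is not part of the example above, and a hypothetical helper `apply_defaults` on the receiving side:

```python
import json

# Hypothetical variant of the parameters above: "precision" is logically
# optional, so it stays in "required" but is also allowed to be null.
parameters = {
    "type": "object",
    "properties": {
        "range_start": {"type": "number", "description": "Minimum float value"},
        "range_end": {"type": "number", "description": "Maximum float value"},
        "precision": {
            "type": ["integer", "null"],
            "description": "Decimal places to round to, or null for no rounding",
        },
    },
    "required": ["range_start", "range_end", "precision"],
    "additionalProperties": False,
}

def apply_defaults(arguments_json):
    """Parse tool arguments and treat null values as 'not provided'."""
    args = json.loads(arguments_json)
    return {key: value for key, value in args.items() if value is not None}
```

The model still has to emit the key every time, but it can emit null, which the calling code then treats as an omitted argument.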

Also, remember to use the latest OpenAI Python SDK and the latest AI models that support Structured Outputs, such as gpt-4o-2024-08-06.

(the above entirely produced by an AI that knows)


What I don’t understand is why everyone keeps mentioning functions once structured responses are in question. I realize that they are somewhat the same under the hood.

Function definition → Tool called → Tool responded → AI processed results → Returned Structured Response

Why is everyone conflating the two, am I missing something?

Because in this case that’s what the topic is about.

Fair enough – in this case.

So the plan for him is to send the schema as response_format.json_schema and call the tool manually like that?

That seems like an awful hotfix, since he has to explicitly anticipate the tool call instead of letting the LLM make that decision.

I don’t see how he would take on the issue of pairing a tool call with tool results. Usually, those messages are distinct in their properties.

No.

You just set strict to true in the function definition, as per @_j.

Note this is a different problem to:

While that does make sure that the function call is correct, it doesn’t solve my problem, as it forces me to use fully structured output, which I do not want.

Setting strict to True in the function definition and setting the Response Format to text gives me a 400 Bad Request Error from OpenAI (The library I use uses the http endpoint of OpenAI).

For now, I’m rejecting the invalid JSON and instructing the LLM on how to use it if the format is incorrect.

Why do you need to set response format?

Setting strict to true on a specific function should not affect the general output from the LLM?

Isn’t it one or the other as per the docs?

Of course setting strict to true on a specific function does not influence the general output, but if the request is not valid, OpenAI will not answer at all. I didn’t set the response format in my code; it defaults to text, which is what I want.

What I meant is:
OpenAI’s HTTP endpoint (I don’t know about the Python SDK or others) does not allow me to have text output from the LLM together with structured output for a tool call. If I try, I get 400 Bad Request.


A strict schema can be applied to the arguments the AI writes for a function, to the final response to the user (response JSON is probably consumed by code rather than shown directly), or to both.

Therefore, it is fully documented and permissible to specify strict:true on a function specification alone, provided the other requirements are met: no property is optional, every property is listed in “required” at every object nesting level, and additionalProperties: false is set at every object level.
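
Those requirements are easy to miss in nested objects, so a local pre-flight check can help before the request is sent. The following is only a sketch with a hypothetical helper name, `strict_schema_problems`; it covers plain objects and array items, not `anyOf` or `$defs`, and it does not replace the API's own validation.

```python
def strict_schema_problems(schema, path="$"):
    """Collect violations of the strict-mode rules listed above."""
    problems = []
    if not isinstance(schema, dict):
        return problems
    schema_type = schema.get("type")
    is_object = schema_type == "object" or (
        isinstance(schema_type, list) and "object" in schema_type
    )
    if is_object:
        props = schema.get("properties", {})
        # Every declared property must also appear in "required".
        missing = sorted(set(props) - set(schema.get("required", [])))
        if missing:
            problems.append(f"{path}: not in required: {missing}")
        # additionalProperties must be explicitly false at every object level.
        if schema.get("additionalProperties") is not False:
            problems.append(f"{path}: additionalProperties must be false")
        for name, sub_schema in props.items():
            problems.extend(strict_schema_problems(sub_schema, f"{path}.{name}"))
    if "items" in schema:  # descend into array element schemas as well
        problems.extend(strict_schema_problems(schema["items"], f"{path}[]"))
    return problems
```

An empty list means the schema passes these particular checks; each entry otherwise names the path of the offending object.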

A function is not for getting an output from the AI - it is for using your code’s tool utilities when needed.
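
To illustrate that point, here is a rough sketch of the code side: the tool call emitted by the AI is routed to a local Python function, and the result is wrapped in a tool message for the next request. `TOOL_REGISTRY`, `run_tool_call` and the stubbed `get_current_weather` are hypothetical names for this sketch, not part of any SDK.

```python
import json

def get_current_weather(location, unit):
    # Stub standing in for a real weather lookup.
    return f"{location}: weather unavailable in this sketch ({unit})"

# Maps tool names from the tool specification to local callables.
TOOL_REGISTRY = {"get_current_weather": get_current_weather}

def run_tool_call(tool_call):
    """Execute one emitted tool call and build the matching tool message."""
    name = tool_call["function"]["name"]
    # With strict mode the arguments string should always parse and match the schema.
    args = json.loads(tool_call["function"]["arguments"])
    result = TOOL_REGISTRY[name](**args)
    return {
        "role": "tool",
        "tool_call_id": tool_call["id"],  # pairs the result with the call
        "content": [{"type": "text", "text": str(result)}],
    }
```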



I crafted a little (big, actually) chatbot-as-utility for sending and parsing, using streaming, using whatever features and formats I might “turn on”, and spent a good part of the day fuzzing the API. Here’s what I find:

Structured Output

Completely working with strict function AND strict response_format

This is the combination that actually causes issues, and that others have not overcome.

This requires a valid schema and a valid tool specification, of course. It is also a requirement to use the message content object format with blocks that have a “type” (as you would when also sending images), on every message sent, rather than untyped strings.
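
A chat history that was built with plain string content can be converted to that block format before sending. A minimal sketch, with hypothetical helper names `to_text_blocks` and `normalize_messages`:

```python
def to_text_blocks(content):
    """Wrap a plain string in the typed content-block format."""
    if isinstance(content, str):
        return [{"type": "text", "text": content}]
    return content  # already a list of typed blocks

def normalize_messages(messages):
    """Return a copy of the history where every content field uses typed blocks."""
    normalized = []
    for message in messages:
        message = dict(message)  # shallow copy so the caller's history is untouched
        if message.get("content") is not None:
            message["content"] = to_text_blocks(message["content"])
        normalized.append(message)
    return normalized
```

Assistant messages that carry only tool_calls have no content key and pass through unchanged.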

A complete “chat session” is shown here (read along). It sends a tool call and its tool return back to the API: first a single tool invocation in the history, then a parallel tool call.

{
   "model": "gpt-4o-2024-08-06",
   "messages": [
      {
         "role": "system",
         "content": [
            {
               "type": "text",
               "text": "You are a helpful assistant."
            }
         ]
      },
      {
         "role": "user",
         "content": [
            {
               "type": "text",
               "text": "Check the weather in london"
            }
         ]
      },
      {
         "role": "assistant",
         "tool_calls": [
            {
               "index": 0,
               "id": "call_bJbU2WrFTr8r2dEXROGsw5Pi",
               "type": "function",
               "function": {
                  "name": "get_current_weather",
                  "arguments": "{\"location\":\"London\",\"unit\":\"celsius\"}"
               }
            }
         ]
      },
      {
         "role": "tool",
         "name": "get_current_weather",
         "content": [
            {
               "type": "text",
               "text": "London, UK: 19C, sunny"
            }
         ],
         "tool_call_id": "call_bJbU2WrFTr8r2dEXROGsw5Pi"
      },
      {
         "role": "assistant",
         "content": [
            {
               "type": "text",
               "text": "{\"response_to_user\":\"The current weather in London is 19\u00b0C and sunny.\"}"
            }
         ]
      },
      {
         "role": "user",
         "content": [
            {
               "type": "text",
               "text": "How about Portland and Seattle?"
            }
         ]
      },
      {
         "role": "assistant",
         "tool_calls": [
            {
               "index": 0,
               "id": "call_bJbU2WrFTr8r2dEXROGsw5Pi",
               "type": "function",
               "function": {
                  "name": "get_current_weather",
                  "arguments": "{\"location\": \"Portland\", \"unit\": \"fahrenheit\"}"
               }
            },
            {
               "index": 1,
               "id": "call_Hhhlczyjfqlb4Pc3o5Qyow2N",
               "type": "function",
               "function": {
                  "name": "get_current_weather",
                  "arguments": "{\"location\": \"Seattle\", \"unit\": \"fahrenheit\"}"
               }
            }
         ]
      },
      {
         "role": "tool",
         "name": "get_current_weather",
         "content": [
            {
               "type": "text",
               "text": "Portland, OR: 72F, Sunny"
            }
         ],
         "tool_call_id": "call_bJbU2WrFTr8r2dEXROGsw5Pi"
      },
      {
         "role": "tool",
         "name": "get_current_weather",
         "content": [
            {
               "type": "text",
               "text": "Seattle, WA: 66F, Overcast"
            }
         ],
         "tool_call_id": "call_Hhhlczyjfqlb4Pc3o5Qyow2N"
      }
   ],
   "stream": true,
   "stream_options": {
      "include_usage": true
   },
   "tools": [
      {
         "type": "function",
         "function": {
            "name": "get_current_weather",
            "description": "Get the current weather in a given location.",
            "parameters": {
               "type": "object",
               "properties": {
                  "location": {
                     "type": "string",
                     "description": "The city to get the weather of"
                  },
                  "unit": {
                     "type": "string",
                     "enum": [
                        "celsius",
                        "fahrenheit"
                     ],
                     "description": "The unit of temperature, F for USA"
                  }
               },
               "required": [
                  "location",
                  "unit"
               ],
               "additionalProperties": false
            },
            "strict": true
         }
      }
   ],
   "tool_choice": "auto",
   "response_format": {
      "type": "json_schema",
      "json_schema": {
         "name": "structured_response",
         "schema": {
            "type": "object",
            "properties": {
               "response_to_user": {
                  "type": "string",
                  "description": "The assistant's response to the user"
               }
            },
            "required": [
               "response_to_user"
            ],
            "additionalProperties": false
         },
         "strict": true
      }
   }
}

(it was sent without the white space)

That gives our response: the AI reports the weather for two cities from the two parallel tool calls, and the output is again placed in my basic response schema, as it was earlier in the chat history:

{
   "response_to_user": "In Portland, OR, the current weather is 72\u00b0F and sunny.\nIn Seattle, WA, the current weather is 66\u00b0F and overcast."
}
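
On the receiving side, the streamed content fragments have to be joined before the JSON is parsed. A small sketch, assuming the fragments have already been collected from the stream as strings and that the schema is the `response_to_user` one shown above; `extract_user_text` is a hypothetical helper name:

```python
import json

def extract_user_text(content_fragments):
    """Join streamed content fragments and return the text meant for the user."""
    content = "".join(content_fragments)
    data = json.loads(content)  # strict response_format should make this valid JSON
    return data["response_to_user"]
```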

Unpredictable performance

The AI can write to the user before calling a function, in the same output. If a response schema is used, the AI can write this response to the user as JSON and also call the tool, with the tool arguments being strict as well.

What fails is sending the assistant message back with both “content” and “tool_calls”, even though that’s what the AI emitted. OpenAI failed to account for this and does not return a useful error message explaining why (such as a message rejection from a validation error).

   "messages": [
      {
         "role": "system",
         "content": [
            {
               "type": "text",
               "text": "You are a helpful assistant."
            }
         ]
      },
      {
         "role": "user",
         "content": [
            {
               "type": "text",
               "text": "Tell me what tools you have for getting weather. Then get the weather for Miami."
            }
         ]
      },
      {
         "role": "assistant",
         "content": [
            {
               "type": "text",
               "text": "{\"response_to_user\":\"I can get the current weather for a given location. Let me check the weather for Miami for you.\",\"response_topic\":\"Get current weather\"}"
            }
         ],
         "tool_calls": [
            {
               "index": 0,
               "id": "call_PhUTQUo00ds0WwuisCICzWwz",
               "type": "function",
               "function": {
                  "name": "get_current_weather",
                  "arguments": "{\"location\":\"Miami\",\"unit\":\"fahrenheit\"}"
               }
            }
         ]
      },
      {
         "role": "tool",
         "name": "get_current_weather",
         "content": [
            {
               "type": "text",
               "text": "Miami, FL: 88F, Sunny and clear"
            }
         ],
         "tool_call_id": "call_PhUTQUo00ds0WwuisCICzWwz"
      }
   ]

The server had an error processing your request. Sorry about that! You can retry your request, or contact us through our help center at help.openai.com if you keep seeing this error. (Please include the request ID…

The workaround seems to be to send one assistant message with only the content, just for the AI’s understanding of what it said before the call, and then a second assistant message with the tool_calls that were emitted, immediately followed by the tool return.
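
That splitting step can be sketched as follows. The helper names `split_assistant_message` and `rebuild_history` are hypothetical, and the approach reflects only the behavior observed in this experiment, not documented API behavior.

```python
def split_assistant_message(message):
    """Split an assistant message that has both content and tool_calls in two."""
    if message.get("role") != "assistant":
        return [message]
    if not (message.get("content") and message.get("tool_calls")):
        return [message]  # nothing to split
    return [
        # First: what the AI said to the user before calling the tool.
        {"role": "assistant", "content": message["content"]},
        # Second: the tool call itself, to be followed directly by the tool return.
        {"role": "assistant", "tool_calls": message["tool_calls"]},
    ]

def rebuild_history(messages):
    """Apply the split to a whole chat history, preserving order."""
    history = []
    for message in messages:
        history.extend(split_assistant_message(message))
    return history
```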


Successful chat session with combined user response and tool call, all structured

By splitting assistant feedback into two chat history messages

Note: I manually provide what a tool function would return.

User: Tell me what tools you have for getting weather. Then get the weather for Miami.

{"response_to_user":"I have a tool called get_current_weather which allows me to fetch the current weather for a given location. Here's the weather for Miami:","response_topic":"Weather in Miami"}

Usage: prompt_tokens=155, completion_tokens=67, total_tokens=222
prompt_tokens_details: cached_tokens: 0
completion_tokens_details: reasoning_tokens: 0

*** Tool call detected ***
get_current_weather with arguments:
{"location":"Miami","unit":"fahrenheit"}
Please provide the result for get_current_weather: Miami's weather today: 77F and sunny!

{"response_to_user":"The current weather in Miami is 77°F, and it's sunny!","response_topic":"Weather in Miami"}

Usage: prompt_tokens=245, completion_tokens=32, total_tokens=277
prompt_tokens_details: cached_tokens: 0
completion_tokens_details: reasoning_tokens: 0

Latest gpt-4o or mini required…
