"Responses" API endpoint - reference documentation errors and issues

As I continue working toward becoming an overnight expert on this new material, I keep finding mis-statements in what the documentation offers. I'll try to keep this a report of issues rather than a travel blog.


Issue 1: Responses: output_text as content

An input message is described as being for any role type, such as previously used with chat completions:

A message input to the model with a role indicating instruction following hierarchy. Instructions given with the developer or system role take precedence over instructions given with the user role. Messages with the assistant role are presumed to have been generated by the model in previous interactions.
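To make that role hierarchy concrete, here is a minimal sketch of an `input` list in which developer instructions should outrank user instructions. The payload shape mirrors what the Playground sends (shown below); nothing beyond the fields observed in this post is guaranteed.

```python
import json

# A developer message plus a user message; per the quoted documentation,
# the developer instruction takes precedence if the two conflict.
input_messages = [
    {
        "role": "developer",
        "content": [{"type": "input_text",
                     "text": "Always answer in exactly one sentence."}],
    },
    {
        "role": "user",
        "content": [{"type": "input_text",
                     "text": "Explain this API at length."}],
    },
]

print(json.dumps({"model": "gpt-4o-mini", "input": input_messages}, indent=2))
```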

For the text part of a message's "content", the documentation says the type must always be "input_text":

Undocumented

However, the reference does not mention that a text part of content can also be "type": "output_text", nor when this is validated or rejected based on role.

This was discovered by inspecting what the API Playground sends in its call:

{
  "role": "assistant",
  "content": [
    {
      "type": "output_text",
      "text": "{a JSON the AI previously produced}"
    }
  ]
}

The playground doesn't use the previous-response-ID mechanism, nor does it demonstrate the instructions field. It still resembles a chat completions list of messages.

“output_text” is described as what you get back in a response, yet it seems you can send it back in. Can you send assistant output back exactly as received, for example sending a refusal back to the responses endpoint as well? You cannot on chat completions. I'll have to try sending every field the assistant can produce to see what must be altered; but clearly, text is allowed, and it can be a different type.
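Here is a sketch of the round-trip experiment described above: wrapping a prior assistant turn as an "output_text" content item, as observed in the Playground. Whether every field of a real assistant output (annotations, refusals, and so on) is accepted on input is exactly what needs testing, so this keeps only "type" and "text".

```python
def assistant_turn_as_input(prior_output_text: str) -> dict:
    """Wrap previously generated model text as an input message,
    using the "output_text" content type the Playground sends back."""
    return {
        "role": "assistant",
        "content": [{"type": "output_text", "text": prior_output_text}],
    }

# Assumed payload shape, based on what this post has observed so far.
payload = {
    "model": "gpt-4o-mini",
    "input": [
        assistant_turn_as_input('{"response": "previously produced JSON"}'),
        {"role": "user",
         "content": [{"type": "input_text", "text": "Now revise that JSON."}]},
    ],
}
```

If the endpoint rejects this, the error message should reveal which roles or fields are actually validated.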


A better API reference for content

The API reference requires tedious levels of clicking and expanding, over and over.

Below is a Markdown reference that consolidates the various content object types (including the undocumented output_text, in preliminary form) into a single specification, with multi-level nesting and one-line parameter/type/description entries.


Input

input (array)

 A list of one or many input message objects to the model.

 

Input Message (object)

  • role (string, required): The role of the message. Typically one of system, user, assistant, or developer.
  • content (array, required): A list of content items for this message. Each item must define a type and its corresponding fields.

 

Content Item (object)

A single content element within the message’s content array.

  • type (string, required): Specifies which content subtype is being used. One of:
    • input_text
    • input_image
    • input_file
    • output_text (undocumented in official docs but observed in practice)

Depending on the value of type, the remaining fields differ as follows:

  1. Text Input (type: input_text)

    • text (string, required): The text input to be sent to the model.
  2. Image Input (type: input_image)

    • detail (string, required): One of high, low, or auto. Defaults to auto.
    • file_id (string or null, optional): The ID of the file to be sent to the model.
    • image_url (string or null, optional): The fully qualified URL or a base64-encoded data URL of the image.
  3. File Input (type: input_file)

    • file_data (string, optional): The file contents to be sent to the model.
    • file_id (string, optional): The ID of the file to be sent to the model.
    • filename (string, optional): The name of the file being sent to the model.
  4. Output Text (type: output_text)

    • text (string, required): The text output previously generated by the AI (e.g. “assistant” response) you wish to pass back in for context.
    • annotations (array, optional): An array of annotation objects providing metadata about the text output. Whether this is accepted as input still needs verification, along with the other output fields an assistant response can contain.

Hopefully this rewrite is useful as a quickstart reference for the API's “content” objects. Like the API reference web page, it doesn't clearly break down what is allowed per role, implying that a developer or assistant message could contain images or files.
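The consolidated spec above can be encoded as a small validator. This is a sketch of my own reading of the reference, not an official schema; it cannot encode per-role rules, because, as noted, the reference does not state them.

```python
# Required/optional fields per content "type", per the consolidated
# reference above. "output_text" is the undocumented type observed
# from the Playground.
SPEC = {
    "input_text":  {"required": {"text"}, "optional": set()},
    "input_image": {"required": {"detail"}, "optional": {"file_id", "image_url"}},
    "input_file":  {"required": set(), "optional": {"file_data", "file_id", "filename"}},
    "output_text": {"required": {"text"}, "optional": {"annotations"}},
}

def validate_content_item(item: dict) -> list[str]:
    """Return a list of problems; an empty list means the item matches the spec."""
    itype = item.get("type")
    if itype not in SPEC:
        return [f"unknown content type: {itype!r}"]
    spec = SPEC[itype]
    errors = []
    for field in spec["required"]:
        if field not in item:
            errors.append(f"{itype}: missing required field {field!r}")
    allowed = spec["required"] | spec["optional"] | {"type"}
    for field in item:
        if field not in allowed:
            errors.append(f"{itype}: unexpected field {field!r}")
    return errors

print(validate_content_item({"type": "output_text", "text": "hi"}))  # []
print(validate_content_item({"type": "input_image"}))  # flags the missing "detail"
```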


Example structured output contextual call

{
  "model": "gpt-4o-mini",
  "input": [
    {
      "role": "assistant",
      "content": [
        {
          "type": "output_text",
          "text": "Here is a prior AI response you want to include as context."
        }
      ]
    },
    {
      "role": "user",
      "content": [
        {
          "type": "input_text",
          "text": "Now continue our conversation."
        }
      ]
    }
  ],
  "tools": [],
  "text": {
    "format": {
      "type": "json_schema",
      "name": "underscores_and_dashes-only",
      "description": "a description of response purpose and destination",
      "strict": true,
      "schema": {
        "type": "object",
        "properties": {
          "response": {
            "type": "string"
          }
        },
        "required": ["response"],
        "additionalProperties": false
      }
    }
  }
}
  • This content structure allows you to send messages with text, images, and now files, combined or interleaved as you like into the parts of a single message's content. Assistant text as “input_text” or “output_text”? Who knows…

  • We see that an empty tools list is now allowed without error.

  • The JSON schema demonstrated has a new optional “description” field, which I previously harped on as a missing aspect of the response format.

  • Where do files go? Read on…
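For completeness, here is a standard-library sketch of sending a payload like the one above and pulling the text back out. The response shape assumed here (an "output" list of message items holding "output_text" content, mirroring what the Playground sends) is inferred from this post's observations, not a documented guarantee.

```python
import json
import os
import urllib.request

def post_responses(payload: dict) -> dict:
    """POST a payload to the /v1/responses endpoint and return the parsed JSON.
    Expects an OPENAI_API_KEY environment variable."""
    req = urllib.request.Request(
        "https://api.openai.com/v1/responses",
        data=json.dumps(payload).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def extract_output_text(response: dict) -> str:
    """Concatenate every "output_text" part from every output item,
    assuming the output mirrors the content structure observed above."""
    parts = []
    for item in response.get("output", []):
        for content in item.get("content", []):
            if content.get("type") == "output_text":
                parts.append(content["text"])
    return "".join(parts)
```

With the structured-output format above, the extracted text should itself be a JSON string matching the schema, ready for `json.loads`.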
