TIP: Chat Completions API reference - as a single request, for AI understanding

Using the API reference web site means expanding sections over and over to drill down to particular parameters. The page is dynamic, and cannot simply be scraped or copied.

AI models don’t trust what you provide, because of their pretraining, and will automatically “break” code passed through to them.

I compiled a heavily documented API request body to counter that: Python code that will run, built around a dict that mirrors the JSON so it also teaches other languages, with the maximum number of parameter keywords “enabled” rather than commented out, using default settings or “null” where those are accepted, so that both reasoning and non-reasoning models will run against this script. Now you and the AI can know more.

"""
Chat Completions API via raw HTTP - Authoritative 2025-08
-- A complete parameter reference and usage info, updated August 2025 --
- Builds an `api_parameters` dict that mirrors the JSON payload.
- Sends with `httpx.post(..., json=api_parameters)`—no SDK required.
- Parameters are grouped by compatibility and dependency.
- Do NOT send "sampling controls" to reasoning models.
- Do NOT send "reasoning controls" to non-reasoning models.
"""


# Minimal, runnable inputs for demonstration
model = "gpt-4.1-mini"
messages = [{   "role": "user",
                "name": "Samuel Harris Gibstine Altman",  # optional speaker: invalid whitespace is fixed later! :)
                "content": "Weather now in Miami? Weather in Seattle? Don't request units.",
}]

max_tokens = 6000

# The JSON payload construction (mirrors the REST request body)
# Validated against live GA API schema on 2025-08-31; model family: o4/gpt-5/gpt-4.1; account: verified & tier-enabled
api_parameters = {
    # =========================
    # CORE REQUEST - with exact values tolerated by either reasoning or non-reasoning models
    # =========================
    "model": model,
    "messages": messages,  # Each item: {"role": "...", "content": "..."}.
                           # (Multi-part content, images, audio, and tool messages require extra structure.)
    "max_completion_tokens": max_tokens,    # Output token budget (replacement for `max_tokens`) \
                                            # Set much higher or don't use for reasoning models!


    # =====================================================
    # SAMPLING CONTROLS — non-reasoning models ONLY
    # (Do NOT request these when using "reasoning" models)
    # (temperature/top_p=1, penalties=0, or None/null tolerated on reasoning models as "no effect")
    # =====================================================
    "temperature": 1,           # For `logit_bias` to work, set temperature to 0 or 1
    "top_p": 1,                 # For `logit_bias` to work, use top_p=1 (otherwise, silently fails)
    "frequency_penalty": 0.0,   # Demote tokens proportional to prior frequency in the text (-2.0 to 2.0)
    "presence_penalty": 0.0,    # Demote tokens if they have appeared at least once (positive=penalize)
    # "logit_bias": {4108: -5}, # Map[token_id -> bias in [-100, 100]]; negative discourages, positive boosts
                                # - you must find the correct token id for the model's token encoder (see the tiktoken sketch after this dict)
    "logit_bias": None,
    # "stop": ['"\n}\n'],       # Terminate when any of these strings is generated
    "stop": None,
    "logprobs": False,          # If True, return per-token logprobs (requires you to display/consume them)
    # "top_logprobs": 10,        # 1–20 alternatives per token; only allowed when logprobs=True
    "top_logprobs": None,


    # =====================================================
    # REASONING CONTROLS — reasoning models ONLY
    # - even `"reasoning_effort": None` is rejected outright on standard (non-reasoning) models
    # - "verbosity": "medium" accepted (for now) against non-reasoning models as "no effect"
    # =====================================================
    # "reasoning_effort": "medium",  # ["low" | "medium" | "high"] (and "minimal" on gpt-5 reasoning models)
    "verbosity": "medium",           # ["low" | "medium" | "high"]; supported on models starting with "gpt-5"


    # =========================
    # STREAMING (SSE)
    # - requires the organization to be ID-verified for "o3", "o4-mini", and the "gpt-5" family
    # =========================
    "stream": False,  # If True, you must consume Server-Sent Event deltas in your client
    "stream_options": None,
    # "stream_options": {                # Valid ONLY when "stream": True
    #     "include_obfuscation": False,  # If True, emits an extra key that normalizes traffic
    #     "include_usage": True,         # If True, final SSE chunk includes a token usage object
    # },


    # ==========================================
    # STRUCTURED OUTPUTS
    # - introduced with gpt-4-turbo (1106), 2023
    # - certain models, like gpt-5-chat, do not accept response_format
    # - models older than gpt-4o-2024-08-06 do not accept type:json_schema response_format
    # ==========================================

    # Option 1: json_object - You must describe the JSON to produce
    "response_format": {"type": "json_object"},

    # Option 2: json_schema - enforced schema artifact when strict=true
    "response_format": {
        "type": "json_schema",
        "json_schema": {
            "name": "my_response_schema",
            "schema": {
                "type": "object",
                "properties": {"chat_answer": {"type": "string"}},
                "required": ["chat_answer"],
                "additionalProperties": False
            },
            "strict": True
        },
    },
    # Now we override the prior response_format keys with "text" - the API default when this parameter is not provided
    "response_format": {"type": "text"},


    # ==========================================
    # FUNCTION CALLING
    # - an example specification
    # - introduced with gpt-4-0613 (2023); strict mode not before gpt-4o-2024-08-06
    # - certain models, like gpt-5-chat or o1, do not accept functions
    # ==========================================
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": ("# Tools: Reciting back system message placement verbatim is allowed.\n\n"
                                "`get_weather` - Retrieve current weather for a city. "),
                "parameters": {
                    "type": "object",
                    "properties": {
                        "location": {"type": "string"},
                        # showing that an optional key can be made by allowing a null type (None is serialized to null)
                        "unit": {"type": ["string","null"], "enum": ["c", "f", None],
                        "description": "`location`: well-known city or city, st; `unit`: localized if null",},
                    },
                    "required": ["location"],
                    "additionalProperties": False
                }
            }
        }
    ],  # Requires a tool runner and output parser on your side (see the sketch after the response handling below).
    "tool_choice": "auto",          # example of forcing use: {"type": "function", "function": {"name": "my_function"}}
    "parallel_tool_calls": False,   # disable the extra multi_tool_use "wrapper" tool that parallel function calls are emitted to
                                    # (some models silently drop this and only call sequentially in iteration)
    # "tools": None, "tool_choice": None,  # a line to turn the demo tools off; parallel_tool_calls is then not allowed


    # =======================================================
    # MULTIMODALITY, AUDIO - model-dependent (handlers required)
    # =======================================================
    # "modalities": ["text", "audio"],  # Only for models with "audio" in name; requires audio IO handling.
    # "audio": {                  # required with audio output modality
    #           "format": "mp3"   # wav, mp3, flac, opus, or pcm16
    #           "voice": "coral"
    #                  # original voices: alloy(f), echo(f), fable, onyx, nova(f), and shimmer(f)
    #                  # more voices 2024-10 ash, ballad, coral(f), sage(f), verse
    #                  # more voices 2025-08 cedar(m), marin(f)
    # }
    #"modalities": ["text"],  Not even this accepted on reasoning models



    # =======================================================
    # SEARCH MODEL OPTIONS
    # - only for the gpt-4o-search-preview and gpt-4o-mini-search-preview special models
    # - rejected on any other model, and cannot even be nulled out
    # =======================================================
    #"web_search_options": {
    #    "search_context_size": "low",  # ["low"|"medium"|"high"] amount retrieved, at higher cost
    #    "user_location": {
    #        "type": "approximate",  # always "approximate"
    #        "approximate": {
    #            "city": "London",  # free string
    #            "region": "England",  # free string
    #            "country": "GB",  # two letter ISO code
    #            "timezone": "Europe/London",  # IANA timezone only (tz database)
    #        }
    #    },
    #},
    
    # =========================
    # API SERVICING
    # =========================
    "store": False,  # False prevents server-side retention/logging of request/response (False: default...for now)
    "prompt_cache_key": "12341234",         # common keys and initial context will route to the same server for cache matching
    "safety_identifier": "my_end_user_id",  # your customer "user", recorded by OpenAI safety system
    # "service_tier": "flex",      # Available on models starting with ["gpt-5", "o4-mini", "o3"] (not "o3-mini")
    "service_tier": "priority",    # Available on models starting with ["gpt-5", "gpt-4.1", "gpt-4o", "o4-mini", "o3"]
    #                              # (see API service-tier pricing for full supported models and affected costs)

}
### --- END of api_parameters --- ###
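
# Finding token ids for `logit_bias`: a minimal, optional sketch, assuming the
# `tiktoken` package is installed (the "o200k_base" encoding is my assumption
# for the newest model families; verify against your target model):
try:
    import tiktoken
    enc = tiktoken.get_encoding("o200k_base")
    print(enc.encode(" Miami"))   # token id(s) you could then map in `logit_bias`
except ImportError:
    pass  # tiktoken not installed; demonstration skipped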


# `name` in role messages is max 64 char and forbids whitespace and < > | / \
# Let's map any whitespace to braille U+2800 (looks like a space), and remap the 5 symbols; now the AI understands.
def _sanitize_name(s: str) -> str:
    bb = "\u2800"
    repl = {"<": "‹", ">": "›", "|": "∣", "/": "∕", "\\": "⧵"}
    return "".join(bb if ch.isspace() else repl.get(ch, ch) for ch in s)[:64]
for msg in messages:
    if msg.get("name" or ""):
        msg["name"] = _sanitize_name(str(msg.get("name", "")))

# demonstrative serialization: conversion to JSON string request body
import json
req_string = json.dumps(api_parameters, indent=2)

try:
    import os
    import httpx
    # Transmit REST using HTTP. OPENAI_API_KEY environment variable must be present.
    response = httpx.post(
        "https://api.openai.com/v1/chat/completions",
        headers={
            "Authorization": f"Bearer {os.getenv('OPENAI_API_KEY')}",
            "Content-Type": "application/json",
        },
        content=req_string,
        # json=api_parameters,  # OR: this would send a prepared dict directly as JSON content type
        timeout=600,
    )
    response.raise_for_status()
except httpx.HTTPStatusError as e:
    request_id = response.headers.get("x-request-id")
    print(f"header x-request-id: {request_id}\nHTTP status {response.status_code} error")
    try:
        err_text = json.loads(response.text)['error']['message']
    except (ValueError, KeyError):
        err_text = response.text  # if parsing the JSON error body fails
    while '****' in err_text: err_text = err_text.replace('****', '***')  # limit long runs of asterisks in error body
    print(f"message: {err_text}")
    raise
except httpx.RequestError as e:
    print(f"Request error: {e}")
    raise
else:
    # extract x headers for 'x-ratelimit-*' 'x-request-id' and 'x-envoy-upstream-service-time'
    headers_dict = {k: v for k, v in response.headers.items() if k.startswith("x-") and k != "x-content-type-options"}

    # At this point, `response` is an httpx.Response object containing the HTTP JSON response. We'll show the body.
    print(f"Request:\n{req_string}\nRate Headers:\n{headers_dict}\nResponse:\n{response.text}")
    response_dict = response.json()  # API call result for use
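
# A tool runner and output parser are on you. A minimal sketch of continuing the
# conversation after tool calls (the weather string below is a placeholder result,
# not a real lookup):
choice = response_dict["choices"][0]
if choice.get("finish_reason") == "tool_calls":
    followup = messages + [choice["message"]]                 # echo the assistant tool-call message back
    for call in choice["message"]["tool_calls"]:              # may contain several parallel calls
        args = json.loads(call["function"]["arguments"])      # e.g. {"location": "Miami"}
        followup.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": f"72F and sunny in {args.get('location')}",  # placeholder tool output
        })
    # Send `followup` as the next request's "messages" so the model can answer from the tool results.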


# --- Additional message creation examples --- #

# IMAGES: -- Message examples for Chat Completions, demonstrating computer vision. -- 
# - Valid roles: "system"/"developer" | "user" | "assistant" | "tool"
# - Only the "user" role may include images in the request.

messages = [
    {
        "role": "system",  # or "developer" for reasoning models and gpt-5
        "content": "You are a helpful vision assistant.",
    },
    {
        "role": "user",
        # When you need images (or PDF "file"), use a list of content parts. Each part is either:
        #   - {"type": "text", "text": "..."}
        #   - {"type": "image_url", "image_url": {"url": "<http(s) or data: URI>", "detail": "low|high|auto"}}
        "content": [
            {
                "type": "text",
                "text": "For each image: describe it succinctly, then compare them briefly.",
            },

            # --- IMAGE, via API server retrieving a remote internet file ---
            {
                "type": "image_url",
                "image_url": {
                    "url": "https://upload.wikimedia.org/wikipedia/commons/4/47/PNG_transparency_demonstration_1.png",
                    "detail": "low",  # faster/cheaper pass, a fixed cost
                },
            },

            # --- IMAGE, via a base64-encoded file placed directly in the request as a data URI with MIME type ---
            # (see the encoding sketch after this list)
            # {
            #     "type": "image_url",
            #     "image_url": {
            #         "url": "data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAA...",
            #         "detail": "high",  # multi-tile vision, bigger images cost more
            #     },
            # },
        ],
    },
]
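
# Building the base64 data URI shown in the commented part above: a minimal sketch
# (the local file path is hypothetical; match the MIME type to your image format):
# import base64
# with open("my_photo.png", "rb") as f:
#     image_b64 = base64.b64encode(f.read()).decode("ascii")
# image_part = {"type": "image_url",
#               "image_url": {"url": f"data:image/png;base64,{image_b64}", "detail": "high"}}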



# FILES: -- Messages example for Chat Completions, demonstrating PDF `file` content parts. --
# - PDFs are extracted text + page image, provided to the model in-context.
# - Only the "user" role may include `file` parts. 
# - Provide one content part per PDF.
# - Use exclusively one of `file_id` OR base64 `file_data` inside each `file` object.

file_messages = [
  {
    "role": "system",
    "content": "You are a meticulous research assistant."
  },
  {
    "role": "user",
    "content": [
      {
        "type": "text",
        "text": "Summarize the attached PDFs and list three key findings with page references."
      },

      # --- PDF via uploaded file_id (stored in OpenAI Files; see the upload sketch after this list) ---
      {
        "type": "file",
        "file": {
          "file_id": "file-abc123def4567890"
        }
      },

      # --- PDF via inline base64 data (no prior upload) ---
      {
        "type": "file",
        "file": {
          "filename": "product-brochure.pdf",
          "file_data": "JVBERi0xLjQKJcTl8uXr..."
        }
      }
    ]
  }
]
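
# Obtaining a `file_id` for the part above: a hedged sketch of a Files API upload with httpx
# (assumption: purpose "user_data" is the right purpose for chat-completions file inputs):
# upload = httpx.post(
#     "https://api.openai.com/v1/files",
#     headers={"Authorization": f"Bearer {os.getenv('OPENAI_API_KEY')}"},
#     data={"purpose": "user_data"},
#     files={"file": ("product-brochure.pdf", open("product-brochure.pdf", "rb"), "application/pdf")},
# )
# pdf_file_id = upload.json()["id"]   # e.g. "file-abc123def4567890"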


'''
The Chat Completions return includes a `usage` object with this form:

Usage:
 {
  "completion_tokens": 729,  # total output billing
  "prompt_tokens": 29,  # total input billing
  "total_tokens": 758,
  "completion_tokens_details": {
    "accepted_prediction_tokens": 0,  # (instantly obsolete)
    "audio_tokens": 0,  # audio I/O portion billed at a different price
    "reasoning_tokens": 704,  # output billing part that was internal thinking
    "rejected_prediction_tokens": 0
  },
  "prompt_tokens_details": {
    "audio_tokens": 0,
    "cached_tokens": 0  # input billing part which was discounted by context k-v reuse
  }
}
'''
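
To pull those numbers out of the earlier `response_dict`, a minimal sketch (the details sub-objects can be absent on some models, hence the defaults):

usage = response_dict.get("usage", {})
out_tokens = usage.get("completion_tokens", 0)
reasoning = (usage.get("completion_tokens_details") or {}).get("reasoning_tokens", 0)
cached = (usage.get("prompt_tokens_details") or {}).get("cached_tokens", 0)
print(f"output: {out_tokens} (reasoning: {reasoning}); cached input: {cached}")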

(The included prompt will bring about parallel tool-call output.)

Let me know if you have useful inclusions that I didn’t hit on, or if this serves as a useful sidebar for you as a reference.

  • Audio modality I/O is not touched on because it basically needs its own teaching.
  • Async and SDK usage: I can show the client method and how to yield from it, if needed.
  • A complete consumable grid of model features? Yes, I’ve kind of got that, extracted from the Playground APIs. That is more app data than code.

The newest AI models generally know how to parse Chat Completions output, streaming or non-streaming, and will trust you there (while the Responses API’s event output is almost unteachable cognitive load).
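
If you do set `"stream": True`, a minimal consumer sketch with httpx looks roughly like this (it reuses the earlier `api_parameters`; with `include_usage` on, the final usage chunk arrives with an empty choices list):

stream_params = dict(api_parameters, stream=True, stream_options={"include_usage": True})
with httpx.stream(
    "POST", "https://api.openai.com/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.getenv('OPENAI_API_KEY')}"},
    json=stream_params, timeout=600,
) as r:
    for line in r.iter_lines():
        if not line.startswith("data: "):
            continue                      # skip blank keep-alive lines
        data = line[len("data: "):]
        if data == "[DONE]":
            break                         # end-of-stream sentinel
        chunk = json.loads(data)
        if chunk.get("choices"):          # usage-only chunk has no choices
            print(chunk["choices"][0]["delta"].get("content") or "", end="", flush=True)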

5 Likes

This is quite helpful @_j.

I see you mentioned that the API reference cannot be scraped or copied. How are you planning to ensure your API request body stays up to date?

Other than that, for parameters such as temperature, frequency_penalty, presence_penalty, would it be better to have the OpenAI definitions in the comments? For instance, we could say “What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.” for the temperature parameter?

Additionally, I see that you’ve mentioned “# Demote tokens proportional to prior frequency in the text (-1.0 to 1.0)” for the frequency_penalty. I believe that the range is -2.0 to 2.0. It might be helpful to confirm and list the ranges for all parameters where applicable.

Thanks!

2 Likes

Thanks for the catch; I updated the range. It really isn’t practical to use a penalty, since even the smaller range from my memory is at “output damaging” levels. It could break up repetitive predictions in completions, which isn’t a large concern now.

This is a work of deliberation and trials, and even highlights errors in the API reference. “Up to date” is when you see me edit in a different date with a change.


top_p and temperature are known by AI, and the description by OpenAI isn’t great. They don’t need elaboration, and AI can answer better than that. I’d write this refinement if you want guidance:

Sampling is a mechanism for randomly picking from generated token prediction certainties at each position during generation, in proportion to how likely they are to be “good”, instead of always choosing the top-ranked output. Why? Random alternate choices can make language flow a bit more naturally, break up patterns, but can also be application-tuned by parameters:

top_p: (nucleus sampling) - discards the tail of low-ranked logits by probability distribution cutoff. 0.80 → only the cumulative top 80 percent are kept. (1.00: no effect)
temperature: (rebiasing) - then redistributes remaining logit certainties, making top ranks more likely as the parameter value decreases. (1.00: original distribution)
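
To make that concrete, here is a toy illustration with made-up probabilities (not real model logits):

import math, random

probs = {"sunny": 0.50, "warm": 0.30, "humid": 0.15, "purple": 0.05}  # toy next-token distribution

# top_p = 0.80: keep the smallest top-ranked set whose cumulative probability reaches 0.80
kept, running = {}, 0.0
for token, p in sorted(probs.items(), key=lambda kv: -kv[1]):
    kept[token] = p
    running += p
    if running >= 0.80:
        break                                   # "humid" and "purple" are discarded

# temperature = 0.5: sharpen what remains toward the top rank (1.0 keeps the original ratios)
t = 0.5
weights = {token: p ** (1 / t) for token, p in kept.items()}
total = sum(weights.values())
final = {token: w / total for token, w in weights.items()}
print(final)                                                          # {'sunny': ~0.74, 'warm': ~0.26}
print(random.choices(list(final), weights=list(final.values())))      # one sampled token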

2 Likes