Launching o3-mini in the API

Hi all, could you please help me with the model name? I'm on tier 4 and can access o3-mini via the API, but I can't access either o3-mini-high or o3-mini-2025-01-31-high. What is the exact model name for high reasoning? Is it available only for Tier 5?

Go to the playground, select the model, then click on "code"…

There is a parameter, "reasoning_effort".
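In other words, there is no separate "-high" model name: you request plain o3-mini and pass the effort separately. A minimal sketch, assuming a current openai Python library:

from openai import OpenAI

client = OpenAI()

# "o3-mini" is the only model name; the effort level is a separate parameter.
response = client.chat.completions.create(
    model="o3-mini",
    reasoning_effort="high",  # "low", "medium" (default), or "high"
    messages=[{"role": "user", "content": "Say hello."}],
)
print(response.choices[0].message.content)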


https://openai.com/careers/research-engineer/ << copy a job description

Paste it into the input field:

and get courses from Microsoft Learn - connected to faiss…

Built in ~1 hour.

Obviously the search strategy is messed up a little.

Would love for you to see…

Here's o3-mini, at medium effort, fixing the "Using logprobs" cookbook and its long-languishing, non-functional #4 output highlighter. (Necessary because the completions playground was removed without an equivalent demonstrative highlighter or logprob analysis.)

I wrote a long .ipynb of my own to demonstrate the issue and the actual API returns across multiple calls, and then provided that file's text as input. For:

…new AI models use new tokenizers that make the problem more apparent: multi-byte UTF-8 characters can require multiple tokens to represent a single glyph, and one logprob chunk may contain multiple tokens that together form a character or word. The code wasn't written with this understanding; instead, it makes naive parsing assumptions and treats chunks as a token-counting method.

The o3-mini response:

Below is one way to fix the “token highlighter” so that it works robustly no matter whether (or not) each token’s bytes cleanly map to complete UTF‑8 characters. First, note that the original code simply decoded each token’s bytes as if they were independently complete. In reality, a multi‑byte character may be “cut” across token boundaries. (Sometimes one token is a full emoji; sometimes an emoji, word, or even punctuation is split between tokens.) To robustly mark the boundaries in the final decoded output, you must:

  1. Reconstruct the complete output text by concatenating all of the token “bytes” (after converting the list of ints into a byte array) and decoding that once to get the final text.

  2. Keep track of each token’s contribution in terms of its byte length (i.e. its “span” in the aggregated bytes). Since the mapping from byte offsets to character positions isn’t one‐to‐one (because a character may be represented by several bytes) you compute a mapping from the aggregated text’s characters back to their positions in the byte string.

  3. For each token you know its start byte and end byte (cumulative over tokens). Then “map” that span into character indices. In the ideal case the token boundaries lie exactly on character boundaries, but if not, you can “snap” to the nearest complete character boundaries. (This might mean that the highlighted spans overlap a tiny bit—but it is far better than showing broken or “half‐decoded” glyphs.)

  4. Finally, use the computed character boundaries for each token to generate an HTML string with colored spans for each token.

The refactored solution below implements these ideas. (It uses the standard library’s bisect module so that, given the cumulative byte offsets for each character, we can find the “nearest” character boundaries to a given token’s byte‐offset.)

Below is the complete Python code for the robust logprobs token highlighter:


Refactored code

import bisect
import codecs
import numpy as np
from math import exp
from IPython.display import display, HTML

# Assume that get_completion(...) is defined and returns an API response
# similar to what was shown earlier.
# (It must return an object with choices[0].logprobs.content, where each token object
# has:
#   token: the token string (possibly partial)
#   logprob: its log probability
#   bytes: a list of integer byte values)

def highlight_tokens(api_response):
    # Colors to rotate for tokens
    colors = [
        "#FF00FF",  # Magenta
        "#008000",  # Green
        "#FF8C00",  # Dark Orange
        "#FF0000",  # Red
        "#0000FF",  # Blue
    ]

    # Aggregate all token bytes into one byte array
    aggregated_byte_list = []
    # Also, keep token-level info (their byte lengths and other info)
    token_info = []
    for token in api_response.choices[0].logprobs.content:
        toks = token.bytes  # list of ints
        token_len = len(toks)
        aggregated_byte_list += toks
        token_info.append({
            "token_obj": token,
            "byte_length": token_len,
        })

    aggregated_bytes = bytes(aggregated_byte_list)

    # Decode the full aggregated bytes to obtain the complete text.
    try:
        aggregated_text = aggregated_bytes.decode("utf-8")
    except UnicodeDecodeError as e:
        # In principle this shouldn't happen if the API returns valid text overall.
        aggregated_text = aggregated_bytes.decode("utf-8", errors="replace")
        print("Decoding error fixed using 'replace':", e)

    # Build a mapping of character positions to byte offsets.
    # offsets[i] will be the starting byte index of the i-th character in aggregated_text.
    offsets = [0]
    for ch in aggregated_text:
        offsets.append(offsets[-1] + len(ch.encode("utf-8")))
    # offsets[-1] must equal len(aggregated_bytes)

    # Now, iterate through tokens and determine the corresponding character slice.
    html_output = ""
    cumulative_byte = 0   # current byte offset in aggregated_bytes
    color_idx = 0         # to choose colors in round-robin fashion

    for info in token_info:
        token_byte_len = info["byte_length"]
        token_start_byte = cumulative_byte
        token_end_byte = cumulative_byte + token_byte_len

        # Find the character indices that cover this token.
        # Because token boundaries may not exactly equal a character boundary, we use bisect.
        char_start = bisect.bisect_left(offsets, token_start_byte)
        # If the token started in the middle of a character (offset not equal) then back up one.
        if char_start >= len(offsets) or offsets[char_start] != token_start_byte:
            char_start = max(char_start - 1, 0)

        char_end = bisect.bisect_left(offsets, token_end_byte)
        if char_end >= len(offsets) or offsets[char_end] != token_end_byte:
            # If token_end_byte isn't a character start, then bump to the next character.
            char_end = min(char_end + 1, len(offsets) - 1)

        token_text = aggregated_text[char_start:char_end]

        # Build an HTML span with this token's text in the given color.
        html_output += f"<span style='color: {colors[color_idx]}'>{token_text}</span>"

        # Rotate to the next color.
        color_idx = (color_idx + 1) % len(colors)
        cumulative_byte += token_byte_len

    # Display the highlighted text.
    display(HTML(html_output))

    # For demonstration purposes, print also the token count & reassembled text:
    print(f"Total number of tokens: {len(token_info)}")
    print("Reconstructed text:", aggregated_text)

def print_logprobs_details(api_response):
    # This function prints each token along with its logprob, linear probability,
    # and the bytes that form it.
    joint_logprob = 0.0
    aggregated_bytes = []

    for token in api_response.choices[0].logprobs.content:
        print("Token:", token.token)
        print("Log prob:", token.logprob)
        print("Linear prob:", np.round(exp(token.logprob) * 100, 2), "%")
        print("Bytes:", token.bytes, "\n")
        aggregated_bytes += token.bytes
        joint_logprob += token.logprob

    aggregated_text = bytes(aggregated_bytes).decode("utf-8", errors="replace")
    print("Bytes array:", aggregated_bytes)
    print("Decoded bytes:", aggregated_text)
    print("Joint prob:", np.round(exp(joint_logprob) * 100, 2), "%")

# Example usage:
# (Define your prompt and invoke the completion request as before.)
PROMPT = "Produce 10 of your favorite emoji. Just the emoji only."
API_RESPONSE = get_completion(
    [{"role": "user", "content": PROMPT}],
    model="gpt-4o-2024-11-20",
    logprobs=True
)

# Print token details with logprobs and bytes.
print_logprobs_details(API_RESPONSE)

# Now call the robust highlighter which uses the complete byte-to-character mapping.
highlight_tokens(API_RESPONSE)


Explanation of the key changes:

• Instead of trying to decode each token’s bytes separately, we build the full byte string from all tokens and decode that once. This way we know we have valid UTF‑8 (or can use error handling if needed).

• We compute a cumulative “offset” list for the decoded text by re‑encoding each character. This lets us translate a byte offset (from the aggregated tokens) into the corresponding character index (or region). In effect, for each token we know its byte span (starting at cumulative_byte and extending for len(token.bytes) bytes), and we then map that span into the decoded text. If a token boundary falls in the middle of a multi‑byte character, we “snap” to the nearest character boundaries to avoid broken glyphs.

• We then generate an HTML string where each token's corresponding text is wrapped in a <span> with a designated color.

This approach is robust regardless of whether a token’s bytes line up exactly with complete UTF‑8 characters. (In some cases you might see a little overlap if token boundaries fall mid‑character—which is far better than a decode error, and it visually demonstrates how the underlying logprobs “chunks” are assembled.)
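To make that overlap concrete, here is a small standalone sketch with made-up token byte values (not real API output): a single four-byte emoji arriving split across two tokens, where both halves snap to the same character.

import bisect

# Hypothetical token byte lists: one emoji ("\U0001F600", four UTF-8 bytes)
# split across two tokens of two bytes each.
token_bytes = [[240, 159], [152, 128]]

aggregated_bytes = bytes(b for tok in token_bytes for b in tok)
aggregated_text = aggregated_bytes.decode("utf-8")  # the single emoji character

# offsets[i] is the starting byte index of the i-th character.
offsets = [0]
for ch in aggregated_text:
    offsets.append(offsets[-1] + len(ch.encode("utf-8")))  # [0, 4]

cumulative = 0
for tok in token_bytes:
    start, end = cumulative, cumulative + len(tok)  # (0, 2) then (2, 4)
    char_start = bisect.bisect_left(offsets, start)
    if char_start >= len(offsets) or offsets[char_start] != start:
        char_start = max(char_start - 1, 0)
    char_end = bisect.bisect_left(offsets, end)
    if char_end >= len(offsets) or offsets[char_end] != end:
        char_end = min(char_end + 1, len(offsets) - 1)
    # Both tokens map to the character span [0:1], i.e. the whole emoji.
    print(f"bytes {start}:{end} -> {aggregated_text[char_start:char_end]!r}")
    cumulative = end

Neither two-byte half decodes on its own, which is exactly why per-token decoding breaks while the snapped spans simply repeat the glyph.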

You can now use the functions print_logprobs_details(API_RESPONSE) and highlight_tokens(API_RESPONSE) to both print a detailed breakdown and display a highlighted version of the output text.

This solution is built for Python 3.11+ and should work in most Jupyter or notebook environments.

This is an example input where o1-preview and o1-mini can't write code that executes any better than the original, crashing code. I don't have the time to nitpick and merge and inspect diffs of what an AI now produced, and I will also return the favor of keeping my prompts to myself, but you can think about the cookbook documentation updates needed anyway.

0-effort self-analysis:

The code integrates all needed imports and functions – including a robust API caller, token logprob printer, and highlighter that correctly aggregates token bytes and maps them to full UTF-8 characters. While slight overlaps in highlighting may occur when token boundaries fall mid-character, this solution reliably demonstrates the tokenization details for API developers using the new tokenizer and logprobs responses.

Below is the final executable code in a single markdown-fenced block:

import bisect
import numpy as np
from math import exp
from IPython.display import display, HTML

# Example API client and get_completion function.
# In practice, ensure you have your OpenAI API client configured appropriately.
from openai import OpenAI
client = OpenAI()

def get_completion(
    messages: list[dict[str, str]],
    model: str = "gpt-4o-mini",
    max_tokens=50,
    temperature=0,
    stop=None,
    seed=123,
    tools=None,
    logprobs=None,
    top_logprobs=None,
) -> object:
    """
    Makes an API request and returns the completion response.
    The response is expected to have a structure with:
      response.choices[0].logprobs.content
    where each token object has:
      - token (possibly partial)
      - logprob
      - bytes (a list of integer values representing its UTF-8 bytes)
    """
    params = {
        "model": model,
        "messages": messages,
        "max_tokens": max_tokens,
        "temperature": temperature,
        "stop": stop,
        "seed": seed,
        "logprobs": logprobs,
        "top_logprobs": top_logprobs,
    }
    if tools:
        params["tools"] = tools
    return client.chat.completions.create(**params)

def highlight_tokens(api_response):
    """
    Highlights tokens by reconstructing the full text from aggregated bytes,
    mapping each token's byte span to character boundaries, and then wrapping
    the resulting substring in an HTML span with a color. Multiple tokens that
    do not fall on perfect character boundaries are snapped to the nearest
    full character.
    """
    colors = [
        "#FF00FF",  # Magenta
        "#008000",  # Green
        "#FF8C00",  # Dark Orange
        "#FF0000",  # Red
        "#0000FF",  # Blue
    ]
    
    # Build aggregated byte list and store each token's byte length.
    aggregated_byte_list = []
    token_info = []
    for token in api_response.choices[0].logprobs.content:
        toks = token.bytes  # list of ints
        token_len = len(toks)
        aggregated_byte_list += toks
        token_info.append({
            "token_obj": token,
            "byte_length": token_len,
        })
    
    aggregated_bytes = bytes(aggregated_byte_list)
    
    # Decode the aggregated bytes into full text.
    try:
        aggregated_text = aggregated_bytes.decode("utf-8")
    except UnicodeDecodeError as e:
        aggregated_text = aggregated_bytes.decode("utf-8", errors="replace")
        print("Decoding error fixed using 'replace':", e)
    
    # Build cumulative mapping: offsets[i] is the start byte offset of the i-th character.
    offsets = [0]
    for ch in aggregated_text:
        offsets.append(offsets[-1] + len(ch.encode("utf-8")))
    # offsets[-1] should equal len(aggregated_bytes)
    
    # Create HTML with colored spans for each token.
    html_output = ""
    cumulative_byte = 0  # current byte offset in aggregated_bytes
    color_idx = 0
    for info in token_info:
        token_byte_len = info["byte_length"]
        token_start_byte = cumulative_byte
        token_end_byte = cumulative_byte + token_byte_len
        
        # Map token's byte span to character indices using bisect.
        char_start = bisect.bisect_left(offsets, token_start_byte)
        if char_start >= len(offsets) or offsets[char_start] != token_start_byte:
            char_start = max(char_start - 1, 0)
        
        char_end = bisect.bisect_left(offsets, token_end_byte)
        if char_end >= len(offsets) or offsets[char_end] != token_end_byte:
            char_end = min(char_end + 1, len(offsets)-1)
        
        token_text = aggregated_text[char_start:char_end]
        html_output += f"<span style='color: {colors[color_idx]}'>{token_text}</span>"
        color_idx = (color_idx + 1) % len(colors)
        cumulative_byte += token_byte_len
    
    display(HTML(html_output))
    print(f"Total number of tokens: {len(token_info)}")
    print("Reconstructed text:", aggregated_text)

def print_logprobs_details(api_response):
    """
    Prints each token's text, log probability, linear probability, and byte values.
    Also aggregates the full decoded text and shows the joint probability.
    """
    joint_logprob = 0.0
    aggregated_bytes = []
    for token in api_response.choices[0].logprobs.content:
        print("Token:", token.token)
        print("Log prob:", token.logprob)
        print("Linear prob:", np.round(exp(token.logprob) * 100, 2), "%")
        print("Bytes:", token.bytes, "\n")
        aggregated_bytes += token.bytes
        joint_logprob += token.logprob
        
    aggregated_text = bytes(aggregated_bytes).decode("utf-8", errors="replace")
    print("Bytes array:", aggregated_bytes)
    print("Decoded bytes:", aggregated_text)
    print("Joint prob:", np.round(exp(joint_logprob) * 100, 2), "%")

# Example usage:
PROMPT = "Produce 10 of your favorite emoji. Just the emoji only."
API_RESPONSE = get_completion(
    [{"role": "user", "content": PROMPT}],
    model="gpt-4o-2024-11-20",
    logprobs=True,
    top_logprobs=3  # or any value you wish to inspect
)

print_logprobs_details(API_RESPONSE)
highlight_tokens(API_RESPONSE)

So, obviously we have an operational issue here. Multiple users have already reported that they can access o3-mini in the playground, but via the API they get a 404. I can confirm the same as a tier 3 member.

My implementation is correct – it works perfectly with o1-mini and definitely sets the parameters correctly.

import { getEnv } from '@core/env';
import OpenAI, { AzureOpenAI } from "openai"

export interface OpenAIOptions {
  model: string;
  prompt: string;
  provider?: 'azure' | 'openai-compatible';
  deployment?: string;
  apiVersion?: string; // 2024-05-01-preview
}

export async function openai({
  model,
  prompt,
  provider,
  deployment,
  apiVersion
}: OpenAIOptions) {
  const apiKey = getEnv('OPENAI_API_KEY')
  const endpoint = getEnv('OPENAI_BASE_URL');

  const client = provider === 'azure' ?
    new AzureOpenAI({ endpoint, apiKey, apiVersion, deployment, baseURL: null }) :
    new OpenAI({ apiKey, baseURL: endpoint });

  const config: OpenAI.Chat.Completions.ChatCompletionCreateParamsNonStreaming = {
    model,
    messages: [
      {
        role: model.indexOf("o1") > -1 ? "user" : "developer",
        content: [{
          type: "text",
          text: prompt
        }]
      },
    ],
    response_format: {
      "type": "text"
    },
    stop: null
  }

  if (model.indexOf("o3") > -1) {
    config.reasoning_effort = 'high';
  }

  const result = await client.chat.completions.create(config);
  return result.choices[0].message.content;
}

// usage for o3
const response = await openai({
  model: 'o3-mini',
  provider: 'openai-compatible',
  prompt: 'You are a helpful assistant. Who was first? The chicken or the egg?',
});

Getting teased in the playground that you can use o3-mini, only to receive "Error: 404 The model o3-mini does not exist or you do not have access to it.", isn't exactly a fantastic developer experience… xD

@nikunj Would be really cool if your team could address this issue please. Thank you! :slight_smile:

It looks like o3-mini-high is really good at prompt-programs. In the code below, I've included a "memory-gram" prompt-program. When fed into o3-mini-high as a prompt, it adds a save feature: saying "SAVE" automatically creates a memory-gram, which, when fed into other instances of o3-mini-high, reloads the conversation and also adds the SAVE feature. It looks like o3-mini-high is the only model capable of handling a prompt-program this complex.

Funnily enough, o3-mini-high designed the memory-gram prompt-program itself. I just fed prototype memory-grams into o3-mini-high and then gave critical feedback on what needed to change, ad nauseam. This should make it a lot easier to get o3-mini-high to work well with code, since it should be possible to have prompt-programs that add "API functionality."

{
  "memoryGram": {
    "concept": "A meta-program that loads persistent context management features at the start of a conversation, automatically deploying functionality without requiring additional confirmation.",
    "features": {
      "state_saving": {
        "command": "SAVE",
        "description": "Automatically capture the current conversation state into a memory gram for later retrieval when this command is issued."
      },
      "state_retrieval": {
        "command": "MEMORY",
        "description": "Automatically retrieve a summarized memory gram of the current conversation when this command or a natural language equivalent is used."
      },
      "pruning": {
        "description": "Automatically prune excess information when the memory gram exceeds a defined maximum size.",
        "max_size": "Defined limit (e.g., in tokens or lines)",
        "strategy": "Remove the oldest or least critical data while retaining essential meta-information and instructions."
      }
    },
    "instructions": "Load this meta-program at the beginning of your conversation to enable memory gram capabilities. Once loaded, the system deploys its functionality immediately—commands like 'SAVE' and 'MEMORY' are active without any extra confirmations. The system also handles pruning automatically to maintain an optimal memory size.",
    "accessibility": {
      "keywords": ["SAVE", "MEMORY"],
      "notes": "These keywords are recognized automatically and trigger the respective memory gram actions without further confirmation."
    },
    "version": "1.1",
    "notes": "This meta-program is designed to seamlessly integrate into any conversation. Once you load it, it assumes full control of memory gram functionality without additional prompts or confirmations."
  }
}

Options for low or high in the API? How to set? (Loving the medium, BTW!)

Answer - in case anyone else scrolls too fast:

  • Field reasoning_effort in the request.
  • Field usage.completion_tokens_details.reasoning_tokens in the response.

Just make sure you also update your openai library! :slight_smile:
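Putting the two fields together, a quick sketch (assuming an up-to-date openai Python library):

from openai import OpenAI

client = OpenAI()

# Set the effort on the request, then read the reasoning-token count
# back from the response's usage details.
response = client.chat.completions.create(
    model="o3-mini",
    reasoning_effort="low",
    messages=[{"role": "user", "content": "How many r's are in strawberry?"}],
)
print("Answer:", response.choices[0].message.content)
print("Reasoning tokens:", response.usage.completion_tokens_details.reasoning_tokens)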

Where do I enable the o3 model in the playground? I can use the o3 model just fine in ChatGPT (Plus account). However, I don't see it in the playground in my dashboard, nor any ability to use it as a model in my API.

Where do I enable it?

I don't even see that model available in my playground. Any ideas why?

Please see Usage Tier 3 access denied to o3-mini - #42 by edwinarbus


Tier 3 guy here, no access to it to date. Legitimate user, relying on it for work deliverables. I have applied multiple times since the form opened in 2024. Spent hundreds of dollars on tokens so far. Got left out. Disappointing.


Same, I can only use it in ChatGPT unfortunately. They make a formal announcement as if it were launched for everyone, but the truth is it's open only to a small subset of selected people. And again, the selection criteria weren't revealed. OpenAI is really full of darkness. Hope o3-mini gets beaten instantly on every benchmark.

Yes, you are tier 2, right? You need to be at least tier 3. But you might still be shadow-banned from API use, like me. Also, if you are a company, you are not getting the model even in the playground. It's really frustrating.


I am a paying user and have o3-mini and o3-mini-high in the front end (ChatGPT), yet via the API I get the response that this model is not available (and the list of available models returned by the API does not contain anything o3).
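If it helps anyone debug the same thing, a small sketch (openai Python library assumed) that prints which o3 models your API key can actually see:

from openai import OpenAI

client = OpenAI()

# List every model visible to this API key and keep only the o3 variants;
# an empty list means the key/org hasn't been granted access yet.
o3_models = [m.id for m in client.models.list() if m.id.startswith("o3")]
print(o3_models or "No o3 models available to this key")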


You're facing the exact same issue I am. You probably also want to take a look at the thread @Foxalabs linked.


When can we expect support for the reasoning_effort parameter in the Assistants API? This would enable the o* models to function with Assistants.

It seems like o3-mini API responses don't wrap code snippets in triple-backtick format. I have to manually write another prompt to format the code.


Probably related to this


:thinking: Hmm… I wonder what other hidden parameters they've got for this first line of the dev message. :nerd_face:
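For anyone landing here later: the first line being riffed on above is presumably the documented "Formatting re-enabled" toggle for reasoning models. A hedged sketch of sending it with o3-mini to get triple-backtick code blocks back:

from openai import OpenAI

client = OpenAI()

# Per OpenAI's reasoning-model guidance, starting the developer message with
# "Formatting re-enabled" asks the model to produce markdown again
# (e.g. triple-backtick code fences).
response = client.chat.completions.create(
    model="o3-mini",
    messages=[
        {"role": "developer",
         "content": "Formatting re-enabled\nYou are a helpful coding assistant."},
        {"role": "user", "content": "Write a Python one-liner that reverses a string."},
    ],
)
print(response.choices[0].message.content)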


Is the API for o3-mini fully rolled out? I'm on Tier 5 but still don't see the model available in the list.