Retrieve response time via API

Hi,

After a chat completion, the platform shows the response time and the input and output token counts. Is there a way to retrieve those values via the API? Haven’t found a way in the docs.
Thanks!

For a Chat Completions call, the response is ephemeral from an API developer’s perspective: it is delivered only once.

If you set the API parameter "store": true, and also enable that logging on the platform site, then you can see the chat completions calls in the dashboard logs in that user interface, but you aren’t given the same API access to recall them again yourself.

They remain for 30 days, and you have no way to delete them.
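A minimal sketch of enabling storage on a call, assuming the standard `openai` Python SDK; the model name and metadata tags are illustrative placeholders, not anything from this thread:

```python
# Sketch: build a Chat Completions request with storage enabled.
# "store": True persists the call server-side; "metadata" attaches up to
# 16 key/value string pairs that can be used to find it again later.
import os


def build_stored_request(user_text: str) -> dict:
    return {
        "model": "gpt-4.1-nano",  # illustrative model name
        "store": True,
        "metadata": {"app": "demo", "run": "may"},  # illustrative tags
        "messages": [{"role": "user", "content": user_text}],
    }


if os.environ.get("OPENAI_API_KEY"):
    from openai import OpenAI

    client = OpenAI()
    completion = client.chat.completions.create(**build_stored_request("ping"))
    print(completion.id)  # keep this id if you ever want the object back
```

The call only runs when an API key is present; the request body itself is just a plain dict you can inspect.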


The Responses endpoint with store does offer a method to retrieve the stored input or output again by API call, by ID. The retention period may be quite long. There is no list method.
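A hedged sketch of that retrieval, assuming the `openai` Python SDK’s `client.responses.retrieve`; the id below is a placeholder you would substitute from an earlier stored call:

```python
# Sketch: fetch a stored Responses object back by its id.
import os


def looks_like_response_id(response_id: str) -> bool:
    """Cheap sanity check before hitting the API: Responses ids start with 'resp_'."""
    return response_id.startswith("resp_")


if os.environ.get("OPENAI_API_KEY"):
    from openai import OpenAI

    client = OpenAI()
    response_id = "resp_your_id_here"  # placeholder; use a real stored id
    if looks_like_response_id(response_id):
        stored = client.responses.retrieve(response_id)
        print(stored.created_at, stored.output_text)
```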


If you want “response time”, you might like the durations you can also gather from the headers that are returned:

[
  "openai-processing-ms",
  "340"
 ],
 [
  "x-envoy-upstream-service-time",
  "344"
 ],
 [
  "x-ratelimit-limit-requests",
  "30000"
 ],
...

Hi!
I just came across this topic, and it is in fact possible to retrieve stored chat completions via the API. This may have been added or changed since May, but it is still useful to note.
Here is a very basic script that demonstrates how to do this:

import os
import sys

from openai import OpenAI


def main() -> int:
    if len(sys.argv) < 2:
        print("Usage: python retrieve_chat_completion.py <chat_completion_id>")
        return 2

    api_key = os.environ.get("OPENAI_API_KEY")
    if not api_key:
        print("OPENAI_API_KEY is not set")
        return 1

    completion_id = sys.argv[1]
    client = OpenAI(api_key=api_key)

    completion = client.chat.completions.retrieve(completion_id)
    out_path = f"chat_completion_{completion_id}.json"
    with open(out_path, "w", encoding="utf-8") as f:
        f.write(completion.model_dump_json(indent=2))

    print(out_path)
    return 0


if __name__ == "__main__":
    raise SystemExit(main())

You are technically correct - the best kind of correct

With the introduction of “store”, retrieve() was made available in the SDK on Feb 13:

GET https://api.openai.com/v1/chat/completions/{completion_id}

However, a list method is not available on the API. You had one chance to get both the created time this topic asks about and the id you must reuse to retrieve the response object again: the initial, successful call response.

{
  "object": "chat.completion",
  "id": "chatcmpl-Cgm4toeZkeYFdKPJZz3tC2ql0xF4G",
  "created": 1768472231,...

So “have the id, don’t have the created time” is going to be a rare need for the endpoint to fill, as the OP’s question is likely more about duration, or asks for what was already provided once.
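If duration is what you actually need, the stored object won’t give it to you (`created` is whole seconds only), so measure it client-side at call time. A sketch, with an illustrative model name:

```python
# Sketch: time the round trip yourself, since the API doesn't store duration.
import os
import time


def timed(fn):
    """Run fn() and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start


if os.environ.get("OPENAI_API_KEY"):
    from openai import OpenAI

    client = OpenAI()
    completion, elapsed = timed(
        lambda: client.chat.completions.create(
            model="gpt-4.1-nano",  # illustrative model name
            messages=[{"role": "user", "content": "ping"}],
        )
    )
    print(f"{elapsed:.3f}s, id={completion.id}")
```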


Phun code to see how long you have to wait between create and retrieve (or poll and get some more):

import time

import openai

body = {
    "model": "gpt-4.1-nano",
    "store": True,
    "messages": [{"role": "user", "content": "ping! (say pong)"}],
}
completion_id = "chatcmpl-Cgm4toeZkeYFdKPJZz3tC2ql0xF4G"  # your id from the past
completion_id = openai.chat.completions.create(**body).id  # or make a call
print(f"waiting for {completion_id}..."); time.sleep(15)  # availability is this slow...
print(f"created: {openai.chat.completions.retrieve(completion_id).created}")

List Chat Completions - API

More detail than the API reference provides:

GET https://api.openai.com/v1/chat/completions

Query parameters:

  model: string  # optional [filter: model used to generate the chat completions]
  metadata: object | null  # optional [filter by metadata; sent as metadata[key]=value query params]
  ├▸ (object): map<string,string>  # [up to 16 key/value pairs]
  └▸ (null): null
  after: string  # optional [pagination cursor: last id from previous page]
  limit: integer  # optional (default: 20) [page size] (non-functional)
  order: "asc" | "desc"  # optional (default: asc) [sort by timestamp]

(Cursor pagination, using ids from the response, is required past the limit cutoff to get more)

Response

  • only the API’s response and some echoed parameters, not prompt messages

Content-Type: application/json

  object: "list"  # required
  data: ChatCompletion[]  # required
  first_id: string  # required
  last_id: string  # required
  has_more: boolean  # required

  data[].(each ChatCompletion): object
  ├▸ id: string  # required
  ├▸ object: "chat.completion"  # required
  ├▸ created: integer  # required [unix seconds]
  ├▸ model: string  # required
  ├▸ choices: Choice[]  # required
  ├▸ usage: CompletionUsage  # optional
  ├▸ service_tier: "auto" | "default" | "flex" | "scale" | "priority" | null  # optional
  └▸ system_fingerprint: string  # optional (deprecated)

  choices[].(each Choice): object
  ├▸ index: integer  # required
  ├▸ finish_reason: "stop" | "length" | "tool_calls" | "content_filter" | "function_call"  # required
  ├▸ message: ChatCompletionMessage  # required
  └▸ logprobs: Logprobs | null  # required

(ChatCompletionMessage and Logprobs are the POST /chat/completions response shapes.)


Since you know how to “GET {url}” … here is a bit of Python code retrieving the times of recent calls, with a taste of in-code API documentation.

import os
import time
import asyncio
from datetime import datetime
from typing import Any, Literal, TypedDict

import aiohttp

OPENAI_BASE_URL = "https://api.openai.com/v1"


# Typed API shapes (request params + response)

Order = Literal["asc", "desc"]
ServiceTier = Literal["auto", "default", "flex", "scale", "priority"]
FinishReason = Literal[
    "stop", "length", "tool_calls", "content_filter", "function_call"
]


class ListChatCompletionsParams(TypedDict, total=False):
    model: str | None
    metadata: dict[str, str] | None
    after: str | None
    limit: int | None
    order: Order | None


class ChatCompletionResponseMessageFunctionCall(TypedDict):
    name: str
    arguments: str  # JSON string (model-generated)


class ChatCompletionMessageToolCallFunction(TypedDict):
    name: str
    arguments: str  # JSON string (model-generated)


class ChatCompletionMessageToolCall(TypedDict):
    id: str
    type: Literal["function"]
    function: ChatCompletionMessageToolCallFunction


class ChatCompletionMessageCustomToolCallCustom(TypedDict):
    name: str
    input: str


class ChatCompletionMessageCustomToolCall(TypedDict):
    id: str
    type: Literal["custom"]
    custom: ChatCompletionMessageCustomToolCallCustom


ChatCompletionMessageToolCalls = list[
    ChatCompletionMessageToolCall | ChatCompletionMessageCustomToolCall
]


class UrlCitation(TypedDict):
    start_index: int
    end_index: int
    url: str
    title: str


class UrlCitationAnnotation(TypedDict):
    type: Literal["url_citation"]
    url_citation: UrlCitation


class ChatCompletionResponseMessage(TypedDict, total=False):
    role: Literal["assistant"]
    content: str | None
    refusal: str | None

    tool_calls: ChatCompletionMessageToolCalls | None

    # optional annotation list
    annotations: list[UrlCitationAnnotation]

    # deprecated
    function_call: ChatCompletionResponseMessageFunctionCall | None

    # optional; null unless you requested audio output modality
    audio: dict[str, Any] | None


class TokenTopLogprob(TypedDict):
    token: str
    logprob: float
    bytes: list[int] | None


class ChatCompletionTokenLogprob(TypedDict):
    token: str
    logprob: float
    bytes: list[int] | None
    top_logprobs: list[TokenTopLogprob]


class ChoiceLogprobs(TypedDict, total=False):
    content: list[ChatCompletionTokenLogprob] | None
    refusal: list[ChatCompletionTokenLogprob] | None


class ChatCompletionChoice(TypedDict):
    finish_reason: FinishReason
    index: int
    message: ChatCompletionResponseMessage
    logprobs: ChoiceLogprobs | None


class CompletionTokensDetails(TypedDict, total=False):
    accepted_prediction_tokens: int
    rejected_prediction_tokens: int
    reasoning_tokens: int
    audio_tokens: int


class PromptTokensDetails(TypedDict, total=False):
    cached_tokens: int
    audio_tokens: int


class CompletionUsage(TypedDict, total=False):
    # Always present in your payload (and required in the OpenAPI excerpt)
    prompt_tokens: int
    completion_tokens: int
    total_tokens: int

    # Sometimes present (often absent in list results)
    completion_tokens_details: CompletionTokensDetails
    prompt_tokens_details: PromptTokensDetails


class ChatCompletion(TypedDict, total=False):
    id: str
    object: Literal["chat.completion"]
    created: int
    model: str
    choices: list[ChatCompletionChoice]

    # observed in your payload
    request_id: str
    tool_choice: Any | None
    seed: int | None
    top_p: float | None
    temperature: float | None
    presence_penalty: float | None
    frequency_penalty: float | None
    input_user: str | None
    tools: Any | None
    metadata: dict[str, str] | None
    response_format: Any | None

    service_tier: ServiceTier | None
    system_fingerprint: str | None
    usage: CompletionUsage


class ChatCompletionList(TypedDict):
    object: Literal["list"]
    data: list[ChatCompletion]
    first_id: str
    last_id: str
    has_more: bool


# ───────────────────────
# Transport helpers


def _make_openai_headers(api_key: str) -> dict[str, str]:
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }


async def _raise_for_status(resp: aiohttp.ClientResponse) -> None:
    if 200 <= resp.status < 300:
        return
    text = await resp.text()
    raise RuntimeError(f"OpenAI API error {resp.status}: {text}")


def _flatten_list_params(params: ListChatCompletionsParams) -> dict[str, str]:
    """
    Convert TypedDict-style params into query params:
      - omit None
      - expand metadata dict into metadata[key]=value
    """
    out: dict[str, str] = {}

    model = params.get("model")
    if isinstance(model, str) and model:
        out["model"] = model

    after = params.get("after")
    if isinstance(after, str) and after:
        out["after"] = after

    limit = params.get("limit")
    if isinstance(limit, int):
        out["limit"] = str(limit)

    order = params.get("order")
    if order in ("asc", "desc"):
        out["order"] = order

    metadata = params.get("metadata")
    if isinstance(metadata, dict):
        for k, v in metadata.items():
            if v is None:
                continue
            out[f"metadata[{k}]"] = str(v)

    return out


class OpenAITransport:
    def __init__(
        self, *, base_url: str = OPENAI_BASE_URL, api_key: str | None = None
    ) -> None:
        key = api_key or os.environ.get("OPENAI_API_KEY") or ""
        if not key:
            raise RuntimeError("OPENAI_API_KEY is not set.")

        self.base_url = base_url
        self._headers = _make_openai_headers(key)
        self._session: aiohttp.ClientSession | None = None

    async def __aenter__(self) -> "OpenAITransport":
        timeout = aiohttp.ClientTimeout(total=60)
        self._session = aiohttp.ClientSession(headers=self._headers, timeout=timeout)
        return self

    async def __aexit__(self, exc_type, exc, tb) -> None:
        if self._session is not None and not self._session.closed:
            await self._session.close()

    @property
    def session(self) -> aiohttp.ClientSession:
        if self._session is None or self._session.closed:
            raise RuntimeError("OpenAITransport is not open.")
        return self._session

    async def get_json(
        self, path: str, *, params: dict[str, str] | None = None
    ) -> dict[str, Any]:
        url = f"{self.base_url.rstrip('/')}/{path.lstrip('/')}"
        async with self.session.get(url, params=params) as resp:
            await _raise_for_status(resp)
            return await resp.json(content_type=None)


# ─────────────────────────────────────────────────────────────────────────────
# List stored chat completions for the past x days


def _copy_params(params: ListChatCompletionsParams) -> ListChatCompletionsParams:
    """
    Typed copy of a TypedDict.

    dict(params) becomes dict[str, object] to Pylance, so we explicitly rebuild it.
    """
    out: ListChatCompletionsParams = {}
    if "model" in params:
        out["model"] = params["model"]
    if "metadata" in params:
        out["metadata"] = params["metadata"]
    if "after" in params:
        out["after"] = params["after"]
    if "limit" in params:
        out["limit"] = params["limit"]
    if "order" in params:
        out["order"] = params["order"]
    return out


async def list_recent_stored_chat_completions(
    openai: OpenAITransport,
    *,
    days: int = 30,
    params: ListChatCompletionsParams,
) -> ChatCompletionList:
    now = int(time.time())
    cutoff = now - days * 24 * 60 * 60

    # If caller didn't specify an order, use desc for efficient "recent first" paging.
    effective_params: ListChatCompletionsParams = _copy_params(params)
    if effective_params.get("order") is None:
        effective_params["order"] = "desc"

    collected: list[ChatCompletion] = []
    after: str | None = effective_params.get("after")

    while True:
        effective_params["after"] = after
        qp = _flatten_list_params(effective_params)

        payload = await openai.get_json("/chat/completions", params=qp)

        data_any = payload.get("data", [])
        if not isinstance(data_any, list) or not data_any:
            return {
                "object": "list",
                "data": collected,
                "first_id": str(payload.get("first_id", "")),
                "last_id": str(payload.get("last_id", "")),
                "has_more": False,
            }

        for item_any in data_any:
            if not isinstance(item_any, dict):
                continue

            # Narrow item_any to a dict[str, Any] for cleaner .get typing
            item: dict[str, Any] = item_any

            created = item.get("created")
            if isinstance(created, int) and created < cutoff:
                return {
                    "object": "list",
                    "data": collected,
                    "first_id": (
                        str(collected[0].get("id", ""))
                        if collected
                        else str(payload.get("first_id", ""))
                    ),
                    "last_id": (
                        str(collected[-1].get("id", ""))
                        if collected
                        else str(payload.get("last_id", ""))
                    ),
                    "has_more": False,
                }

            # Tell the type checker this dict structurally matches ChatCompletion.
            # This does not change runtime behavior.
            collected_item: ChatCompletion = item  # type: ignore[assignment]
            collected.append(collected_item)

        has_more = bool(payload.get("has_more", False))
        if not has_more:
            return {
                "object": "list",
                "data": collected,
                "first_id": (
                    str(collected[0].get("id", ""))
                    if collected
                    else str(payload.get("first_id", ""))
                ),
                "last_id": (
                    str(collected[-1].get("id", ""))
                    if collected
                    else str(payload.get("last_id", ""))
                ),
                "has_more": False,
            }

        after_val = payload.get("last_id")
        after = after_val if isinstance(after_val, str) and after_val else None
        if after is None:
            return {
                "object": "list",
                "data": collected,
                "first_id": (
                    str(collected[0].get("id", ""))
                    if collected
                    else str(payload.get("first_id", ""))
                ),
                "last_id": (
                    str(collected[-1].get("id", ""))
                    if collected
                    else str(payload.get("last_id", ""))
                ),
                "has_more": False,
            }


async def main() -> None:
    params: ListChatCompletionsParams = {
        "model": None,  # filter
        "metadata": None,  # filter
        "after": None,
        "limit": 10,  # doesn't work
        "order": "desc",  # works ["asc" | "desc"]
    }

    days = 30
    async with OpenAITransport() as openai:
        result = await list_recent_stored_chat_completions(
            openai, days=days, params=params
        )

    for item in result.get("data", []):
        if not isinstance(item, dict):
            continue
        cc_id = item.get("id")
        request_id = item.get("request_id")
        created = item.get("created")
        usage = item.get("usage") or {}
        total_tokens = usage.get("total_tokens")
        model = item.get("model")

        if isinstance(created, int):
            dt = datetime.fromtimestamp(created)
            human = dt.strftime("%Y-%m-%d %H:%M:%S")
        else:
            human = "unknown-time"

        tok = total_tokens if isinstance(total_tokens, int) else "?"
        print(f"{human} {request_id}\n{cc_id} -{model}, {tok} tokens\n", "-" * 20)


if __name__ == "__main__":
    asyncio.run(main())

printing:

2026-01-15 03:17:18 req_ccfa18cf0d4d7d9af25bb3402d4ac5f8
chatcmpl-CyWdR4GCWUFdmPWT85fpwP -gpt-4.1-nano-2025-04-14, 15 tokens
 --------------------
2026-01-15 03:10:44 ...