Hi,
After a chat completion, it provides response time and input and output tokens. Is there a way to retrieve that value via the API? Haven't found a way in the docs.
Thanks!
For a Chat Completions call, the response an API developer receives is ephemeral: it is delivered only once.
If you set the API parameter "store": true, and also enable that logging on the platform site, then you can see the chat completions calls in the dashboard logs in that user interface, but you aren't given the same API access to recall them again yourself.
They remain for 30 days and you have no delete ability.
The Responses endpoint with store offers a retrieve method to fetch the stored input or output again by API call, by ID. The retention period may be quite long. There is no list method.
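As a sketch of that retrieve-by-ID flow (assuming the current openai Python SDK, OPENAI_API_KEY in the environment, and an illustrative model name):

```python
# Sketch: store a response, then re-fetch it later from just its id.
# The id-prefix check is an assumption based on observed "resp_..." ids.

def is_response_id(value: str) -> bool:
    """Rough sanity check that a string looks like a stored Responses id."""
    return value.startswith("resp_")

if __name__ == "__main__":
    from openai import OpenAI  # imported here so the helper stays stdlib-only

    client = OpenAI()
    resp = client.responses.create(
        model="gpt-4.1-nano",     # illustrative model name
        input="ping! (say pong)",
        store=True,
    )
    # Later, from just the id:
    fetched = client.responses.retrieve(resp.id)
    print(fetched.output_text)
```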
If you like "response time", you might like the durations you can also gather from headers that are returned:
[
"openai-processing-ms",
"340"
],
[
"x-envoy-upstream-service-time",
"344"
],
[
"x-ratelimit-limit-requests",
"30000"
],
...
Hi!
I just came across this topic: it is possible to retrieve stored chat.completions via the API. This may have been fixed or changed since May, but it is still useful to note.
Here is a very basic script that demonstrates how to do this:
import os
import sys

from openai import OpenAI


def main() -> int:
    if len(sys.argv) < 2:
        print("Usage: python retrieve_chat_completion.py <chat_completion_id>")
        return 2
    api_key = os.environ.get("OPENAI_API_KEY")
    if not api_key:
        print("OPENAI_API_KEY is not set")
        return 1
    completion_id = sys.argv[1]
    client = OpenAI(api_key=api_key)
    completion = client.chat.completions.retrieve(completion_id)
    out_path = f"chat_completion_{completion_id}.json"
    with open(out_path, "w", encoding="utf-8") as f:
        f.write(completion.model_dump_json(indent=2))
    print(out_path)
    return 0


if __name__ == "__main__":
    raise SystemExit(main())
You are technically correct - the best kind of correct
With the introduction of "store", retrieve() was made available in the SDK Feb 13:
GET https://api.openai.com/v1/chat/completions/{completion_id}
However, a list method is not available on the API. You had one chance to get both the created time this topic asks for and the id you must reuse to retrieve the response object again: the initial, successful call response.
{
"object": "chat.completion",
"id": "chatcmpl-Cgm4toeZkeYFdKPJZz3tC2ql0xF4G",
"created": 1768472231,...
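For completeness, the retrieve call needs nothing beyond plain HTTP. A stdlib-only sketch (OPENAI_API_KEY assumed; the id is illustrative):

```python
# Sketch: fetch a stored chat completion by id with plain HTTP (no SDK).
import json
import os
import urllib.request

def retrieve_url(completion_id: str) -> str:
    """Build the retrieve-by-id URL for the Chat Completions endpoint."""
    return f"https://api.openai.com/v1/chat/completions/{completion_id}"

if __name__ == "__main__":
    cid = "chatcmpl-Cgm4toeZkeYFdKPJZz3tC2ql0xF4G"  # your id from the past
    req = urllib.request.Request(
        retrieve_url(cid),
        headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        print(json.load(resp)["created"])
```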
So "have id, don't have created" is going to be a rare need to be fulfilled by the endpoint, as the OP question is likely more about duration or requires what was already provided once.
Phun code to see how long you have to wait between call and get (or poll and get some more) -
import time

import openai

body = {
    "model": "gpt-4.1-nano",
    "store": True,
    "messages": [{"role": "user", "content": "ping! (say pong)"}],
}
id = "chatcmpl-Cgm4toeZkeYFdKPJZz3tC2ql0xF4G"  # your id from the past
id = openai.chat.completions.create(**body).id  # or make a call
print(f"waiting for {id}.."); time.sleep(15)  # availability is this slow...
print(f"created: {openai.chat.completions.retrieve(id).created}")
More detail than the API reference offers:
GET https://api.openai.com/v1/chat/completions
Query parameters:
model: string # optional [filter: model used to generate the chat completions]
metadata: object | null # optional [filter by metadata; sent as metadata[key]=value query params]
└─▸ (object): map<string,string> # [up to 16 key/value pairs]
└─▸ (null): null
after: string # optional [pagination cursor: last id from previous page]
limit: integer # optional (default: 20) [page size] (non-functional)
order: "asc" | "desc" # optional (default: asc) [sort by timestamp]
(Cursor pagination, using values from the response, is required after the limit cutoff to get more.)
Content-Type: application/json
object: "list" # required
data: ChatCompletion[] # required
first_id: string # required
last_id: string # required
has_more: boolean # required
data[].(each ChatCompletion): object
└─▸ id: string # required
└─▸ object: "chat.completion" # required
└─▸ created: integer # required [unix seconds]
└─▸ model: string # required
└─▸ choices: Choice[] # required
└─▸ usage: CompletionUsage # optional
└─▸ service_tier: "auto" | "default" | "flex" | "scale" | "priority" | null # optional
└─▸ system_fingerprint: string # optional (deprecated)
choices[].(each Choice): object
└─▸ index: integer # required
└─▸ finish_reason: "stop" | "length" | "tool_calls" | "content_filter" | "function_call" # required
└─▸ message: ChatCompletionMessage # required
└─▸ logprobs: Logprobs | null # required
(ChatCompletionMessage and Logprobs are the POST /chat/completions response shapes.)
Since you know how to "GET {url}" … here is a bit of Python code retrieving the times of recent calls, with a taste of in-code API documentation.
import os
import time
import asyncio
from datetime import datetime
from typing import Any, Literal, TypedDict

import aiohttp

OPENAI_BASE_URL = "https://api.openai.com/v1"

# Typed API shapes (request params + response)

Order = Literal["asc", "desc"]
ServiceTier = Literal["auto", "default", "flex", "scale", "priority"]
FinishReason = Literal[
    "stop", "length", "tool_calls", "content_filter", "function_call"
]


class ListChatCompletionsParams(TypedDict, total=False):
    model: str | None
    metadata: dict[str, str] | None
    after: str | None
    limit: int | None
    order: Order | None
class ChatCompletionResponseMessageFunctionCall(TypedDict):
    name: str
    arguments: str  # JSON string (model-generated)


class ChatCompletionMessageToolCallFunction(TypedDict):
    name: str
    arguments: str  # JSON string (model-generated)


class ChatCompletionMessageToolCall(TypedDict):
    id: str
    type: Literal["function"]
    function: ChatCompletionMessageToolCallFunction


class ChatCompletionMessageCustomToolCallCustom(TypedDict):
    name: str
    input: str


class ChatCompletionMessageCustomToolCall(TypedDict):
    id: str
    type: Literal["custom"]
    custom: ChatCompletionMessageCustomToolCallCustom


ChatCompletionMessageToolCalls = list[
    ChatCompletionMessageToolCall | ChatCompletionMessageCustomToolCall
]


class UrlCitation(TypedDict):
    start_index: int
    end_index: int
    url: str
    title: str


class UrlCitationAnnotation(TypedDict):
    type: Literal["url_citation"]
    url_citation: UrlCitation


class ChatCompletionResponseMessage(TypedDict, total=False):
    role: Literal["assistant"]
    content: str | None
    refusal: str | None
    tool_calls: ChatCompletionMessageToolCalls | None
    # optional annotation list
    annotations: list[UrlCitationAnnotation]
    # deprecated
    function_call: ChatCompletionResponseMessageFunctionCall | None
    # optional; null unless you requested audio output modality
    audio: dict[str, Any] | None


class TokenTopLogprob(TypedDict):
    token: str
    logprob: float
    bytes: list[int] | None


class ChatCompletionTokenLogprob(TypedDict):
    token: str
    logprob: float
    bytes: list[int] | None
    top_logprobs: list[TokenTopLogprob]
class ChoiceLogprobs(TypedDict, total=False):
    content: list[ChatCompletionTokenLogprob] | None
    refusal: list[ChatCompletionTokenLogprob] | None


class ChatCompletionChoice(TypedDict):
    finish_reason: FinishReason
    index: int
    message: ChatCompletionResponseMessage
    logprobs: ChoiceLogprobs | None


class CompletionTokensDetails(TypedDict, total=False):
    accepted_prediction_tokens: int
    rejected_prediction_tokens: int
    reasoning_tokens: int
    audio_tokens: int


class PromptTokensDetails(TypedDict, total=False):
    cached_tokens: int
    audio_tokens: int


class CompletionUsage(TypedDict, total=False):
    # Always present in your payload (and required in the OpenAPI excerpt)
    prompt_tokens: int
    completion_tokens: int
    total_tokens: int
    # Sometimes present (often absent in list results)
    completion_tokens_details: CompletionTokensDetails
    prompt_tokens_details: PromptTokensDetails


class ChatCompletion(TypedDict, total=False):
    id: str
    object: Literal["chat.completion"]
    created: int
    model: str
    choices: list[ChatCompletionChoice]
    # observed in your payload
    request_id: str
    tool_choice: Any | None
    seed: int | None
    top_p: float | None
    temperature: float | None
    presence_penalty: float | None
    frequency_penalty: float | None
    input_user: str | None
    tools: Any | None
    metadata: dict[str, str] | None
    response_format: Any | None
    service_tier: ServiceTier | None
    system_fingerprint: str | None
    usage: CompletionUsage


class ChatCompletionList(TypedDict):
    object: Literal["list"]
    data: list[ChatCompletion]
    first_id: str
    last_id: str
    has_more: bool
# ───────────────────────
# Transport helpers

def _make_openai_headers(api_key: str) -> dict[str, str]:
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }


async def _raise_for_status(resp: aiohttp.ClientResponse) -> None:
    if 200 <= resp.status < 300:
        return
    text = await resp.text()
    raise RuntimeError(f"OpenAI API error {resp.status}: {text}")


def _flatten_list_params(params: ListChatCompletionsParams) -> dict[str, str]:
    """
    Convert TypedDict-style params into query params:
    - omit None
    - expand metadata dict into metadata[key]=value
    """
    out: dict[str, str] = {}
    model = params.get("model")
    if isinstance(model, str) and model:
        out["model"] = model
    after = params.get("after")
    if isinstance(after, str) and after:
        out["after"] = after
    limit = params.get("limit")
    if isinstance(limit, int):
        out["limit"] = str(limit)
    order = params.get("order")
    if order in ("asc", "desc"):
        out["order"] = order
    metadata = params.get("metadata")
    if isinstance(metadata, dict):
        for k, v in metadata.items():
            if v is None:
                continue
            out[f"metadata[{k}]"] = str(v)
    return out


class OpenAITransport:
    def __init__(
        self, *, base_url: str = OPENAI_BASE_URL, api_key: str | None = None
    ) -> None:
        key = api_key or os.environ.get("OPENAI_API_KEY") or ""
        if not key:
            raise RuntimeError("OPENAI_API_KEY is not set.")
        self.base_url = base_url
        self._headers = _make_openai_headers(key)
        self._session: aiohttp.ClientSession | None = None

    async def __aenter__(self) -> "OpenAITransport":
        timeout = aiohttp.ClientTimeout(total=60)
        self._session = aiohttp.ClientSession(headers=self._headers, timeout=timeout)
        return self

    async def __aexit__(self, exc_type, exc, tb) -> None:
        if self._session is not None and not self._session.closed:
            await self._session.close()

    @property
    def session(self) -> aiohttp.ClientSession:
        if self._session is None or self._session.closed:
            raise RuntimeError("OpenAITransport is not open.")
        return self._session

    async def get_json(
        self, path: str, *, params: dict[str, str] | None = None
    ) -> dict[str, Any]:
        url = f"{self.base_url.rstrip('/')}/{path.lstrip('/')}"
        async with self.session.get(url, params=params) as resp:
            await _raise_for_status(resp)
            return await resp.json(content_type=None)
# ─────────────────────────────────────────────────────────────────────────────
# List stored chat completions for the past x days

def _copy_params(params: ListChatCompletionsParams) -> ListChatCompletionsParams:
    """
    Typed copy of a TypedDict.
    dict(params) becomes dict[str, object] to Pylance, so we explicitly rebuild it.
    """
    out: ListChatCompletionsParams = {}
    if "model" in params:
        out["model"] = params["model"]
    if "metadata" in params:
        out["metadata"] = params["metadata"]
    if "after" in params:
        out["after"] = params["after"]
    if "limit" in params:
        out["limit"] = params["limit"]
    if "order" in params:
        out["order"] = params["order"]
    return out


async def list_recent_stored_chat_completions(
    openai: OpenAITransport,
    *,
    days: int = 30,
    params: ListChatCompletionsParams,
) -> ChatCompletionList:
    now = int(time.time())
    cutoff = now - days * 24 * 60 * 60
    # If caller didn't specify an order, use desc for efficient "recent first" paging.
    effective_params: ListChatCompletionsParams = _copy_params(params)
    if effective_params.get("order") is None:
        effective_params["order"] = "desc"
    collected: list[ChatCompletion] = []
    after: str | None = effective_params.get("after")
    while True:
        effective_params["after"] = after
        qp = _flatten_list_params(effective_params)
        payload = await openai.get_json("/chat/completions", params=qp)
        data_any = payload.get("data", [])
        if not isinstance(data_any, list) or not data_any:
            return {
                "object": "list",
                "data": collected,
                "first_id": str(payload.get("first_id", "")),
                "last_id": str(payload.get("last_id", "")),
                "has_more": False,
            }
        for item_any in data_any:
            if not isinstance(item_any, dict):
                continue
            # Narrow item_any to a dict[str, Any] for cleaner .get typing
            item: dict[str, Any] = item_any
            created = item.get("created")
            if isinstance(created, int) and created < cutoff:
                return {
                    "object": "list",
                    "data": collected,
                    "first_id": (
                        str(collected[0].get("id", ""))
                        if collected
                        else str(payload.get("first_id", ""))
                    ),
                    "last_id": (
                        str(collected[-1].get("id", ""))
                        if collected
                        else str(payload.get("last_id", ""))
                    ),
                    "has_more": False,
                }
            # Tell the type checker this dict structurally matches ChatCompletion.
            # This does not change runtime behavior.
            collected_item: ChatCompletion = item  # type: ignore[assignment]
            collected.append(collected_item)
        has_more = bool(payload.get("has_more", False))
        if not has_more:
            return {
                "object": "list",
                "data": collected,
                "first_id": (
                    str(collected[0].get("id", ""))
                    if collected
                    else str(payload.get("first_id", ""))
                ),
                "last_id": (
                    str(collected[-1].get("id", ""))
                    if collected
                    else str(payload.get("last_id", ""))
                ),
                "has_more": False,
            }
        after_val = payload.get("last_id")
        after = after_val if isinstance(after_val, str) and after_val else None
        if after is None:
            return {
                "object": "list",
                "data": collected,
                "first_id": (
                    str(collected[0].get("id", ""))
                    if collected
                    else str(payload.get("first_id", ""))
                ),
                "last_id": (
                    str(collected[-1].get("id", ""))
                    if collected
                    else str(payload.get("last_id", ""))
                ),
                "has_more": False,
            }
async def main() -> None:
    params: ListChatCompletionsParams = {
        "model": None,  # filter
        "metadata": None,  # filter
        "after": None,
        "limit": 10,  # doesn't work
        "order": "desc",  # works ["asc" | "desc"]
    }
    days = 30
    async with OpenAITransport() as openai:
        result = await list_recent_stored_chat_completions(
            openai, days=days, params=params
        )
    for item in result.get("data", []):
        if not isinstance(item, dict):
            continue
        cc_id = item.get("id")
        request_id = item.get("request_id")
        created = item.get("created")
        usage = item.get("usage") or {}
        total_tokens = usage.get("total_tokens")
        model = item.get("model")
        if isinstance(created, int):
            dt = datetime.fromtimestamp(created)
            human = dt.strftime("%Y-%m-%d %H:%M:%S")
        else:
            human = "unknown-time"
        tok = total_tokens if isinstance(total_tokens, int) else "?"
        print(f"{human} {request_id}\n{cc_id} -{model}, {tok} tokens\n", "-" * 20)


if __name__ == "__main__":
    asyncio.run(main())
printing:
2026-01-15 03:17:18 req_ccfa18cf0d4d7d9af25bb3402d4ac5f8
chatcmpl-CyWdR4GCWUFdmPWT85fpwP -gpt-4.1-nano-2025-04-14, 15 tokens
--------------------
2026-01-15 03:10:44 ...