OpenAI API - get usage tokens in response when stream=True is set

_j · August 17, 2025, 12:17pm

Capturing token usage from a streamed Responses API call is quite easy: it takes no more than a passing understanding of events, no off-site tutorials, and no special request parameter.

When streaming, the Responses API sends you named events over the SSE (server-sent events) connection.

For a response that finishes normally, the final event you’ll receive is response.completed.

It looks very much like the response object of a non-streaming call, and in fact it repeats the content that was already streamed as deltas. Over raw HTTP, it arrives like this:

event: response.completed
data: {"type":"response.completed","sequence_number":42,"response":{"id":"resp_348921","object":"response","created_at":1755446846,"status":"completed","background":false,"error":null,"incomplete_details":null,"instructions":null,"max_output_tokens":1500,"max_tool_calls":null,"model":"gpt-4.1-nano-2025-04-14","output":[{"id":"msg_902134","type":"message","status":"completed","content":[{"type":"output_text","annotations":[],"logprobs":[],"text":"Yarr, treasure ahead!"}],"role":"assistant"}],"parallel_tool_calls":true,"previous_response_id":null,"prompt_cache_key":null,"reasoning":{"effort":null,"summary":null},"safety_identifier":null,"service_tier":"default","store":true,"temperature":1.0,"text":{"format":{"type":"text"}},"tool_choice":"auto","tools":[],"top_logprobs":0,"top_p":1.0,"truncation":"disabled","usage":{"input_tokens":42,"input_tokens_details":{"cached_tokens":0},"output_tokens":7,"output_tokens_details":{"reasoning_tokens":0},"total_tokens":49},"user":null,"metadata":{}}}

Let’s talk Python, then:

import json
data = r"""{"type":"response.completed","sequence_number":42,"response":{"id":"resp_348921","object":"response","created_at":1755446846,"status":"completed","background":false,"error":null,"incomplete_details":null,"instructions":null,"max_output_tokens":1500,"max_tool_calls":null,"model":"gpt-4.1-nano-2025-04-14","output":[{"id":"msg_902134","type":"message","status":"completed","content":[{"type":"output_text","annotations":[],"logprobs":[],"text":"Yarr, treasure ahead!"}],"role":"assistant"}],"parallel_tool_calls":true,"previous_response_id":null,"prompt_cache_key":null,"reasoning":{"effort":null,"summary":null},"safety_identifier":null,"service_tier":"default","store":true,"temperature":1.0,"text":{"format":{"type":"text"}},"tool_choice":"auto","tools":[],"top_logprobs":0,"top_p":1.0,"truncation":"disabled","usage":{"input_tokens":42,"input_tokens_details":{"cached_tokens":0},"output_tokens":7,"output_tokens_details":{"reasoning_tokens":0},"total_tokens":49},"user":null,"metadata":{}}}"""
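payload = json.loads(data)  # parse the event's data line as JSON
usage = payload["response"]["usage"]  # usage is nested under the "response" object
print(usage["input_tokens"])  # also available: "output_tokens", "total_tokens"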

Voilà: the answer is 42 input tokens.
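
If you are reading the raw SSE stream yourself instead of pasting one event into a string, the same lookup can be sketched as below. This is only a minimal sketch: iter_sse_events and usage_from_sse are names I made up, the sample stream is shortened and hypothetical, and it assumes each event is an "event:" line plus "data:" line(s) closed by a blank line, as in the example above.

```python
import json


def iter_sse_events(lines):
    """Yield (event_name, data_string) pairs from an iterable of SSE text lines."""
    event_name, data_parts = None, []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if data_parts:
                yield event_name, "\n".join(data_parts)
            event_name, data_parts = None, []
        elif line.startswith("event:"):
            event_name = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data_parts.append(line[len("data:"):].lstrip())
    if data_parts:
        # Flush a final event that was not followed by a blank line.
        yield event_name, "\n".join(data_parts)


def usage_from_sse(lines):
    """Return the usage dict from the response.completed event, or None."""
    for name, data in iter_sse_events(lines):
        if name == "response.completed":
            return json.loads(data)["response"]["usage"]
    return None


# Hypothetical, shortened stream for illustration only.
sample = [
    "event: response.output_text.delta",
    'data: {"type":"response.output_text.delta","delta":"Yarr"}',
    "",
    "event: response.completed",
    'data: {"type":"response.completed","response":{"usage":'
    '{"input_tokens":42,"output_tokens":7,"total_tokens":49}}}',
    "",
]
print(usage_from_sse(sample))
```

In practice, lines would come from whatever HTTP client you use to read the response body line by line; that part is left out here on purpose.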

With one of OpenAI’s SDK libraries, you iterate over the generated events and receive the data already parsed: not quite in your language’s native format, but as OpenAI’s Pydantic-based model objects, whose fields you read as attributes.
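
Here is a rough sketch of what that iteration might look like in Python. usage_from_events is a hypothetical helper, and the SimpleNamespace objects are stand-ins so the snippet runs without a network call; the only thing assumed about real SDK events is that they expose .type and, on response.completed, .response.usage as attributes. The commented-out lines show how I would expect it to be wired to the openai package, with a placeholder model and prompt.

```python
from types import SimpleNamespace


def usage_from_events(events):
    """Return the usage object from the response.completed event, else None."""
    usage = None
    for event in events:
        # Every other event type is ignored in this sketch.
        if event.type == "response.completed":
            usage = event.response.usage
    return usage


# Stand-in events for a dry run (not real SDK objects).
fake_stream = [
    SimpleNamespace(type="response.output_text.delta", delta="Yarr"),
    SimpleNamespace(
        type="response.completed",
        response=SimpleNamespace(
            usage=SimpleNamespace(input_tokens=42, output_tokens=7, total_tokens=49)
        ),
    ),
]
usage = usage_from_events(fake_stream)
print(usage.input_tokens, usage.output_tokens, usage.total_tokens)

# Assumed wiring to the real SDK (needs the openai package and an API key):
# from openai import OpenAI
# client = OpenAI()
# stream = client.responses.create(
#     model="gpt-4.1-nano", input="Talk like a pirate.", stream=True
# )
# usage = usage_from_events(stream)
```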

The real challenge on this endpoint is capturing, parsing, and taking appropriate action for the 25 other event types.
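
One way to keep that manageable is a dispatch table keyed on the event type. The sketch below is only an illustration: handle_stream and its handlers are hypothetical names, events are assumed to be already-parsed dicts, and only two event types are handled while everything else is merely recorded.

```python
def handle_stream(events):
    """Route parsed event dicts to handlers by their "type" field."""
    state = {"text": [], "usage": None, "unhandled": []}

    def on_text_delta(event):
        # Accumulate streamed text fragments.
        state["text"].append(event["delta"])

    def on_completed(event):
        # The completed event carries the final usage counts.
        state["usage"] = event["response"]["usage"]

    handlers = {
        "response.output_text.delta": on_text_delta,
        "response.completed": on_completed,
    }
    for event in events:
        handler = handlers.get(event["type"])
        if handler is None:
            # Record types this sketch does not handle instead of failing.
            state["unhandled"].append(event["type"])
        else:
            handler(event)
    return "".join(state["text"]), state["usage"], state["unhandled"]


# Hypothetical, shortened events for illustration only.
events = [
    {"type": "response.created"},
    {"type": "response.output_text.delta", "delta": "Yarr, "},
    {"type": "response.output_text.delta", "delta": "treasure ahead!"},
    {"type": "response.completed",
     "response": {"usage": {"input_tokens": 42, "output_tokens": 7, "total_tokens": 49}}},
]
text, usage, unhandled = handle_stream(events)
print(text, usage["total_tokens"], unhandled)
```

Adding support for another event type is then a matter of writing one more handler and registering it in the table.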
