Capturing token usage from a streamed Responses call is quite easy: it takes no more than a passing understanding of events, no off-site tutorials, and no special request parameter.
When streaming, Responses sends you named events over the SSE connection.
The final event you’ll receive on a successful run is response.completed.
Its response field closely resembles the response object returned by a non-streaming call and, in fact, repeats the content already streamed as deltas. The raw SSE:
event: response.completed
data: {"type":"response.completed","sequence_number":42,"response":{"id":"resp_348921","object":"response","created_at":1755446846,"status":"completed","background":false,"error":null,"incomplete_details":null,"instructions":null,"max_output_tokens":1500,"max_tool_calls":null,"model":"gpt-4.1-nano-2025-04-14","output":[{"id":"msg_902134","type":"message","status":"completed","content":[{"type":"output_text","annotations":[],"logprobs":[],"text":"Yarr, treasure ahead!"}],"role":"assistant"}],"parallel_tool_calls":true,"previous_response_id":null,"prompt_cache_key":null,"reasoning":{"effort":null,"summary":null},"safety_identifier":null,"service_tier":"default","store":true,"temperature":1.0,"text":{"format":{"type":"text"}},"tool_choice":"auto","tools":[],"top_logprobs":0,"top_p":1.0,"truncation":"disabled","usage":{"input_tokens":42,"input_tokens_details":{"cached_tokens":0},"output_tokens":7,"output_tokens_details":{"reasoning_tokens":0},"total_tokens":49},"user":null,"metadata":{}}}
Let’s talk Python, then:
import json

data = r"""{"type":"response.completed","sequence_number":42,"response":{"id":"resp_348921","object":"response","created_at":1755446846,"status":"completed","background":false,"error":null,"incomplete_details":null,"instructions":null,"max_output_tokens":1500,"max_tool_calls":null,"model":"gpt-4.1-nano-2025-04-14","output":[{"id":"msg_902134","type":"message","status":"completed","content":[{"type":"output_text","annotations":[],"logprobs":[],"text":"Yarr, treasure ahead!"}],"role":"assistant"}],"parallel_tool_calls":true,"previous_response_id":null,"prompt_cache_key":null,"reasoning":{"effort":null,"summary":null},"safety_identifier":null,"service_tier":"default","store":true,"temperature":1.0,"text":{"format":{"type":"text"}},"tool_choice":"auto","tools":[],"top_logprobs":0,"top_p":1.0,"truncation":"disabled","usage":{"input_tokens":42,"input_tokens_details":{"cached_tokens":0},"output_tokens":7,"output_tokens_details":{"reasoning_tokens":0},"total_tokens":49},"user":null,"metadata":{}}}"""
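# Parse the JSON carried on the "data:" line of the response.completed event
payload = json.loads(data)
# Token accounting sits under response.usage
usage = payload["response"]["usage"]
print(usage["input_tokens"])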
Voilà, the answer is 42.
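In a real stream you would not have the JSON sitting in a string; you would be reading SSE lines as they arrive. Here is a minimal sketch of that, assuming you already have an iterable of decoded text lines from whatever HTTP client you use. The helper names iter_sse_events and usage_from_sse are made up for illustration, and the sample stream is heavily abbreviated.

```python
import json


def iter_sse_events(lines):
    """Yield (event_name, parsed_data) pairs from an iterable of SSE text lines."""
    event_name, data_parts = None, []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates one event
            if data_parts:
                yield event_name, json.loads("\n".join(data_parts))
            event_name, data_parts = None, []
        elif line.startswith("event:"):
            event_name = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data_parts.append(line[len("data:"):].lstrip())
    if data_parts:
        # Flush a final event that was not followed by a blank line
        yield event_name, json.loads("\n".join(data_parts))


def usage_from_sse(lines):
    """Return the usage dict from response.completed, or None if it never arrives."""
    for name, payload in iter_sse_events(lines):
        if name == "response.completed":
            return payload["response"]["usage"]
    return None


# Abbreviated stand-in for a real stream (most fields omitted)
sample = [
    'event: response.output_text.delta',
    'data: {"type":"response.output_text.delta","delta":"Yarr"}',
    '',
    'event: response.completed',
    'data: {"type":"response.completed","response":{"usage":'
    '{"input_tokens":42,"output_tokens":7,"total_tokens":49}}}',
    '',
]

usage = usage_from_sse(sample)
print(usage["input_tokens"], usage["output_tokens"], usage["total_tokens"])
```

Returning None when response.completed never shows up is a deliberate choice here, so the caller can tell a stream that ended early apart from one that reported zero tokens.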
With one of OpenAI’s SDK libraries, you iterate over the generated events and receive the data in your language’s native format, or rather, as OpenAI’s Pydantic-based class objects with attribute access.
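As a rough sketch of what that looks like, the function below walks any iterable of event objects and keeps the usage from response.completed. It assumes each event exposes a type attribute and that the completed event carries response.usage, mirroring the JSON shown earlier. The SimpleNamespace objects are stand-ins so the example runs without an API key; usage_from_events is a hypothetical helper, not part of any SDK.

```python
from types import SimpleNamespace


def usage_from_events(events):
    """Consume a stream of event objects; return usage from response.completed."""
    usage = None
    for event in events:
        # Other event types would be handled here; this sketch only wants usage
        if event.type == "response.completed":
            usage = event.response.usage
    return usage


# Stand-in events shaped like the streamed objects (assumed attribute layout)
fake_stream = [
    SimpleNamespace(type="response.output_text.delta", delta="Yarr"),
    SimpleNamespace(
        type="response.completed",
        response=SimpleNamespace(
            usage=SimpleNamespace(input_tokens=42, output_tokens=7, total_tokens=49)
        ),
    ),
]

final_usage = usage_from_events(fake_stream)
print(final_usage.input_tokens, final_usage.total_tokens)
```

With the Python SDK, the events argument would presumably be the iterator you get back from a streaming Responses call; check that against the SDK version you have installed.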
Capturing, parsing, and taking appropriate recursive action for 25 other event types is the challenge on this endpoint.
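One way to keep that manageable is a dispatch table keyed on the event type, so each type gets its own small handler and anything unhandled is at least counted. This is only a sketch: StreamCollector and its methods are invented for illustration, it handles just two event types, and the payloads are trimmed-down versions of what the API sends.

```python
from collections import Counter


class StreamCollector:
    """Route parsed event payloads (dicts) to per-type handlers."""

    def __init__(self):
        self.text_parts = []
        self.usage = None
        self.unhandled = Counter()
        # Map event type -> handler; extend this as you support more types
        self.handlers = {
            "response.output_text.delta": self.on_text_delta,
            "response.completed": self.on_completed,
        }

    def on_text_delta(self, payload):
        self.text_parts.append(payload["delta"])

    def on_completed(self, payload):
        self.usage = payload["response"]["usage"]

    def feed(self, payload):
        handler = self.handlers.get(payload["type"])
        if handler is None:
            # Keep a tally so unsupported event types are visible, not silent
            self.unhandled[payload["type"]] += 1
        else:
            handler(payload)

    @property
    def text(self):
        return "".join(self.text_parts)


collector = StreamCollector()
for payload in [
    {"type": "response.created"},
    {"type": "response.output_text.delta", "delta": "Yarr, "},
    {"type": "response.output_text.delta", "delta": "treasure ahead!"},
    {"type": "response.completed",
     "response": {"usage": {"input_tokens": 42, "output_tokens": 7, "total_tokens": 49}}},
]:
    collector.feed(payload)

print(collector.text)
print(collector.usage["total_tokens"])
print(dict(collector.unhandled))
```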