Hi, I need to track token usage for cost calculation purposes and show it to users on the frontend. When using streaming mode, I don't see token usage information in the response chunks. Is there a way to:
1. Get token counts during or after a streaming response completes?
2. Calculate costs for streaming requests without having to estimate tokens beforehand?
I’ve checked the API documentation but couldn’t find clear guidance on this. Any insights?
If you use the Chat Completions API, you need to pass the include_usage stream option:
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4.1-mini",
    messages=[
        {"role": "user", "content": "Count to 100, with a comma between each number and no newlines. E.g., 1, 2, 3, ..."}
    ],
    temperature=0,
    stream=True,
    stream_options={"include_usage": True},  # retrieve token usage for the stream response
)
There is also a cookbook showing how to retrieve usage from Completions streams here.
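With include_usage set, every chunk carries a usage attribute that is None except on the final chunk, which arrives with an empty choices list. Here is a minimal sketch of consuming such a stream; the mock chunks below are stand-ins for the API's real ChatCompletionChunk objects so the snippet runs without an API key:

```python
from types import SimpleNamespace

def consume_stream(stream):
    """Collect text deltas and the final usage object from a chat stream."""
    text_parts, usage = [], None
    for chunk in stream:
        if chunk.choices:  # content chunks have a non-empty choices list
            delta = chunk.choices[0].delta.content
            if delta:
                text_parts.append(delta)
        if chunk.usage is not None:  # only the final chunk carries usage
            usage = chunk.usage
    return "".join(text_parts), usage

# Mock chunks shaped like the API's streaming objects (for illustration only)
chunks = [
    SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content="1, "))], usage=None),
    SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content="2, 3"))], usage=None),
    SimpleNamespace(choices=[], usage=SimpleNamespace(prompt_tokens=25, completion_tokens=8, total_tokens=33)),
]
text, usage = consume_stream(chunks)
print(text)                # 1, 2, 3
print(usage.total_tokens)  # 33
```

In a real run you would pass the response object from client.chat.completions.create directly to consume_stream in place of the mock list.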
If you are using the Responses API, you just need to look at the response.completed event:
from openai import OpenAI

client = OpenAI()

stream = client.responses.create(
    model="gpt-4.1-mini",
    input=[
        {
            "role": "user",
            "content": "How many s's are in the word 'mississippi'? Give me other words with the same amount of s's.",
        },
    ],
    stream=True,
)

for event in stream:
    print(event.type, event)
    if event.type == "response.completed":
        print(event.response.output_text)
        print(f"usage: {event.response.usage}")
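Once you have the usage counts, turning them into a dollar figure is simple arithmetic. A sketch, assuming Responses API field names (input_tokens / output_tokens; the Chat Completions usage object uses prompt_tokens / completion_tokens instead) and placeholder per-million-token prices you should replace with the current rates for your model:

```python
from types import SimpleNamespace

def stream_cost(usage, input_price_per_m, output_price_per_m):
    """Cost in USD from a usage object and per-million-token prices."""
    return (usage.input_tokens * input_price_per_m
            + usage.output_tokens * output_price_per_m) / 1_000_000

# Placeholder usage and prices for illustration -- not real rates
usage = SimpleNamespace(input_tokens=1200, output_tokens=350)
print(f"${stream_cost(usage, 0.40, 1.60):.6f}")  # $0.001040
```

In the Responses example above, you would call stream_cost(event.response.usage, ...) inside the response.completed branch.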