OpenAI API - get usage tokens in response when stream=True is set

The workaround I used was to get ChatGPT to create a paragraph in each language, then use the online tokenizer to calculate the number of tokens for each paragraph, and divide by the number of characters in that paragraph.

That gives me an average character-to-token ratio for each language that I can store in a table.

I did this on a per-language basis because the average number of characters per token can vary dramatically between languages.

I’m then able to guesstimate how many tokens would be used for any arbitrary number of characters. It’s not 100% accurate, but it is very fast and close enough (90%-ish).

Of course, this does require you to know the language of the text up front, and it won’t work if the text contains multiple languages.

In my case I’m only using it to roughly guesstimate how much each conversation costs.
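Roughly, the lookup looks like this (a sketch; the ratio values here are illustrative placeholders, not real measurements):

# Average characters per token, measured once per language with the
# online tokenizer (these numbers are made-up placeholders).
CHARS_PER_TOKEN = {
    "en": 4.0,
    "de": 3.5,
    "ja": 1.5,
}

def estimate_tokens(text: str, language: str) -> int:
    # Rough guess: character count divided by the language's
    # average characters-per-token ratio.
    return round(len(text) / CHARS_PER_TOKEN[language])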

1 Like

Guys, not being able to measure token usage is a special feature: it means we can focus on the code instead of the bill :rofl:

2 Likes

Yeah, why not just include the “usage” section, as described for non-streamed responses, in a final chunk just before [DONE]?
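For reference, the non-streamed response already carries a usage object shaped like this (the numbers are placeholders), which could simply ride along on that last chunk:

"usage": {
    "prompt_tokens": 25,
    "completion_tokens": 120,
    "total_tokens": 145
}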

1 Like

+1, really unfortunate that we are not getting that information.

2 Likes

+1 to this - it’s not clear whether the number of streaming chunks corresponds to the number of tokens, and ideally I don’t want to estimate how much was used per request. Please include this in the final chunk!

1 Like

+1 - the tiktoken workaround does not work when you include images in the messages. Include usage in the streaming response.

+1. Has anyone found a solution for this? Is the answer just to calculate token usage offline with tiktoken? It seems pretty counter-intuitive not to include the tokens used for the streaming API.
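For text-only messages, the offline tiktoken count looks roughly like this (a sketch; it ignores the few extra per-message tokens the chat format adds):

import tiktoken

def count_text_tokens(text: str, model: str = "gpt-4") -> int:
    # Look up the tokenizer for the given model and count tokens offline.
    encoding = tiktoken.encoding_for_model(model)
    return len(encoding.encode(text))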

1 Like

I finally found a solution I’m happy with, after hours of scouring documentation. Unfortunately, they do not give an option to query for usage information by ID, or even a way to just return usage somehow; that would have been the easier solution. Instead, here’s my implementation. It involves:

  • Counting tokens for images with the new gpt-4-turbo/vision models (see the sketch after this list)
  • The scuffed and varied additional tokens that get added in by OpenAI’s API
  • Wrapping the returned stream generator, appending any tokens to a list before yielding, and finally processing the list as the output message
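Here is a sketch of the image cost from the first bullet, based on the formula in OpenAI’s vision pricing docs (treat it as an approximation, not my exact code):

import math

def image_tokens(width: int, height: int, detail: str = "high") -> int:
    # Per OpenAI's docs: low-detail images cost a flat 85 tokens.
    if detail == "low":
        return 85
    # High detail: the image is first scaled to fit within 2048x2048...
    scale = min(1.0, 2048 / max(width, height))
    width, height = width * scale, height * scale
    # ...then scaled so its shortest side is at most 768px...
    scale = min(1.0, 768 / min(width, height))
    width, height = width * scale, height * scale
    # ...and billed at 170 tokens per 512x512 tile, plus a flat 85.
    tiles = math.ceil(width / 512) * math.ceil(height / 512)
    return 85 + 170 * tiles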

Implementation of the CountStreamTokens class (types are slightly scuffed):

Code:


from datetime import datetime

import openai


# Method on whatever object owns self.detailed_usage; shown standalone here.
def add_token_count(self, prompt_tokens: int, completion_tokens: int, model: str) -> None:
    # I append the tokens to a running total here. This will be called as a
    # callback after the calculation is finished.
    # You can choose to do anything here with the numbers.
    self.detailed_usage.append({
        "model": model,
        "usage": {"prompt_tokens": prompt_tokens, "completion_tokens": completion_tokens},
        "time": datetime.now(),
    })

completion = openai.chat.completions.create(messages=messages, stream=True, **params)

# completion is now a generator, or a 'stream' object.
# CountStreamTokens is a custom class that is initialized with the model you
# use and the messages you want to query with. These are saved as class
# attributes for use in the .wrap_stream_and_count() method.
# .wrap_stream_and_count() returns another generator, yielding all the same
# chunks OpenAI provides, but simultaneously collecting the output tokens.
# When the generator detects a None (ending) token in the stream, it yields
# the final chunk first and only then begins counting tokens (so as to keep
# the stream running).

return CountStreamTokens(model, messages).wrap_stream_and_count(completion, add_token_count)
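For completeness, here is a sketch of what the wrapper itself can look like, assuming tiktoken for the counting. It is simplified: it skips the per-message overhead tokens and the image handling described above, and everything beyond the CountStreamTokens / wrap_stream_and_count names is my own filler:

import tiktoken

class CountStreamTokens:
    def __init__(self, model: str, messages: list) -> None:
        self.model = model
        self.messages = messages
        self.encoding = tiktoken.encoding_for_model(model)

    def wrap_stream_and_count(self, stream, callback):
        # Yield every chunk unchanged while collecting the generated text.
        collected = []
        for chunk in stream:
            if chunk.choices and chunk.choices[0].delta.content is not None:
                collected.append(chunk.choices[0].delta.content)
            yield chunk
        # Stream exhausted: count tokens only now, so the consumer was
        # never blocked while chunks were still arriving.
        prompt_text = "".join(
            m["content"] for m in self.messages if isinstance(m.get("content"), str)
        )
        prompt_tokens = len(self.encoding.encode(prompt_text))
        completion_tokens = len(self.encoding.encode("".join(collected)))
        callback(prompt_tokens, completion_tokens, self.model)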
1 Like

Please implement this in the final chunk. I don’t want to estimate something and introduce inaccuracies.

P.S. (ahem, Claude already tells me the usage in the final chunk of a stream, ahem)

1 Like