Usage Info in API Responses

Hi everyone,

We have started providing token usage information as part of the responses from the completions, edits, and embeddings endpoints. This data is the same as what is shown on your usage dashboard, now made available through the API.

For example, a response from the completions endpoint now looks like:

{
 "id": "cmpl-uqkvlQyYK7bGYrRHQ0eXlWi8",
 "object": "text_completion",
 "created": 1589478378,
 "model": "text-davinci-002",
 "choices": [
  {
   "text": "\n\nThis is a test",
   "index": 0,
   "logprobs": null,
   "finish_reason": "length"
  }
 ],
 "usage": {
  "prompt_tokens": 5,
  "completion_tokens": 5,
  "total_tokens": 10
 }
}

You can find full details in the API Reference.

Note that for the completions endpoint, if the stream argument is enabled, the response stream remains unchanged and the usage information is not included.
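For reference, here is a minimal sketch of reading these fields with the openai Python package (the API key, model, and prompt below are placeholders; this applies to non-streaming requests only):

import openai  # pip install openai

openai.api_key = "YOUR_API_KEY"  # placeholder

# Non-streaming completions request; the usage object is attached to the response.
response = openai.Completion.create(
    model="text-davinci-002",
    prompt="Say this is a test",
    max_tokens=7,
)

usage = response["usage"]
print(usage["prompt_tokens"], usage["completion_tokens"], usage["total_tokens"])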

18 Likes

Thanks, this is useful.

How do we calculate the exact total tokens used in streaming requests?

Thanks

The feature wasn’t enabled in streaming by default because we found that it could break existing integrations. It does exist though! If you would like it turned on, send us a message at help.openai.com

1 Like

Is it possible to include in every choice of the response how many tokens were used?

The scenario I’m facing right now is that I want to make my request with an “n” of more than 1, and I need to catalog how much every completion cost. I could compute completion_tokens / n, but it would not be accurate :smiling_face_with_tear:
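The best workaround I can think of is re-counting each choice myself with tiktoken, something like this rough sketch (the model, prompt, and encoding are just placeholders, and local counts still won’t exactly match billing):

import openai    # pip install openai
import tiktoken  # pip install tiktoken

# A completions request with n > 1 (placeholder model and prompt).
response = openai.Completion.create(
    model="text-davinci-003",
    prompt="Write a tagline for a coffee shop.",
    n=3,
    max_tokens=32,
)

# Re-count each choice locally with the encoding that matches the model.
enc = tiktoken.encoding_for_model("text-davinci-003")
for choice in response["choices"]:
    print(choice["index"], len(enc.encode(choice["text"])))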

1 Like

Please advise on how to enable usage data in completions for streaming mode.

2 Likes

You can’t - and it is very annoying. We have to recreate the tokenizer and calculate the count ourselves from the text that was returned.

We are having accuracy issues, though: the tokenizer and what you are billed do not always match.

If you would like usage data enabled in streaming, please send a message to our support team at help.openai.com and we can enable the feature.

3 Likes

Hi Chris,

Is it possible to set up a channel that high-volume (paying) users and SaaS providers can use to get support from OpenAI staff?

I understand you are super busy right now, but when we need to increase our rate limits or monthly account limits as we roll out a SaaS solution using OpenAI technology, it would be good to have a person or a channel that is not overloaded and that we can get timely responses from.

Like many others, I have asked about rate limit increases and had no reply. I often see messages on the forum about people asking to increase their monthly spend and not getting a response either.

We are rolling out a product and can’t take on 1000 clients a week because we can’t be sure that the service will handle the requests we will need to send. So, for now, we are throttling our onboarding rate. It would be great if we could confidently “turn on the tap”.

Maybe you could have a support channel for people who spend over $x per week or month. Maybe you could automatically put people into this channel when they hit that limit so they can get priority support over the millions who are playing with the AI. This way you could support serious SaaS providers and high-volume users.

3 Likes

Hi Raymond,

I think, alternatively, you could check out the OpenAI services on Azure, but they seem much more costly.

1 Like

I think they have the wrong price for fine-tuned Davinci models: $34 per hour (approx. $24,000 per month).

I suspect this should be $0.34 per hour (approx. $244 per month).

They also have a fine-tuned Codex.

They don’t mention the versions for the base models either. I assume 003 for Davinci, but the examples refer to 002.

It looks like they expect you to fire up an instance, run it for a few hours and then shut it down.

Quoting their site:

“You now fine-tune a Curie model with your data, deploy the model and make 14.5M tokens over a 5-day period. You leave the model deployed for the full five days (120 hours) before you delete the endpoint. Here are the charges you will have incurred:”

2 Likes

Is this something you enable on a per-account level or on a per-API-token level? Is there a way to keep existing integrations using streaming responses without usage info, and then switch over in a controlled way?

Hi @hallacy, we opened a ticket at help.openai.com two weeks ago to enable usage data in stream mode for text and chat completions. Nobody has answered it. Could you please help?

2 Likes

Hello @hallacy, I want the usage data information. I am using streaming, so how can I get the usage data?

1 Like

Hi Chris! I’ve messaged you through help.openai.com “Feature Request” to have this feature enabled. Can I also have this enabled when streaming please?

1 Like

I just came to say that the chat is a disaster. It can’t be switched to Spanish, and I don’t understand anything or even where I’m typing. At the start you can always select the language, but here, where there is “intelligence”, you can’t.

Hi @dschnurr !
I noticed that on the usage page I can see the number of requests and token usage per period, so is there any official API that can query the token usage of a conversation through its “id”? The “id” exists in both stream requests and normal requests. (“id”: “chatcmpl-74pW6*********************Wdi”)
Thanks

Can we see or fetch the data that we have generated using our API key: input and output prompts, costs, usage, etc.?

Hello - I am also going through a similar process. Would you be able to share what the usage info response looks like in streaming mode?

Hi and welcome to the Developer Forum!

There are no usage info messages returned in streaming mode; you would need to concatenate all of your returned message deltas and then use tiktoken to count the tokens used.

And then the usage readout can look however you want when you do your own counting:

Well now, top o’ the mornin’ to ya! I’m Mac o’Paddy, the jolliest leprechaun
ye’ll ever meet. I’m a wee bit mischievous, but always with a heart full o’
gold. I’ve been wanderin’ these green hills of Ireland for centuries, guardin’
me pot o’ gold at the end of the rainbow. So, what brings ye to me humble abode today?
> [Finish reason: stop] 60 words/95 chunks, 95 tokens in 3.8 seconds.
##>Can you give the same introduction, but in Mandarin Chinese for my friend?
Ah, sure and begorrah! I’ll give it a go for your friend. In Mandarin
Chinese, it would go a little somethin’ like this:

早上好!我是麦克·奥帕迪,你会遇到的最快乐的小矮人。我有点淘气,但心里总是装满了金子。我在爱尔兰的这片绿色山丘上漫游了几个世纪,守护着我藏在彩虹尽头的金罐。那么,今天你和你的朋友来我这里有什么事呢?
> [Finish reason: stop] 24 words/133 chunks, 167 tokens in 6.7 seconds.
##>

Here’s a class I wrote for your use:

import re
import tiktoken  # pip install tiktoken first

class Tokenizer:
    """Count tokens in a string with tiktoken.

    Usage:
        tokenz = Tokenizer("cl100k_base")
        token_count = tokenz.count(my_string)
        print(f"The phrase {my_string} has a length {token_count}")
    """
    def __init__(self, model_name):
        # model_name is a tiktoken encoding name, e.g. "cl100k_base"
        self.tokenizer = tiktoken.get_encoding(model_name)
        # matches special chat markup such as <|im_start|> and <|im_end|>
        self.chat_strip_match = re.compile(r'<\|.*?\|>')

    def ucount(self, text):
        # unstripped count: tokenize the text exactly as given
        encoded_text = self.tokenizer.encode(text)
        return len(encoded_text)

    def count(self, text):
        # strip special chat markup first, then tokenize
        text = self.chat_strip_match.sub('', text)
        encoded_text = self.tokenizer.encode(text)
        return len(encoded_text)

Since some special markup is stripped from text sent to a chat endpoint, the normal count method strips it too. Note that neither method counts the special control tokens (which can be forced to appear in the output) as single tokens.
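If you want to use it against a stream, here is a rough usage sketch (assuming the openai Python package and gpt-3.5-turbo; the stream itself carries no usage object, so you count the concatenated deltas yourself):

import openai  # pip install openai

# Streaming chat completion (placeholder prompt); chunks arrive as deltas.
response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Introduce yourself as a leprechaun."}],
    stream=True,
)

# Concatenate the content deltas, then count locally with the class above.
reply = ""
for chunk in response:
    delta = chunk["choices"][0].get("delta", {})
    reply += delta.get("content") or ""

tokenz = Tokenizer("cl100k_base")
print(f"Completion tokens (approx.): {tokenz.count(reply)}")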