Usage Info in API Responses

Hi everyone,

We have started providing token usage information as part of the responses from the completions, edits, and embeddings endpoints. This data is the same as what is shown on your usage dashboard, now made available through the API.

For example, a response from the completions endpoint now looks like:

{
 "id": "cmpl-uqkvlQyYK7bGYrRHQ0eXlWi8",
 "object": "text_completion",
 "created": 1589478378,
 "model": "text-davinci-002",
 "choices": [
  {
   "text": "\n\nThis is a test",
   "index": 0,
   "logprobs": null,
   "finish_reason": "length"
  }
 ],
 "usage": {
  "prompt_tokens": 5,
  "completion_tokens": 5,
  "total_tokens": 10
 }
}

You can find full details in the API Reference.
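As a sketch, the new `usage` block can be read straight off the parsed response body. The per-1K-token rates below are illustrative placeholders, not official prices; check the pricing page for real figures.

```python
import json

# Example response body as shown above, truncated to the relevant fields.
body = """
{
 "id": "cmpl-uqkvlQyYK7bGYrRHQ0eXlWi8",
 "usage": {"prompt_tokens": 5, "completion_tokens": 5, "total_tokens": 10}
}
"""

response = json.loads(body)
usage = response["usage"]

# Hypothetical per-1K-token rates, for illustration only.
PROMPT_RATE = 0.02 / 1000
COMPLETION_RATE = 0.02 / 1000

cost = (usage["prompt_tokens"] * PROMPT_RATE
        + usage["completion_tokens"] * COMPLETION_RATE)
print(usage["total_tokens"], round(cost, 6))  # 10 tokens, cost 0.0002
```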

Note that for the completions endpoint, if the stream argument is enabled, the response stream remains unchanged and the usage information is not included.

17 Likes

Thanks, this is useful.

How do we calculate the exact total tokens used in streaming requests?

Thanks

The feature wasn’t enabled in streaming by default because we found that it could break existing integrations. It does exist, though! If you would like it turned on, send us a message at help.openai.com

1 Like

Is it possible to include on every choice in the response how many tokens were used?

The scenario I’m facing right now is that I want to make my request with an “n” of more than 1, and I need to catalog how much each completion costs. I could compute completion_tokens / n, but it would not be accurate :smiling_face_with_tear:
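One workaround for the n > 1 case is to apportion `completion_tokens` across the choices in proportion to a client-side token count of each choice's text. The sketch below uses a whitespace split as a stand-in counter; it is not the real tokenizer and only illustrates the bookkeeping.

```python
# Apportion the billed completion_tokens across n choices, weighted by a
# stand-in per-choice token count (whitespace split, NOT the real tokenizer).
def apportion(completion_tokens, choice_texts):
    counts = [len(t.split()) for t in choice_texts]  # stand-in counts
    total = sum(counts) or 1
    return [completion_tokens * c / total for c in counts]

choices = ["a short answer", "a much longer answer with extra words"]
per_choice = apportion(10, choices)
print(per_choice)  # [3.0, 7.0]
```

The attributed counts always sum back to the billed `completion_tokens`, which is the property an exact per-choice field would also have.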

1 Like

Please advise on how to enable usage data in completions for streaming mode.

You can’t, and it is very annoying. We have to recreate the tokenizer and calculate the count ourselves from the text that was returned.

We are having accuracy issues, though: the tokenizer’s count and what you are billed do not always match.
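The recreate-the-tokenizer approach looks roughly like this: with streaming enabled, accumulate the text deltas and count tokens client-side once the stream ends. `count_tokens` below is a whitespace stand-in; in practice you would use the model's actual tokenizer (e.g. tiktoken), and as noted above, even that may not exactly match what you are billed.

```python
# Sketch: with stream=True the chunks carry no usage block, so the client
# accumulates the streamed text and counts tokens itself afterwards.
def count_tokens(text):
    return len(text.split())  # placeholder; not the real tokenizer

chunks = ["This", " is", " a", " streamed", " reply"]  # hypothetical deltas
completion_text = "".join(chunks)
completion_tokens = count_tokens(completion_text)
print(completion_text, completion_tokens)
```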

If you would like usage data enabled in streaming, please send a message to our support team at help.openai.com and we can enable the feature.

1 Like

Hi Chris,

Is it possible to set up a channel that high-volume (paying) users and SaaS providers can use to get support from OpenAI staff?

I understand you are super busy right now, but when we need to increase our rate limits or monthly account limits as we roll out a SAAS solution using OpenAI technology, it would be good to have a person or a channel that is not overloaded and that we can get timely responses from.

Like many others, I have asked about rate limit increases and had no reply. I often see messages on the forum from people asking to increase their monthly spend and not getting a response either.

We are rolling out a product and can’t take on 1,000 clients a week because we can’t be sure the service will handle the requests we will need to send. So, for now, we are throttling our onboarding rate. It would be great if we could confidently “turn on the tap”.

Maybe you have a support channel for people that spend over $x per week or month. Maybe you can automatically put people into this channel when they hit the limit so they can get priority support over the millions that are playing with the AI. This way you could support serious SAAS providers and high-volume users.

2 Likes

Hi Raymond,

I think, alternatively, you could check the OpenAI services on Azure, but they seem much more costly.

1 Like

I think they have the wrong price for fine-tuned Davinci models: $34 per hour (approx. $24,000 per month).

I suspect this should be $0.34 per hour (approx. $244 per month).

They also have a fine-tuned Codex.

They don’t mention the versions for the base models either. I assume 003 for davinci, but the examples refer to 002.

It looks like they expect you to fire up an instance, run it for a few hours and then shut it down.

Quoting their site:

“You now fine-tune a Curie model with your data, deploy the model and make 14.5M tokens over a 5-day period. You leave the model deployed for the full five days (120 hours) before you delete the endpoint. Here are the charges you will have incurred:”
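The charge model quoted there is hosting-hours plus per-token inference, so the arithmetic is straightforward. The rates below are illustrative placeholders, not Azure's actual prices (which is exactly what is in dispute above).

```python
# Sketch of the Azure-style charge model: a per-hour charge for the
# deployed fine-tuned model plus a per-token inference charge.
HOURLY_HOSTING_RATE = 0.34   # assumed $/hour, placeholder rate
TOKEN_RATE = 0.002 / 1000    # assumed $/token, placeholder rate

hours_deployed = 120         # five days, as in the quoted example
tokens_used = 14_500_000     # 14.5M tokens, as in the quoted example

hosting_cost = hours_deployed * HOURLY_HOSTING_RATE
inference_cost = tokens_used * TOKEN_RATE
total = hosting_cost + inference_cost
print(hosting_cost, inference_cost, total)
```

Note how sensitive the total is to the hourly rate: at $34/hour instead of $0.34/hour, the hosting line alone becomes $4,080 for the same five days.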

2 Likes

Is this something you enable on a per-account level or on a per-API-token level? Is there a way to have existing integrations using streaming responses without usage info and then switch in a controlled way?

Hi @hallacy, we opened a ticket at help.openai.com 2 weeks ago to enable usage data in stream mode for text and chat completions. Nobody has answered it. Could you please help?

1 Like

Hello @hallacy, I want usage information, but I am using streaming, so how can I get it?

Hi Chris! I’ve messaged you through help.openai.com “Feature Request” to have this feature enabled. Can I also have this enabled when streaming please?

I just want to say that the chat is a disaster. It can’t be switched to Spanish and I don’t understand anything, not even where I’m writing. You can always select the language at the start, but here, where there is “intelligence”, you can’t.

Hi @dschnurr !
I noticed that on the usage page I can see the number of requests and token usage per period, so is there any official API that can query the token usage of a given conversation through its “id”? The “id” exists in both stream requests and normal requests. (“id”: “chatcmpl-74pW6*********************Wdi”)
Thanks
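Absent an official per-“id” usage query, one client-side workaround is to record each response's `usage` block keyed by its `id` as it arrives, then look entries up later. A minimal sketch, with a hypothetical id:

```python
# Client-side ledger mapping response id -> usage block, recorded at
# request time since no per-id usage query is mentioned in the thread.
ledger = {}

def record(response):
    ledger[response["id"]] = response["usage"]

# Hypothetical response for illustration.
record({"id": "chatcmpl-abc123", "usage": {"total_tokens": 42}})
print(ledger["chatcmpl-abc123"]["total_tokens"])  # 42
```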

Can we see or fetch the data that we have generated using our API key: input and output prompts, costs, usage, etc.?