We have started providing token usage information as part of the responses from the completions, edits, and embeddings endpoints. This data is the same as what is shown on your usage dashboard, now made available through the API.
For example, a response from the completions endpoint now looks like:
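A minimal sketch of the new shape, using the pre-1.0 Python library (the token counts here are illustrative, not from a real call):

```python
import openai  # assumes the pre-1.0 openai package and OPENAI_API_KEY set

response = openai.Completion.create(model="text-davinci-003", prompt="Say hello.")

# The response now carries a "usage" object alongside "choices":
print(response["usage"])
# e.g. {"prompt_tokens": 3, "completion_tokens": 5, "total_tokens": 8}
```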
Note that for the completions endpoint, if the stream argument is enabled, the response stream remains unchanged and the usage information is not included.
The feature wasn’t enabled in streaming by default because we found that it could break existing integrations. It does exist, though! If you would like it turned on, send us a message at help.openai.com
Is it possible to include, on every choice in the response, how many tokens were used?
The scenario I’m facing right now is that I want to make my request with an “n” of more than 1, and I need to catalog how much every completion cost. I could compute completion_tokens / n, but that would not be accurate.
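One workaround, sketched below assuming the pre-1.0 Python library and the tiktoken package (there is no official per-choice usage field), is to count each choice’s tokens yourself:

```python
import openai
import tiktoken

# Count tokens per choice when n > 1 (model name is just an example).
enc = tiktoken.encoding_for_model("text-davinci-003")

response = openai.Completion.create(
    model="text-davinci-003",
    prompt="Write a one-line slogan for a coffee shop.",
    n=3,
)

for i, choice in enumerate(response["choices"]):
    n_tokens = len(enc.encode(choice["text"]))
    print(f"choice {i}: {n_tokens} tokens")

# Note: the sum over choices can differ slightly from
# usage["completion_tokens"] (stop sequences, special tokens).
```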
Is it possible to set up a channel that high-volume (paying) users and SaaS providers can use to get support from OpenAI staff?
I understand you are super busy right now, but when we need to increase our rate limits or monthly account limits as we roll out a SaaS solution using OpenAI technology, it would be good to have a person or a channel that is not overloaded and from which we can get timely responses.
Like many others, I have asked about rate limit increases and had no reply. I often see messages on the forum from people asking to increase their monthly spend and not getting a response either.
We are rolling out a product and can’t take on 1000 clients a week because we can’t be sure that the service will handle the requests we will need to send. So, for now, we are throttling our onboarding rate. It would be great if we could confidently “turn on the tap”
Maybe you could have a support channel for people who spend over $x per week or month. Maybe you could automatically add people to this channel when they hit the limit, so they get priority support over the millions who are just playing with the AI. This way you could support serious SaaS providers and high-volume users.
I think they have the wrong price for fine-tuned Davinci models: $34 per hour (approx. $24,000 per month).
I suspect this should be $0.34 per hour (approx. $244 per month).
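Checking the arithmetic, assuming a ~720-hour month:

```python
hours_per_month = 24 * 30  # ~720 hours

print(34.00 * hours_per_month)  # 24480.0 -> roughly the quoted "$24,000 per month"
print(0.34 * hours_per_month)   # 244.8   -> the suspected "~$244 per month"
```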
They also have a fine-tuned Codex.
They don’t mention the versions for the base models either. I assume 003 for Davinci, but the examples refer to 002.
It looks like they expect you to fire up an instance, run it for a few hours and then shut it down.
Quoting their site:
“You now fine-tune a Curie model with your data, deploy the model and make 14.5M tokens over a 5-day period. You leave the model deployed for the full five days (120 hours) before you delete the endpoint. Here are the charges you will have incurred:”
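Presumably the charge in that example is computed along these lines; the rates below are hypothetical placeholders, not the actual prices from their page:

```python
# Hypothetical rates for illustration only -- substitute the real per-hour
# hosting rate and per-1K-token usage rate from the pricing page.
hosting_per_hour = 0.50      # hypothetical
usage_per_1k_tokens = 0.002  # hypothetical

hours_deployed = 120      # 5 days (120 hours), from the quoted example
tokens_used = 14_500_000  # 14.5M tokens, from the quoted example

total = hours_deployed * hosting_per_hour + (tokens_used / 1000) * usage_per_1k_tokens
print(f"${total:,.2f}")  # hosting + usage, billed until the endpoint is deleted
```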
Is this something you enable on a per-account level or on a per-API-token level? Is there a way to keep existing integrations on streaming responses without usage info, and then switch over in a controlled way?
Hi @hallacy, we opened a ticket at help.openai.com two weeks ago to enable usage data in stream mode for text and chat completions. Nobody has answered it. Could you please help?
Hi Chris! I’ve messaged you through the help.openai.com “Feature Request” form to have this feature enabled. Can I also have it enabled when streaming, please?
I just want to say that the chat is a disaster. It can’t be switched to Spanish, and I don’t understand anything, not even where I’m writing. You can always select the language at the start, but here, where there’s “intelligence”, you can’t.
Hi @dschnurr !
I noticed that on the usage page I can see the number of requests and the token usage per period. Is there any official API that can query the token usage of a conversation through its “id”? The “id” exists in both stream requests and normal requests (“id”: “chatcmpl-74pW6*********************Wdi”).
Thanks!
There is no usage info returned in streaming mode; you would need to concatenate all of your returned message deltas and then use tiktoken to count the tokens used.
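A sketch of that approach, assuming the pre-1.0 Python library and a chat model:

```python
import openai
import tiktoken

# Streaming responses carry no "usage" field, so accumulate the content
# deltas and count tokens yourself (the count is approximate).
enc = tiktoken.encoding_for_model("gpt-3.5-turbo")

parts = []
for chunk in openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Introduce yourself as a leprechaun."}],
    stream=True,
):
    delta = chunk["choices"][0]["delta"]
    if "content" in delta:
        parts.append(delta["content"])

text = "".join(parts)
print(f"{len(enc.encode(text))} completion tokens (approximate)")
```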
And thus the usage looks like you want it to when you do your own counting:
Well now, top o’ the mornin’ to ya! I’m Mac o’Paddy, the jolliest leprechaun
ye’ll ever meet. I’m a wee bit mischievous, but always with a heart full o’
gold. I’ve been wanderin’ these green hills of Ireland for centuries, guardin’
me pot o’ gold at the end of the rainbow. So, what brings ye to me humble abode today?
> [Finish reason: stop] 60 words/95 chunks, 95 tokens in 3.8 seconds.
##>Can you give the same introduction, but in Mandarin Chinese for my friend?
Ah, sure and begorrah! I’ll give it a go for your friend. In Mandarin
Chinese, it would go a little somethin’ like this:
Since some special text is stripped when it is sent to a chat endpoint, the normal counting method strips it too. Also, neither method counts the special control tokens (which can be forced to appear in the output) as single tokens.
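For what it’s worth, tiktoken can be told to treat control tokens as single tokens; by default its encode() refuses them (a sketch):

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

# By default, encode() raises on text containing special tokens like
# <|endoftext|>; allowing them counts each as a single token id.
tokens = enc.encode("Hello <|endoftext|>", allowed_special={"<|endoftext|>"})
print(tokens)  # the special token encodes to one token id
```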