We have started providing token usage information as part of the responses from the completions, edits, and embeddings endpoints. This data is the same as what is shown on your usage dashboard, now made available through the API.
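Here is a minimal Python sketch of reading that data on the client side. It assumes the response JSON carries a top-level `usage` object with `prompt_tokens`, `completion_tokens`, and `total_tokens` fields. The sample body is illustrative, with made-up numbers, and is not a captured API response.

```python
import json
from typing import NamedTuple, Optional


class Usage(NamedTuple):
    prompt_tokens: int
    completion_tokens: int
    total_tokens: int


def extract_usage(response_body: str) -> Optional[Usage]:
    """Return the token usage reported in a response body, or None if absent."""
    data = json.loads(response_body)
    usage = data.get("usage")
    if usage is None:
        return None
    return Usage(
        prompt_tokens=usage.get("prompt_tokens", 0),
        # Embeddings responses may report only prompt and total tokens,
        # so missing completion tokens default to 0.
        completion_tokens=usage.get("completion_tokens", 0),
        total_tokens=usage["total_tokens"],
    )


# Illustrative body shaped like a completions response (numbers are made up).
sample = json.dumps({
    "object": "text_completion",
    "choices": [{"text": "Hello!", "index": 0, "finish_reason": "stop"}],
    "usage": {"prompt_tokens": 5, "completion_tokens": 3, "total_tokens": 8},
})

print(extract_usage(sample))
```

Returning `None` rather than raising keeps the helper usable against responses that predate the field.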
For example, a response from the completions endpoint now looks like:
The feature wasn’t enabled in streaming by default because we found that it could break existing integrations. It does exist, though! If you would like it turned on, send us a message at help.openai.com.
Is it possible to include, on every choice in the response, how many tokens were used?
The scenario I’m facing right now is that I want to make requests with an “n” greater than 1, and I need to track how much each completion cost. I could compute completion_tokens / n, but that would not be accurate, since the individual choices can differ in length.
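Until per-choice counts are available, one workaround is to tokenize each choice's text locally and price it separately instead of averaging. Here is a rough Python sketch of that idea. `naive_token_count` is a hypothetical whitespace-split stand-in, because real models use a BPE tokenizer, and the result will only match if you swap in the model's own tokenizer (for example via the `tiktoken` library). The price is also a hypothetical placeholder.

```python
from typing import Callable, List, NamedTuple


class ChoiceCost(NamedTuple):
    index: int
    tokens: int
    cost: float


def cost_per_choice(
    texts: List[str],
    count_tokens: Callable[[str], int],
    price_per_1k_tokens: float,
) -> List[ChoiceCost]:
    """Estimate completion tokens and cost for each choice independently."""
    return [
        ChoiceCost(i, n, n / 1000 * price_per_1k_tokens)
        for i, n in enumerate(count_tokens(t) for t in texts)
    ]


def naive_token_count(text: str) -> int:
    # Hypothetical placeholder tokenizer: splits on whitespace only.
    # Replace with the model's real tokenizer for accurate counts.
    return len(text.split())


choices = ["Yes.", "It depends on the context you give it.", "No"]
price = 0.02  # hypothetical price per 1K tokens, for illustration only

per_choice = cost_per_choice(choices, naive_token_count, price)
total = sum(c.tokens for c in per_choice)
average = total / len(choices)

for c in per_choice:
    print(c.index, c.tokens, f"${c.cost:.6f}")
print("average per choice:", average)
```

The three choices get very different counts, while `completion_tokens / n` would assign each of them the same average. With a real tokenizer, you can compare the summed per-choice counts against the reported `completion_tokens` as a sanity check.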
Is it possible to set up a channel that high-volume (paying) users and SAAS providers can use to get support from OpenAI staff?
I understand you are super busy right now, but when we need to increase our rate limits or monthly account limits as we roll out a SAAS solution using OpenAI technology, it would be good to have a person or a channel that is not overloaded and that we can get timely responses from.
Like many others, I have asked about rate limit increases and had no reply. I often see posts on the forum from people asking to increase their monthly spend limit and not getting a response either.
We are rolling out a product and can’t take on 1,000 clients a week because we can’t be sure the service will handle the requests we will need to send. So, for now, we are throttling our onboarding rate. It would be great if we could confidently “turn on the tap”.
Maybe you could have a support channel for people who spend over $x per week or month, and automatically move people into it when they hit that limit, so they get priority support over the millions who are just playing with the AI. That way you could support serious SAAS providers and high-volume users.
I think they have listed the wrong price for fine-tuned Davinci models: $34 per hour (approx. $24,000 per month).
I suspect this should be $0.34 per hour (approx. $244 per month).
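The monthly figures follow from multiplying the hourly rate by the hours in a month. A quick Python check, assuming a 30-day (720-hour) month with the deployment running continuously:

```python
HOURS_PER_MONTH = 24 * 30  # 720 hours, assuming a 30-day month


def monthly_hosting_cost(hourly_rate: float, hours: int = HOURS_PER_MONTH) -> float:
    """Cost of keeping a deployment running continuously for `hours` hours."""
    return hourly_rate * hours


for rate in (34.0, 0.34):
    print(f"${rate}/hour -> ${monthly_hosting_cost(rate):,.2f}/month")
```

This gives $24,480 and $244.80 per month, consistent with the rounded figures quoted above.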
They also list a fine-tuned Codex model.
They don’t mention the versions of the base models either. I assume 003 for Davinci, but the examples refer to 002.
It looks like they expect you to fire up an instance, run it for a few hours and then shut it down.
Quoting their site:
“You now fine-tune a Curie model with your data, deploy the model and make 14.5M tokens over a 5-day period. You leave the model deployed for the full five days (120 hours) before you delete the endpoint. Here are the charges you will have incurred:”
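The quoted scenario bills two separate things: the hours the endpoint stays deployed and the tokens it processes. Here is a rough Python sketch of that calculation. It assumes a flat hourly hosting rate plus a per-1K-token usage rate. Both rates below are hypothetical placeholders, not the published prices, and any one-off training charge is left out.

```python
def deployment_charges(
    hours_deployed: float,
    hourly_rate: float,
    tokens_used: int,
    price_per_1k_tokens: float,
) -> dict:
    """Split an estimated bill into hosting and usage components."""
    hosting = hours_deployed * hourly_rate
    usage = tokens_used / 1000 * price_per_1k_tokens
    return {"hosting": hosting, "usage": usage, "total": hosting + usage}


# Scenario from the quote: deployed for 5 days (120 hours), 14.5M tokens.
charges = deployment_charges(
    hours_deployed=5 * 24,
    hourly_rate=0.25,            # hypothetical hosting rate, $/hour
    tokens_used=14_500_000,
    price_per_1k_tokens=0.002,   # hypothetical usage rate, $/1K tokens
)
for item, amount in charges.items():
    print(f"{item}: ${amount:,.2f}")
```

If hosting accrues per hour regardless of traffic, as the quoted example suggests, deleting the endpoint when it is idle is what keeps the hosting component down.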
Is this something you enable at the account level or per API token? Is there a way to keep existing integrations that use streaming responses running without usage info, and then switch over in a controlled way?
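However the setting is scoped, a client can lower the risk of the switch on its own side by treating usage as optional when it reads the stream. Here is a minimal Python sketch. It assumes server-sent-event lines of the form `data: {...}` terminated by `data: [DONE]`, with streamed text in each choice's `text` field. Where a `usage` object would appear in a stream is an assumption here, so the parser accepts it on any chunk.

```python
import json
from typing import Dict, Iterable, Optional, Tuple


def read_stream(lines: Iterable[str]) -> Tuple[Dict[int, str], Optional[dict]]:
    """Collect streamed text per choice index and any usage object, if present."""
    texts: Dict[int, str] = {}
    usage: Optional[dict] = None
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            idx = choice.get("index", 0)
            texts[idx] = texts.get(idx, "") + choice.get("text", "")
        # Tolerate usage being absent (feature off) or present (feature on).
        if chunk.get("usage") is not None:
            usage = chunk["usage"]
    return texts, usage


# Illustrative stream; stream[:2] + stream[3:] simulates the feature being off.
stream = [
    'data: {"choices": [{"index": 0, "text": "Hel"}]}',
    'data: {"choices": [{"index": 0, "text": "lo"}]}',
    'data: {"choices": [], "usage": {"prompt_tokens": 2, "completion_tokens": 2, "total_tokens": 4}}',
    "data: [DONE]",
]
print(read_stream(stream))
```

Downstream code can check `usage is None` and fall back to local estimates, so the same client works before and after usage is turned on.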