My pricing metrics make no sense? I seem to be getting charged double every day?

Based on what I’m seeing:

12:35 AM
Local time: Aug 10, 2023, 8:35 PM
gpt-4-0613, 4 requests
5,089 prompt + 244 completion = 5,333 tokens

12:40 AM
Local time: Aug 10, 2023, 8:40 PM
gpt-4-0613, 3 requests
4,588 prompt + 244 completion = 4,832 tokens

How could these numbers be correct? Based on the 12:40 AM data, 3 people typed ~4,500 tokens and got 244 completion tokens?? How does that make sense? Every time in my app, when you type say 500 words, you typically get a 500-word response, so I don’t understand how this could be so lopsided?

And the price of my usage seems to be doubling.

For instance, on August 10th I was charged about 4.5 cents per request ($22.08 / 481 req), whereas today, the 20th, I’ve had 292 requests and I’ve been charged $25.12, which comes out to 8.6 cents per request ($25.12 / 292 req).

How has my price per request DOUBLED, and why do none of the metrics I’ve shown you represent what actually gets shown on the user end? Never have I typed 1,000 words into my bot and received a 100-word completion. Something is wrong and I’d like answers.

Hi,

When users place context into their prompts, be that the history of prior chats or simply data they want processed by the model while asking for a specific answer, it is quite easy to enter 3k tokens as a prompt and get only a few hundred tokens, or even fewer, as output.

Imagine pasting in a 4,000-word news article and asking the model for a brief summary.
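
(Quick arithmetic, using the common rule of thumb that a token is about ¾ of an English word: a 4,000-word article comes to roughly 5,300 prompt tokens, which lines up almost exactly with the 5,089 prompt + 244 completion entry you quoted above.)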


It sounds like you need to do better tracking of the API calls that are being made.


user: (A tokens)
response: (B tokens)

You’ll get charged A+B tokens: A as prompt and B as completion.

user: (A tokens)
assistant: (B tokens)
user: (C tokens)
assistant: (D tokens)
user: (E tokens)
response: (F tokens)

You’ll be charged A+B+C+D+E+F. (A+B+C+D+E) is the prompt and F is the completion.

It ramps up very quickly with longer conversations, especially with short responses like “ok”, where the whole previous conversation gets dumped again. This could possibly be why you’re seeing it double.

No, you’ll be charged:

Prompt: A
Response: B
Prompt: A + B + C
Response: D
Prompt: A + B + C + D + E
Response: F

For a total of:
Prompt: 3A + 2B + 2C + D + E
Response: B + D + F

Up until you reach the context length minus the max_tokens value.

But, yes, things can add up very quickly.
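
To make that concrete, here is a toy Python sketch of the same arithmetic. The token counts A..F are made-up placeholders; in a real app the exact figures come from the `usage` field the API returns with every response:

```python
# Toy simulation of the billing pattern above. Each request resends the
# whole prior conversation as its prompt, then pays separately for the reply.

A, B, C, D, E, F = 120, 80, 45, 200, 10, 150  # placeholder token counts

prompts = [A, A + B + C, A + B + C + D + E]  # what each of the 3 calls sends
completions = [B, D, F]                      # what each of the 3 calls returns

print("prompt tokens billed:", sum(prompts))          # 3A + 2B + 2C + D + E = 820
print("completion tokens billed:", sum(completions))  # B + D + F = 430
```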


In what way do you feel like this cleared up anything?

NO NO NO YOU’LL BE CHARGED LIKE THIS!

NO, it doesn’t sound like I need to keep better track of my tokens, it sounds like OpenAI needs to be more transparent about how their tokenization works, and why I’m seeing 5,000 tokens in my prompt and 300 in my response. “OH BRO YOU JUST GOTTA KNOW ITS A + B + C + D + E + F” “NO DON’T LISTEN TO THAT GUY IT’S A + B = C + D + E = F”

???
Like, what? I’m trying to run a business, I need clear metrics.

I’ll be switching to Azure very soon.

Good luck with that, sport; it’s the identical billing structure.

Also, I just explained why you’re seeing what you’re seeing. It’s well documented, you presumably built the system you’re using to send and receive messages through OpenAI’s API, and the API tells you exactly how many tokens each call uses, so I’m confused why you’re confused.

But anyway, good luck.

Because I run a chat app. That’s all it’s for. So when I type 500 words, I typically get 500 words back, and so on and so forth. So the fact remains that my billing literally says things like this:

12:00 AM

Local time: Aug 20, 2023, 8:00 PM

gpt-4-0613, 1 request

1,489 prompt + 54 completion = 1,543 tokens

Okay, so one person typed in like, idk… let’s say 900 words, and got back like 40 words? When does that happen? Never. These metrics do not add up. Never in my life have I put in 1,500 tokens of a prompt and received back 54 tokens’ worth, never, that has not happened, but continue to try and gaslight me.

Hi Thomas,

The token usage and the methods to calculate it are well documented. It seems that your userbase is now using the model in ways potentially different from how you envisioned their usage initially. The “A+B” description was pointing out that if you feed the conversation history back into the model, then usage will grow to include each conversation element that gets added as contextual history. This is how all large language models work: if you have employed a conversational-style interaction, then costs can increase with longer sessions. If you have not done that, i.e., you have implemented a single prompt with a single response, then your customers are misusing your service and you need to look at ways to prevent that. Perhaps a maximum message length?
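
As a rough illustration of that idea (a minimal sketch only; the 1,000-token limit and the rejection behaviour are placeholder choices you would tune for your own app):

```python
import tiktoken

MAX_PROMPT_TOKENS = 1000  # arbitrary per-message budget; tune to your economics

enc = tiktoken.encoding_for_model("gpt-4")

def check_message(text: str) -> None:
    """Reject an oversized user message before it is ever sent to the API."""
    n_tokens = len(enc.encode(text))
    if n_tokens > MAX_PROMPT_TOKENS:
        raise ValueError(
            f"Message is {n_tokens} tokens; the limit is {MAX_PROMPT_TOKENS}."
        )
```

Counting with tiktoken before you send the request means an oversized message never reaches the paid API at all.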

You are free to choose whichever service providers you see fit for your venture, and be aware that they all employ the same token pricing methodology.

I suggest you add some text logging to see what the text content is of the longer prompts your userbase is generating, so that you can build up a better idea of where the usage issue is coming from. The users on this forum are trying to give you sound advice on how to deal with your application.
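
For example, simply recording the `usage` block that comes back with every call will tell you exactly where the tokens are going (a sketch against the openai Python library as of this writing; the wrapper and log format are illustrative only):

```python
import logging
import openai

logging.basicConfig(filename="token_usage.log", level=logging.INFO)

def chat(messages):
    response = openai.ChatCompletion.create(model="gpt-4-0613", messages=messages)
    usage = response["usage"]
    # Log the exact billed token counts next to the user's last message.
    logging.info(
        "prompt=%d completion=%d total=%d last_user_message=%r",
        usage["prompt_tokens"],
        usage["completion_tokens"],
        usage["total_tokens"],
        messages[-1]["content"][:200],
    )
    return response["choices"][0]["message"]["content"]
```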

It’s great that your app has grown rapidly, but as you know, usage limits and rate limits need to be applied for before being granted, and they are constrained by server capacity and developer usage loads, so no guarantees can be made on when that will happen. The developer forum has no ability to affect the outcome of a usage and limit increase application; those must be dealt with by the team that reviews the applications.

If you have further questions about how token usage is calculated, please feel free to ask and I will do my best to answer them.

Here’s an example where more tokens go in than come out - Summary by AI:

User tventura94 is questioning the pricing metrics they’re experiencing with OpenAI’s GPT-4 model. They notice their service’s pricing per user request doubling and the number of prompt tokens substantially outnumbering the completion tokens in their API call records. Several community members respond to tventura94’s concerns. Foxabilo explains that high prompt tokens can occur if users input lots of context for minimal output. Smuzani and elmstedt elaborate on how OpenAI’s token billing works, stressing that both input (prompt) and output (completion) tokens are billed and that repeated conversations can inflate the cost due to the repetition of the previous conversation’s tokens. Tventura94 expresses frustration with the explanations, feeling it does not sufficiently explain the discrepancy they’re observing. Foxabilo further reasons that if customers use the chatbot differently than intended—for instance, inserting long prompts for short responses—cost would increase. All providers use the same token-based cost approach, and they advised tventura94 to analyze their userbase’s usage patterns.

Besides other such cases with large input (entity extraction, keyword identification), there is chat: A chatbot must have a large input of prior conversation each turn in order to carry on a conversation where it appears to have a memory.

If you are just using someone else’s software for your “I’m trying to run a business”, you might not be aware of typical chat usage patterns. The input can be 10-20 turns of prior conversation just to ask the AI “are you sure?”.

Okay, thanks guys, sorry for being a brat. I understand now.

To hold memory, it has to add each previous prompt to whatever prompt is newly typed,

makes sense.

sorry all


Is there a way to get around this and give ChatGPT a memory?

There are different techniques to manage conversation history. ChatGPT, for example, aggressively minimizes the conversation that is passed each turn to those turns that are contextually relevant and recent, using a vector database.

This is also one of the complaints about ChatGPT - that it quickly forgets what you were talking about.

Value can be added to a third-party product by having a long and robust conversation history, but with GPT-4, that costs significant money, and also a larger input has more compute requirements (which is what you are paying for, and what OpenAI tries to reduce in its own over-taxed product).

The type of conversation history depends on the use. If you are just having a general chat, one only needs to ensure that the topic and the last few answers are given for the illusion of memory. If you are coding and have given the AI your existing code, you may need the original plus revisions to be perfectly preserved and not truncated or summarized. If you are working on composing and revising a document, you may need management that keeps the latest “accepted” version along with proposed changes. Using functions via API, one must have a linear history of all calls until the AI finally answers.

Besides embeddings and vector databases, language AI (such as cheaper gpt-3.5-turbo) can be used to track the conversation flow and mark topical changes so only what is relevant is passed. These are all advanced strategies to be explored that don’t have plug-and-play solutions.
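
For a flavour of the simplest possible approach, here is a sketch of plain recency-based truncation (the 2,000-token budget is an arbitrary assumption, and the per-message count ignores the few tokens of formatting overhead each message adds):

```python
import tiktoken

enc = tiktoken.encoding_for_model("gpt-4")

def trim_history(messages, budget=2000):
    """Keep the system message plus the most recent turns that fit the budget.

    A naive recency-only strategy; real apps may summarize or embed instead.
    Assumes messages[0] is the system message.
    """
    system, turns = messages[0], messages[1:]
    kept, used = [], 0
    for msg in reversed(turns):  # walk backwards from the newest turn
        cost = len(enc.encode(msg["content"]))  # approximate per-message cost
        if used + cost > budget:
            break
        kept.append(msg)
        used += cost
    return [system] + list(reversed(kept))
```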


I am using gpt-4. You’re saying the third-party added memory options will be too expensive with this model?

Does anyone have any resources on what your general numbers need to be to start a profitable chatbot? Because now I’m just wondering: why release a commercial API and make it so the developers can’t cover their costs for the usage?
Do they just not want people to make chatbots?

I am going to try and negotiate a wholesale rate with Azure.

Commercial terms start to make sense at the 450 million tokens per day mark. At that point you will find benefits in terms of price and performance with a dedicated instance; you can reach out to sales@openai.com to enquire further.


Well, let’s look at your proposed use case.

If you produce a bot where users get 10,000 tokens of chat per month for $5, I think you have a profitable and reasonable business model. You could then offer higher tiers for heavier users, say 50k tokens for $10, and so on; you can build out your tier pricing structure to suit your userbase and go from there. (Prices are off the top of my head and may need adjustment.)
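
A quick back-of-envelope check on that tier (assuming the gpt-4 8K rates in effect at the time of writing, $0.03 per 1K prompt tokens and $0.06 per 1K completion tokens; verify against the live pricing page):

```python
# Margin sketch for the $5 / 10,000-billed-token tier suggested above.
PROMPT_RATE, COMPLETION_RATE = 0.03 / 1000, 0.06 / 1000  # per-token rates

tokens = 10_000
best_case = tokens * PROMPT_RATE       # all 10k billed at the prompt rate
worst_case = tokens * COMPLETION_RATE  # all 10k billed at the completion rate

print(f"raw model cost: ${best_case:.2f} to ${worst_case:.2f} per user per month")
# -> raw model cost: $0.30 to $0.60 per user per month, against $5 of revenue
```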


There’s no “wholesale rate with Azure” I know of unless you are in the “$1 million+ dedicated rack of GPT-4 instance” range (and still unlikely to get OpenAI or Microsoft sales on the phone unless you are a long-term partner). Azure at least doesn’t require months of proven payments simply to make the jump above $500/mo ($15/day).

When I say third-party, I mean you. You are trying to compete against ChatGPT, which gives up to hundreds of GPT-4 responses for under a dollar a day, something only feasible because casual users don’t constantly take full advantage.

You can charge more if you do something better to add value, either a specialized product or a better product… but consider that making GPT-4 chat “better” by giving it maximum conversation memory can approach $0.25 per single question after a long session.
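
(Sanity check on that figure: a nearly full 8K context is roughly 8,000 prompt tokens, and at the gpt-4 8K rate of $0.03 per 1K prompt tokens that is about $0.24 of input cost alone, before the completion is billed.)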

(Also, I would not base a company on 200 words of prompt anybody can type into ChatGPT.)

Interesting, cuz I just got off the phone with a sales rep from Azure who is putting me in contact with three different wholesale providers, so I guess you got that entirely wrong.

“You are trying to compete against ChatGPT” - No, I’m using a commercial API service offered by OpenAI to make a bot that is niche and better at its specific job than ChatGPT is. I’m not competing, I’m drinking from the trough that is given to me and training my own models with a highly sophisticated prompt that, according to you, “anyone could type into ChatGPT”. Yeah, if every person had the knowledge of a psychologist, you’re right, they could, but they don’t, so no, someone can’t just recreate what I have the exact same way.