I have some questions regarding how billing is done with GPT-4-Turbo using the Chat.Completion

I built a bot that has a system prompt of about 2000 tokens and some functions that have 450 tokens. Will every call I make to the API using Chat.Completion charge for all these tokens? If the conversation progresses a lot and I need to pass a larger conversation history for ChatGPT to have the context to respond, does it charge for the tokens cumulatively? I’m noticing that in a conversation that evolves and totals 2500 tokens, each call is charging the cumulative cost of the prompt + history + new messages. Is my understanding correct? Is there any way to reduce this cost?

Welcome to the forum, Felice!

Chat models are stateless, so each time you do request a completion, you send all of the instructions and the context you want AI to follow (generally it is system prompt and some form of chat history).

1 Like

So, will it always charge for the entire history, system prompt, and user prompt?

It charges for everything you send it and everything you get as a result.

…and therefore, you would want to employ a token-counting method in your local database of conversation history between AI and user, and use intelligent decision-making in your software.

You then can not just prevent errors of going over the context length, but also can prevent $1 per question bills by limiting the memory of a conversation to a particular number of turns or maximum input tokens.


I can also recommend using the gpt-3.5-turbo model, it is 10 times cheaper and in some solutions it does even better)


Thanks, i`ll try with gpt-3.5 too.

Thanks!! this will help me improve my solution!