I built a bot with a system prompt of about 2000 tokens and function definitions that add another 450 tokens. Will every call I make to the API via Chat.Completion be charged for all of these tokens? And as the conversation progresses and I need to pass a longer conversation history so ChatGPT has the context to respond, am I charged for those tokens cumulatively? I’m noticing that in a conversation that grows to a total of 2500 tokens, each call is charged the cumulative cost of the prompt + history + new messages. Is my understanding correct? Is there any way to reduce this cost?
Welcome to the forum, Felice!
Chat models are stateless, so each time you request a completion, you send all of the instructions and context you want the model to follow (typically the system prompt plus some form of chat history).
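Here's a minimal sketch of what that looks like with the openai Python library (the v0.x-style ChatCompletion call; the messages are just illustrative placeholders):

```python
import openai

messages = [
    {"role": "system", "content": "You are a helpful assistant."},  # ~2000 tokens in your case
    {"role": "user", "content": "First question"},
    {"role": "assistant", "content": "First answer"},
    {"role": "user", "content": "Follow-up question"},  # the new turn
]

# The API has no memory of previous calls: the system prompt and the
# entire history above are sent (and billed) again on every request.
response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=messages,
)
print(response["usage"])  # prompt_tokens, completion_tokens, total_tokens
```

The `usage` object in every response tells you exactly how many tokens you were billed for on that call, which is handy for verifying your own counts.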
So, will it always charge for the entire history, system prompt, and user prompt?
It charges for everything you send (prompt tokens) and everything you get back (completion tokens).
…and therefore you'll want to count the tokens in the conversation history you store locally, and make intelligent decisions in your software about what to send.
That way you not only prevent errors from exceeding the context length, but also avoid $1-per-question bills by limiting a conversation's memory to a set number of turns or a maximum number of input tokens, as in the sketch below.
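For example, a minimal sketch using the tiktoken library (the ~4 tokens of per-message overhead is an approximation that varies by model version, and `max_input_tokens` is a budget you'd tune to your own cost tolerance):

```python
import tiktoken

def count_tokens(messages, model="gpt-3.5-turbo"):
    """Approximate the prompt token count for a list of chat messages.

    Exact counts vary slightly by model version; the per-message
    framing overhead used here is an estimate.
    """
    enc = tiktoken.encoding_for_model(model)
    total = 0
    for message in messages:
        total += 4  # approximate per-message framing overhead
        for value in message.values():
            total += len(enc.encode(value))
    return total + 2  # approximate priming for the assistant's reply

def trim_history(messages, max_input_tokens=2500):
    """Drop the oldest turns until the prompt fits within the budget.

    Assumes messages[0] is the system prompt, which is always kept.
    """
    system, history = messages[0], messages[1:]
    while history and count_tokens([system] + history) > max_input_tokens:
        history.pop(0)  # forget the oldest turn first
    return [system] + history
```

Call `trim_history(messages)` right before each API request, and the oldest turns fall out of the window once the conversation grows past your budget, so your per-call cost stays bounded no matter how long the chat runs.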
I can also recommend the gpt-3.5-turbo model; it's about ten times cheaper, and in some applications it even performs better.
Thanks, I'll try gpt-3.5 too.
Thanks!! This will help me improve my solution!