Estimating GPT API Conversation Costs: Factoring in Cumulative Input and Output Tokens

Hello everyone,

Recently, I was tasked with estimating the average cost of a conversation using the GPT APIs. The challenge was to factor in both input and output tokens, given that each subsequent request includes the entire previous conversation as input.

The function I’ve devised does the following:

  1. For each new reply, it assumes that the chatbot takes into account all preceding messages (from both the user and the chatbot itself) as its input.
  2. It then calculates the cost for both input and output tokens for each interaction.
  3. Additionally, it checks if the token count for any interaction exceeds a specified limit, and raises a warning if it does.

Since I haven’t come across a similar solution elsewhere, I wanted to reach out to the community.

Can anyone confirm if this approach seems sound? If it proves to be useful, I hope it serves as a reference for others in the future. Your feedback is much appreciated!

import warnings

def chatbot_cost(num_replies, avg_user_reply, avg_bot_reply, input_cost, output_cost, token_limit):
    total_cost = 0
    total_tokens = 0

    # Lists to keep track of the lengths of all replies
    user_replies = [avg_user_reply] * num_replies
    bot_replies = [avg_bot_reply] * num_replies

    for i in range(num_replies):
        # Input for turn i: all user messages so far plus all previous bot replies
        input_tokens = sum(user_replies[:i + 1]) + sum(bot_replies[:i])
        output_tokens = avg_bot_reply

        # Warn (rather than abort) so the remaining turns are still costed
        if input_tokens + output_tokens > token_limit:
            warnings.warn("Token limit exceeded!")

        # Cost for this turn, at per-token prices for input and output
        interaction_cost = (input_tokens * input_cost) + (output_tokens * output_cost)
        total_cost += interaction_cost

        # Running total of tokens processed across the whole conversation
        total_tokens += input_tokens + output_tokens

    return total_cost
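Since every turn uses the same average lengths, the loop also has a closed form, which is handy as a sanity check. A minimal sketch (the names `conversation_cost_closed_form` and `conversation_cost_loop` are mine, not from the function above, but they implement the same cumulative-context model):

```python
def conversation_cost_closed_form(n, u, b, p_in, p_out):
    # Turn i (0-indexed) sends (i + 1) user replies and i bot replies as input,
    # so summing over all n turns gives triangular-number totals.
    input_tokens = u * n * (n + 1) // 2 + b * n * (n - 1) // 2
    output_tokens = b * n
    return input_tokens * p_in + output_tokens * p_out

def conversation_cost_loop(n, u, b, p_in, p_out):
    # Same model, written as an explicit per-turn loop for comparison.
    total = 0
    for i in range(n):
        input_tokens = (i + 1) * u + i * b
        total += input_tokens * p_in + b * p_out
    return total
```

Both versions should agree for any inputs; the closed form makes it easy to see that cost grows roughly quadratically in the number of turns.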

Needless to say, the function was written by GPT-4.

I have a class that extends the chat endpoint message itself with the token count, whether you pass a single role message or the entire list of them you send to the AI API. You then get the actual count in the message dictionary under the key "tokens", alongside "role", "name", and "content". The count reflects the tokens that message adds when it is included in what you send.

This allows storage in chat history with a per-message token count, so one can calculate exactly what can be passed of chat history into the remaining context.

I also use a send method that strips the metadata back out (though you could just use return[0]['tokens'] as your count). It wouldn't take much more to extend the class with a method that totals the counts across all messages in the list.

Thank you for your response!

Your approach is valuable for calculating the tokens based on actual usage, and I’ll definitely consider it for tracking tokens in real-time conversations.

However, I might not have been explicit enough in my initial post. My primary aim is to predict the cost for hypothetical conversations that haven’t occurred yet, rather than analyzing already-existing data.