GPT-4 pricing for chat API

For the GPT-4 model, there is code to count the number of tokens (openai-cookbook/examples/How_to_count_tokens_with_tiktoken.ipynb at main · openai/openai-cookbook · GitHub). I use section 6 of that notebook to count the tokens for my scenario, but how do we then calculate the pricing, given that the price for input tokens and generated tokens is different for GPT-4? Could you please help me calculate it?


In my chart, cost is per 1 million tokens - so divide these prices by 1 million to get the price per token. Then multiply the input tokens by the input price and the output tokens by the output price.

Model                            Training/1M   Input/1M   Output/1M   Context length
GPT-3.5-turbo-0125               n/a           $0.50      $1.50       16k (4k out)
GPT-3.5-turbo-1106               n/a           $1.00      $2.00       16k (4k out)
GPT-3.5-turbo-0613               n/a           $1.50      $2.00       4k
GPT-3.5-turbo-0301               n/a           $1.50      $2.00       4k
gpt-3.5-turbo-16k-0613           n/a           $3.00      $4.00       16k
GPT-3.5 Turbo fine-tune (all?)   $8.00         $3.00      $6.00       4k
GPT-4-turbo (all)                n/a           $10.00     $30.00      128k (4k out)
GPT-4                            n/a           $30.00     $60.00      8k

GPT-4-turbo is $0.01 per 1000 input tokens and $0.03 per 1000 output tokens. The token count is a bit higher than the word count. A typical exchange of 700 words of past chat plus a question, with a 250-word response, would thus cost around $0.02.
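As a sanity check on that estimate, here is a quick back-of-envelope script (the 4/3 tokens-per-word ratio is a rough rule of thumb for English, not an exact figure):

```python
# Back-of-envelope cost for one GPT-4-turbo exchange.
# Assumption: roughly 4 tokens per 3 words of English text.
TOKENS_PER_WORD = 4 / 3
INPUT_PRICE = 0.01 / 1000   # $ per input token
OUTPUT_PRICE = 0.03 / 1000  # $ per output token

input_tokens = round(700 * TOKENS_PER_WORD)   # past chat + question
output_tokens = round(250 * TOKENS_PER_WORD)  # response
cost = input_tokens * INPUT_PRICE + output_tokens * OUTPUT_PRICE
print(f"~{input_tokens} tokens in, ~{output_tokens} tokens out: ${cost:.4f}")
# ~933 tokens in, ~333 tokens out: $0.0193
```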


Thank you. The chart is also available on OpenAI, but the main question is how to count the number of input tokens and output tokens separately, given that the code in the link just outputs a single number as the total tokens.

As you are the one sending the language to the AI, you are able to measure what you will send (with an overhead of 4 tokens for each message it is placed within).

The response length back from the AI can be measured directly (unless the AI emitted a function call, which can also be counted but is trickier).

A chat completions API call that doesn't use streaming (where the response is sent a piece at a time) also includes a usage object that gives the prompt and completion token counts.
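For example, pricing one API call from that usage object might look like this (the field names `prompt_tokens` and `completion_tokens` match the Chat Completions response; the hardcoded prices are GPT-4-turbo's from the chart above):

```python
# 'usage' as it appears in a non-streaming chat completions response
usage = {"prompt_tokens": 950, "completion_tokens": 320, "total_tokens": 1270}

INPUT_PRICE = 0.01 / 1000   # GPT-4-turbo, $ per input token
OUTPUT_PRICE = 0.03 / 1000  # GPT-4-turbo, $ per output token

# input and output are billed at different rates, so price them separately
call_cost = (usage["prompt_tokens"] * INPUT_PRICE
             + usage["completion_tokens"] * OUTPUT_PRICE)
print(f"This call cost ${call_cost:.5f}")  # This call cost $0.01910
```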

Tokenization is an important part of making intelligent software to interact with the artificial intelligence. You can manage the budget of how much input you accept, how much past chat you send and how much documentation can be attached.

I tested my application, saved the entire conversation to a file, and measured the number of tokens using the code in the link. For this file, I am not sure how to calculate the pricing. Counting tokens is not the issue; there are several ways, as you also suggested.

Calculating the price is (number_of_tokens) * (price per token)

The price is different from input to the AI and output the AI produces.

The input is also formatted into messages that have overhead.

After you have saved a “whole conversation”, it is no longer in a format where the tokens of the whole conversation can be counted accurately, only estimated. There’s no more division between what was the input and what was the output.

Additionally, each turn that built up a conversation was its own API call that had its own growing token count as the chat got longer.

Nevertheless, let's write some Python code that will calculate whatever you want. You can run it in a Jupyter notebook or locally as a .py script. You could even save the class I show as a rudimentary utility to import (it would not tolerate all types of messages)…

import re
import tiktoken

class Tokenizer:
    """ required: import tiktoken; import re;
    usage example:
        cl100 = Tokenizer()
        number_of_tokens = cl100.count("my string")
    """

    def __init__(self, model="cl100k_base"):
        self.tokenizer = tiktoken.get_encoding(model)
        self.chat_strip_match = re.compile(r'<\|.*?\|>')
        self.intype = None
        self.inprice = 0.01/1000  ### hardcoded GPT-4-Turbo prices
        self.outprice = 0.03/1000

    def ucount(self, text):
        encoded_text = self.tokenizer.encode(text)
        return len(encoded_text)

    def count(self, text):
        text = self.chat_strip_match.sub('', text)
        encoded_text = self.tokenizer.encode(text)
        return len(encoded_text)

    def outputprice(self, text):
        return self.ucount(text) * self.outprice

    def inputprice(self, text):
        return self.ucount(text) * self.inprice

    def message(self, message):
        """Extend a message dict or list of dicts with 'tokens' and 'price' fields.

        The token count covers the 'role' and 'content' fields (and the
        optional 'name' field), calculated with the 'count' method, which
        strips out any text enclosed within "<|" and "|>" before counting.

        Args:
            message (dict or list): A dictionary or a list of dictionaries
                in the ChatML format. Each dictionary must have a 'role'
                field and a 'content' field, and may optionally have a
                'name' field; all are strings.

        Returns:
            The input, extended with a 'tokens' field (including a fixed
            overhead of 3 control tokens per message) and a 'price' field
            in each dictionary.

        Raises:
            KeyError: If a dictionary lacks a 'role' or 'content' field.
            ValueError: If the input is neither a dict nor a list.
        """
        if isinstance(message, dict):
            self.intype = dict
            message = [message]
        elif isinstance(message, list):
            self.intype = list
        else:
            raise ValueError("no supported format in message")
        for msg in message:
            role_string = msg['role']
            if 'name' in msg:
                role_string += ':' + msg['name']
            role_tokens = self.count(role_string)
            content_tokens = self.count(msg['content'])
            msg['tokens'] = 3 + role_tokens + content_tokens
            msg['price'] = msg['tokens'] * self.inprice
        return message if len(message) > 1 else message[0]

The class has several methods, has the price of gpt-4-turbo coded in, and has tools for calculating the tokens of the individual messages of a chat, even adding the token count and the price to messages as metadata (as messages are inherently input).

You don’t have to understand it, but I can show how to use it:

We’ll create an instance of the class, and also set some typical messages:

token = Tokenizer()

system = [{"role":"system", "content": "this programs the AI to do stuff"}]
user = [{"role":"user", "content": "I'm a user's message. Hi!"}]

Now let's show some of the different things the methods of the class can do:

print(token.count(user[0]['content']))  # the count() method just measures tokens
print(token.message(system))  # the message() method adds counts to one input dict
print(token.message(system + user))  # it also handles a list of multiple messages
# some new methods for prices of raw text
print(f" input text price: ${token.inputprice(user[0]['content']):.6f}")
print(f"output text price: ${token.outputprice(user[0]['content']):.6f}")

9
{'role': 'system', 'content': 'this programs the AI to do stuff', 'tokens': 11, 'price': 0.00011}
[{'role': 'system', 'content': 'this programs the AI to do stuff', 'tokens': 11, 'price': 0.00011}, {'role': 'user', 'content': "I'm a user's message. Hi!", 'tokens': 13, 'price': 0.00013}]
 input text price: $0.000090
output text price: $0.000270

What we asked for: a raw token count; metadata added to one message; metadata added to multiple messages; the price of raw text as input; the price of raw text as output.

Now you want to estimate your chat file. It had better be plain text.

with open('myfile.txt', 'r') as file:
    content = file.read()

# Print the results
print(f"This file has {token.count(content)} tokens.")
print(f"If it was sent as input, it would cost ${token.inputprice(content):.5f}")
print(f"If it was received as output, it would cost ${token.outputprice(content):.5f}")

my text file:

This file has 608 tokens.
If it was sent as input, it would cost $0.00608
If it was received as output, it would cost $0.01824

There are tokenizers on the web where you can paste long stretches of text and get token counts.

(the text in the file, if you want to see 608 tokens)

<div class="flex flex-col w-full flex-grow relative border border-black/10 gizmo:border-black/20 gizmo:dark:border-white/30 dark:border-gray-900/50 dark:text-white rounded-xl gizmo:rounded-2xl shadow-xs dark:shadow-xs dark:bg-gray-700 bg-white gizmo:dark:bg-gray-800 gizmo:shadow-[0_0_0_2px_rgba(255,255,255,0.95)] gizmo:dark:shadow-[0_0_0_2px_rgba(52,53,65,0.95)]"><textarea id="prompt-textarea" tabindex="0" data-id="request-:Rqpdm:-0" style="max-height: 200px; height: 56px; overflow-y: hidden;" rows="1" placeholder="Send a message" class="m-0 w-full resize-none border-0 bg-transparent py-[10px] pr-10 focus:ring-0 focus-visible:ring-0 dark:bg-transparent md:py-4 md:pr-12 gizmo:md:py-3.5 gizmo:placeholder-black/50 gizmo:dark:placeholder-white/50 pl-3 md:pl-4"></textarea><button disabled="" class="absolute p-1 rounded-md md:bottom-3 gizmo:md:bottom-2.5 md:p-2 md:right-3 dark:hover:bg-gray-900 dark:disabled:hover:bg-transparent right-2 gizmo:dark:disabled:bg-white gizmo:disabled:bg-black gizmo:disabled:opacity-10 disabled:text-gray-400 enabled:bg-brand-purple gizmo:enabled:bg-black text-white gizmo:p-0.5 gizmo:border gizmo:border-black gizmo:rounded-lg gizmo:dark:border-white gizmo:dark:bg-white bottom-1.5 transition-colors disabled:opacity-40" data-testid="send-button"><span class="" data-state="closed"><svg xmlns="" viewBox="0 0 16 16" fill="none" class="icon-sm m-1 md:m-0"><path d="M.5 1.163A1 1 0 0 1 1.97.28l12.868 6.837a1 1 0 0 1 0 1.766L1.969 15.72A1 1 0 0 1 .5 14.836V10.33a1 1 0 0 1 .816-.983L8.5 8 1.316 6.653A1 1 0 0 1 .5 5.67V1.163Z" fill="currentColor"></path></svg></span></button></div>

Add'l: the tiktoken library must of course be installed in Python. The first time it runs, it also downloads a file from the internet, so the execution environment needs internet access.

Hi, I don't think this code covers the total pricing for a chatbot interaction. In a chatbot, for every user query, the entire previous conversation plus the system message is the input to the model, and the model then generates based on that. So I am still confused about how to calculate the pricing for one chatbot conversation, given that we have stored the entire conversation in a JSON format with the roles "system", "assistant", and "user" and their "content".

The class method message adds or updates metadata on every message passed to it in a list.

The whole point is that you can then use the token metadata to manage how much past conversation you want your chatbot to send. The individual messages may be JSON-like, but they are stored as Python objects: lists of dictionaries.

This allows you to calculate the cost and size of a list slice (or of the whole list, with your own addition), set a budget, and see how many messages of history would fit in that budget along with the system message. You can give the user a display of what you disable automatically, a slider to set the budget in tokens or input cost, or even UI checkboxes to disable sending past messages.
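A minimal sketch of that budgeting, assuming each message already carries the 'tokens' metadata the class adds, and that the system message is budgeted separately (the helper name `fit_history` is my own):

```python
def fit_history(messages, budget):
    """Return the most recent messages whose 'tokens' sum fits the budget."""
    kept, total = [], 0
    for msg in reversed(messages):       # walk newest -> oldest
        if total + msg["tokens"] > budget:
            break                        # the next-oldest turn no longer fits
        kept.insert(0, msg)              # re-insert in chronological order
        total += msg["tokens"]
    return kept, total

history = [
    {"role": "user", "content": "q1", "tokens": 40},
    {"role": "assistant", "content": "a1", "tokens": 120},
    {"role": "user", "content": "q2", "tokens": 35},
    {"role": "assistant", "content": "a2", "tokens": 90},
]
kept, used = fit_history(history, budget=150)
print([m["content"] for m in kept], used)  # ['q2', 'a2'] 125
```

Only the newest turns that fit under the 150-token budget survive; the older turn pair is dropped automatically.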

This sample code handles simple messages like those above. More complex structures, like vision, would need different message parsing, and even retrieval of the image to determine its size, to make the newer calculations required.

(Strip the new message keys before sending directly to the API, e.g. with a dictionary exclude.)
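That stripping step could look like this: a dictionary comprehension that keeps only the API-recognized keys and drops the 'tokens' and 'price' metadata added above:

```python
API_KEYS = {"role", "content", "name"}  # fields the chat API accepts

annotated = [
    {"role": "system", "content": "instructions", "tokens": 11, "price": 0.00011},
    {"role": "user", "content": "Hi!", "tokens": 7, "price": 7e-05},
]
# drop the bookkeeping keys before sending the list to the API
clean = [{k: v for k, v in m.items() if k in API_KEYS} for m in annotated]
print(clean)
# [{'role': 'system', 'content': 'instructions'}, {'role': 'user', 'content': 'Hi!'}]
```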

I still did not get my answer. If anyone has any experience, I would appreciate help. The problem is:
We can store the entire text conversation for a chatbot. How can we calculate the total pricing for that conversation by counting the input tokens and output tokens, given that for every user query, the entire previous conversation plus the system message is the input to the model? I am asking because, for the chatbot I have, I want to measure the pricing per user.

I found my solution. If anyone is interested, I can explain.


Are you considering the context as the cumulative combination of the previous turn’s context, user input, and model output? That’s what I’m planning to do.

For every n-th turn, you will have:

Input tokens:

Input of n = System + Context of n + User message of n


Context of n = Context of n-1 + User message of n-1 + Output of n-1

Yes. For each user message, the function includes all previous messages plus the system message.
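Putting that recurrence into code: a sketch that replays a stored conversation turn by turn, billing each assistant reply as output and everything before it (system plus accumulated history) as input. Per-message token counts are taken as given here; in practice they would come from tiktoken plus the per-message overhead, and the function name `conversation_cost` is my own.

```python
def conversation_cost(messages, in_price, out_price):
    """Total cost of a chat replayed turn by turn.

    messages: list of {"role", "tokens"} dicts in chronological order,
    starting with the system message. Each assistant message is billed
    as output; everything that preceded it is re-billed as input.
    """
    total = 0.0
    context_tokens = 0
    for msg in messages:
        if msg["role"] == "assistant":
            total += context_tokens * in_price   # the whole prompt so far
            total += msg["tokens"] * out_price   # this reply
        context_tokens += msg["tokens"]          # the turn joins the context
    return total

chat = [
    {"role": "system", "tokens": 20},
    {"role": "user", "tokens": 50},
    {"role": "assistant", "tokens": 100},
    {"role": "user", "tokens": 30},
    {"role": "assistant", "tokens": 80},
]
# GPT-4-turbo prices: turn 1 bills 70 in + 100 out, turn 2 bills 200 in + 80 out
print(f"${conversation_cost(chat, 0.01/1000, 0.03/1000):.5f}")  # $0.00810
```

Note how the second turn re-bills the first turn's input and output as context; this is why per-user cost grows faster than the raw transcript length suggests.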
