GPT-4 pricing for chat API

For the GPT-4 model, there is code for counting the number of tokens (openai-cookbook/examples/How_to_count_tokens_with_tiktoken.ipynb at main · openai/openai-cookbook · GitHub). I used section 6 of that notebook to count the tokens for my scenario, but how do we calculate the price, given that input tokens and generated tokens are priced differently for GPT-4? Could you please help me work out how to calculate it?

In my chart, cost is per 1 million tokens - so divide these prices by 1 million to get the price per token. Then multiply the input tokens by the input price and the output tokens by the output price.

| Model | Training (per 1M) | Input usage (per 1M) | Output usage (per 1M) | Context length |
|---|---|---|---|---|
| GPT-3.5-turbo-0125 | n/a | $0.50 | $1.50 | 16k (4k out) |
| GPT-3.5-turbo-1106 | n/a | $1.00 | $2.00 | 16k (4k out) |
| GPT-3.5-turbo-0613 | n/a | $1.50 | $2.00 | 4k |
| GPT-3.5-turbo-0301 | n/a | $1.50 | $2.00 | 4k |
| gpt-3.5-turbo-16k-0613 | n/a | $3.00 | $4.00 | 16k |
| GPT-3.5 Turbo fine-tune (all?) | $8.00 | $3.00 | $6.00 | 4k |
| GPT-4-turbo (all) | n/a | $10.00 | $30.00 | 125k (4k out) |
| GPT-4 | n/a | $30.00 | $60.00 | 8k |

GPT-4-turbo is $0.01 per 1000 input tokens and $0.03 per 1000 output tokens. The token count is somewhat higher than the word count. A typical exchange of 700 words of past chat plus a question, and 250 words of response, would thus cost around $0.02.
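The arithmetic described above can be sketched as a small helper. This is only an illustration: the name `chat_cost` and the `PRICES_PER_1M` dictionary are made up for this example, the prices are copied from the chart in this thread and may be out of date, and the token counts in the demo are assumed round numbers for the 700-word/250-word exchange.

```python
# Prices in USD per 1 million tokens, copied from the chart in this thread
# (assumed values; check current pricing before relying on them).
PRICES_PER_1M = {
    "gpt-4-turbo": {"input": 10.00, "output": 30.00},
    "gpt-4": {"input": 30.00, "output": 60.00},
    "gpt-3.5-turbo-0125": {"input": 0.50, "output": 1.50},
}

def chat_cost(model, input_tokens, output_tokens):
    """Return the USD cost of one call, pricing input and output separately."""
    price = PRICES_PER_1M[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000  # chart prices are per 1M tokens

# Assumed token counts: roughly 1000 tokens in, 350 tokens out.
print(f"${chat_cost('gpt-4-turbo', 1000, 350):.4f}")
```

With these assumed counts the result is about two cents, in line with the estimate above.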

Thank you. The chart is also available from OpenAI, but my main question is how to count the input tokens and the output tokens separately, given that the code in the link outputs just a single number for the total tokens.

As you are the one sending the text to the AI, you can measure what you will send (with an overhead of 4 tokens for each message the text is placed in).

The length of the AI's response can be measured directly (unless the AI emitted a function call, which can also be counted, though it is trickier).

A chat completions API call that doesn’t use streaming (a response sent a word at a time) also includes a usage object that gives the prompt and response token count.
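As a rough sketch of using that usage object for pricing: the field names `prompt_tokens` and `completion_tokens` are the ones the usage object reports, but the function name `cost_from_usage`, the hardcoded GPT-4-turbo prices, and the sample numbers are assumptions for illustration. A plain dictionary stands in for the API response here; with the official Python client the same values are typically read as attributes, such as `response.usage.prompt_tokens`.

```python
# Assumed GPT-4-turbo prices in USD per token (figures quoted in this thread).
INPUT_PRICE = 0.01 / 1000
OUTPUT_PRICE = 0.03 / 1000

def cost_from_usage(usage):
    """Price one non-streaming call from its 'usage' token counts."""
    return (usage["prompt_tokens"] * INPUT_PRICE
            + usage["completion_tokens"] * OUTPUT_PRICE)

# Made-up numbers, shaped like the usage field of a chat completion response.
sample_usage = {"prompt_tokens": 950, "completion_tokens": 330, "total_tokens": 1280}
print(f"${cost_from_usage(sample_usage):.5f}")
```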

Counting tokens is an important part of writing software that interacts intelligently with the AI. It lets you manage the budget of how much input you accept, how much past chat you send, and how much documentation can be attached.
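One way to budget past chat, sketched under assumptions: keep only the newest messages that fit a token limit. The function `trim_history`, the 4-token per-message overhead, and the word-based `rough_count` stand-in are all hypothetical choices for this example; in practice you would pass a counter based on a real tokenizer.

```python
def trim_history(messages, budget, count_tokens):
    """Keep the most recent messages whose combined token count fits the budget."""
    kept = []
    used = 0
    for msg in reversed(messages):  # walk from newest to oldest
        cost = count_tokens(msg["content"]) + 4  # assumed per-message overhead
        if used + cost > budget:
            break
        kept.append(msg)
        used += cost
    return list(reversed(kept))  # restore chronological order

# Stand-in counter for the demo: one "token" per whitespace-separated word.
def rough_count(text):
    return len(text.split())

history = [
    {"role": "user", "content": "first question about pricing"},
    {"role": "assistant", "content": "a long answer " * 20},
    {"role": "user", "content": "short follow up"},
]
print(trim_history(history, budget=20, count_tokens=rough_count))
```

With a budget of 20 only the last message survives, because the long assistant reply would push the total over the limit.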

I tested my application, saved the entire conversation in a file, and measured the number of tokens with the code in the link. For this file, I am not sure how to calculate the price. Counting the tokens is not the issue; there are several ways to do that, as you also suggested.

Calculating the price is (number_of_tokens) * (price per token)

The price differs between the input sent to the AI and the output the AI produces.

The input is also formatted into messages that have overhead.

After you have saved a “whole conversation”, it is no longer in a format where the tokens of the whole conversation can be counted accurately, only estimated. There’s no more division between what was the input and what was the output.

Additionally, each turn that built up a conversation was its own API call that had its own growing token count as the chat got longer.
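To illustrate why the growing history matters, here is a minimal sketch that totals the cost of a chat in which every call resends everything said so far. The function name `conversation_cost`, the hardcoded GPT-4-turbo prices, and the token counts are assumptions for illustration, and per-message overhead is ignored.

```python
INPUT_PRICE = 0.01 / 1000   # assumed GPT-4-turbo prices, USD per token
OUTPUT_PRICE = 0.03 / 1000

def conversation_cost(turns, system_tokens=0):
    """Total cost of a chat where every call resends the full history.

    turns: list of (user_tokens, assistant_tokens) pairs, one per API call.
    """
    history = system_tokens  # tokens resent as input on every call
    total = 0.0
    for user_tokens, assistant_tokens in turns:
        prompt = history + user_tokens
        total += prompt * INPUT_PRICE + assistant_tokens * OUTPUT_PRICE
        history = prompt + assistant_tokens  # the reply becomes input next time
    return total

# Made-up token counts for a three-turn chat with a 20-token system message.
turns = [(50, 200), (40, 150), (60, 300)]
print(f"${conversation_cost(turns, system_tokens=20):.5f}")
```

Pricing the final transcript once as input would undercount, since earlier messages are billed again on every later call.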

Nevertheless, let's write some Python code that will calculate whatever you want. You can run it in a Jupyter notebook or locally as a .py script. You could even save the class I show as a rudimentary utility to import (it would not tolerate all types of messages)…

import re
import tiktoken

class Tokenizer:
    """ required: import tiktoken; import re;
    usage example:
        cl100 = Tokenizer()
        number_of_tokens = cl100.count("my string")
    """
    def __init__(self, model="cl100k_base"):
        self.tokenizer = tiktoken.get_encoding(model)
        self.chat_strip_match = re.compile(r'<\|.*?\|>')
        self.intype = None
        self.inprice = 0.01/1000  ### hardcoded GPT-4-Turbo prices
        self.outprice = 0.03/1000

    def ucount(self, text):
        encoded_text = self.tokenizer.encode(text)
        return len(encoded_text)

    def count(self, text):
        text = self.chat_strip_match.sub('', text)
        encoded_text = self.tokenizer.encode(text)
        return len(encoded_text)
    
    def outputprice(self, text):
        return self.ucount(text) * self.outprice

    def inputprice(self, text):
        return self.ucount(text) * self.inprice

    def message(self, message):
        """
        Extends the input message dictionary or list of dictionaries with a 'tokens' field,
        which contains the token count of the 'role' and 'content' fields
        (and optionally the 'name' field), and a 'price' field with the input price.
        The token count is calculated using the 'count' method, which strips out
        any text enclosed within "<|" and "|>" before counting the tokens.

        Args:
            message (dict or list): A dictionary or a list of dictionaries. The ChatML format.
            Each dictionary must have a 'role' field and a 'content' field, and may optionally
            have a 'name' field. The 'role' and 'content' fields are strings, and the
            'name' field, if present, is also a string.

        Returns:
            The input message dictionary or list of dictionaries, extended with a 'tokens' field
            in each dictionary. The 'tokens' field contains the token count of the 'role' and
            'content' fields (and optionally the 'name' field), calculated using the 'count'
            method. The total token count also includes a fixed overhead of 3 control tokens.

        Raises:
            KeyError: If a dictionary does not have a 'role' or 'content' field.
            ValueError: If the message is not a supported type.
        """
        if isinstance(message, str):
            self.intype = str
            # Assumption: treat a bare string as the content of a user message
            message = {"role": "user", "content": message}
        if isinstance(message, dict):
            self.intype = dict
            message = [message]
        elif isinstance(message, list):
            self.intype = list
        else:
            raise ValueError("no supported format in message")
        for msg in message:
            role_string = msg['role']
            if 'name' in msg:
                role_string += ':' + msg['name']
            role_tokens = self.count(role_string)
            content_tokens = self.count(msg['content'])
            msg['tokens'] = 3 + role_tokens + content_tokens
            msg['price'] = msg['tokens'] * self.inprice
        return message if len(message) > 1 else message[0]

The class has several methods, has the price of gpt-4-turbo coded in, and has tools for counting the tokens of the individual messages of a chat, and even for adding the token count and the price to messages as metadata (as messages are inherently input).

You don’t have to understand it, but I can show how to use it:

We’ll create an instance of the class, and also set some typical messages:

token = Tokenizer()

system = [{"role":"system", "content": "this programs the AI to do stuff"}]
user = [{"role":"user", "content": "I'm a user's message. Hi!"}]

Now let's show some of the different things the methods of the class do:

print(token.count(user[0]['content']))  # the count() method just measures tokens
print(token.message(system))  # the message() method gets input dict counts
print(token.message(system+user))
# some new methods for prices of raw text
print(f" input text price: ${token.inputprice(user[0]['content']):.6f}")
print(f"output text price: ${token.outputprice(user[0]['content']):.6f}")

9
{'role': 'system', 'content': 'this programs the AI to do stuff', 'tokens': 11, 'price': 0.00011}
[{'role': 'system', 'content': 'this programs the AI to do stuff', 'tokens': 11, 'price': 0.00011}, {'role': 'user', 'content': "I'm a user's message. Hi!", 'tokens': 13, 'price': 0.00013}]
input text price: $0.000090
output text price: $0.000270

What we asked for, in order: a raw token count; metadata added to one message; metadata added to multiple messages; the price of the raw tokens as input; the price of the same tokens if they were a response.

Now you want to estimate your chat file. It should be plain text.

with open('myfile.txt', 'r') as file:
    content = file.read()

# Print the results
print(f"This file has {token.count(content)} tokens.")
print(f"If it was sent as input, it would cost ${token.inputprice(content):.5f}")
print(f"If it was received as output, it would cost ${token.outputprice(content):.5f}")

my text file:

This file has 608 tokens.
If it was sent as input, it would cost $0.00608
If it was received as output, it would cost $0.01824

There are tokenizers on the web where you can paste long texts and get token counts, for example https://tiktokenizer.vercel.app/

(the text in the file, if you want to see 608 tokens)

<div class="flex flex-col w-full flex-grow relative border border-black/10 gizmo:border-black/20 gizmo:dark:border-white/30 dark:border-gray-900/50 dark:text-white rounded-xl gizmo:rounded-2xl shadow-xs dark:shadow-xs dark:bg-gray-700 bg-white gizmo:dark:bg-gray-800 gizmo:shadow-[0_0_0_2px_rgba(255,255,255,0.95)] gizmo:dark:shadow-[0_0_0_2px_rgba(52,53,65,0.95)]"><textarea id="prompt-textarea" tabindex="0" data-id="request-:Rqpdm:-0" style="max-height: 200px; height: 56px; overflow-y: hidden;" rows="1" placeholder="Send a message" class="m-0 w-full resize-none border-0 bg-transparent py-[10px] pr-10 focus:ring-0 focus-visible:ring-0 dark:bg-transparent md:py-4 md:pr-12 gizmo:md:py-3.5 gizmo:placeholder-black/50 gizmo:dark:placeholder-white/50 pl-3 md:pl-4"></textarea><button disabled="" class="absolute p-1 rounded-md md:bottom-3 gizmo:md:bottom-2.5 md:p-2 md:right-3 dark:hover:bg-gray-900 dark:disabled:hover:bg-transparent right-2 gizmo:dark:disabled:bg-white gizmo:disabled:bg-black gizmo:disabled:opacity-10 disabled:text-gray-400 enabled:bg-brand-purple gizmo:enabled:bg-black text-white gizmo:p-0.5 gizmo:border gizmo:border-black gizmo:rounded-lg gizmo:dark:border-white gizmo:dark:bg-white bottom-1.5 transition-colors disabled:opacity-40" data-testid="send-button"><span class="" data-state="closed"><svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 16 16" fill="none" class="icon-sm m-1 md:m-0"><path d="M.5 1.163A1 1 0 0 1 1.97.28l12.868 6.837a1 1 0 0 1 0 1.766L1.969 15.72A1 1 0 0 1 .5 14.836V10.33a1 1 0 0 1 .816-.983L8.5 8 1.316 6.653A1 1 0 0 1 .5 5.67V1.163Z" fill="currentColor"></path></svg></span></button></div>

Additionally: the tiktoken library must of course be installed in your Python environment. The first time it runs, it also has to download an encoding file from the internet, so the execution environment needs internet access.