How to calculate the cost of a specific request made to the web API (and its reply), in tokens?

TL;DR: How can I calculate the cost, in tokens, of a specific request made to the OpenAI API?


Hi all. I’ve just used the OpenAI Playground (model: gpt-3.5-turbo) to submit a user message and obtain an assistant message in reply. Is it possible to calculate the actual cost of this, in terms of tokens? If so, how can I do that? The user message, assistant message, and system message are below.

ChatGPT says the user message contains 2 tokens and the assistant message contains 3 tokens. I didn’t include the system message in my question. Does it affect the number of tokens used?

Motivation for my question: As a side project I want to build a web app using the OpenAI API. But before I do so, I’d like to estimate the costs I can expect to incur during the project.

Thanks in advance.


System message

You are an expert in American cuisine, and creating different dishes from items found in US grocery stores.

User message

I will give you the name of a common product found in a US grocery store.  I would like you to tell me the dish most commonly consumed in the US that contains this item as its featured ingredient.  If possible, ensure the dish is one that most would consider to be 'American cuisine.'   Please restrict your answer to one item, and include only the name of the dish.

The item is:

Boneless skinless chicken thighs

Assistant message

Barbecue Chicken Thighs


Are you seeking the tokenizer?

https://platform.openai.com/tokenizer

Text

I will give you the name of a common product found in a US grocery store.  I would like you to tell me the dish most commonly consumed in the US that contains this item as its featured ingredient.  If possible, ensure the dish is one that most would consider to be 'American cuisine.'   Please restrict your answer to one item, and include only the name of the dish.

The item is:

Boneless skinless chicken thighs

Token ids
[40, 481, 1577, 345, 262, 1438, 286, 257, 2219, 1720, 1043, 287, 257, 1294, 16918, 3650, 13, 220, 314, 561, 588, 345, 284, 1560, 502, 262, 9433, 749, 8811, 13529, 287, 262, 1294, 326, 4909, 428, 2378, 355, 663, 8096, 18734, 13, 220, 1002, 1744, 11, 4155, 262, 9433, 318, 530, 326, 749, 561, 2074, 284, 307, 705, 7437, 33072, 2637, 220, 220, 4222, 4239, 534, 3280, 284, 530, 2378, 11, 290, 2291, 691, 262, 1438, 286, 262, 9433, 13, 198, 198, 464, 2378, 318, 25, 198, 198, 20682, 5321, 4168, 1203, 9015, 30389]



[Screenshots: Playground responses, 2023-06-19 at 8:58 PM and 8:59 PM]

I didn’t get the same “Barbecue Chicken Thighs” response, but you get the idea. You can use Knit to test the prompt, and it provides the token/cost analytics you want. Disclaimer: I built Knit, and it’s currently free for everyone :slight_smile:


Hi @cagross

The response to every request contains a usage object; you can read response.usage to see how many tokens were consumed.

Here’s what the complete response looks like:

{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "created": 1677652288,
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": "\n\nHello there, how may I assist you today?",
    },
    "finish_reason": "stop"
  }],
  "usage": {
    "prompt_tokens": 9,
    "completion_tokens": 12,
    "total_tokens": 21
  }
}

There’s a newer package called “tiktoken” that does exactly this. I have no relationship with them and have not used it (yet) myself, but maybe it’s just what you need? It has 5.4k GitHub stars.


If you took the time to read the link in my post, it notes:

If you need a programmatic interface for tokenizing text, check out our tiktoken package for Python.

This sentence doesn’t appear anywhere in this thread for me, sorry mate.


OK, thanks all.

I think the Tokenizer should be suitable for this purpose.

But to be clear, let’s say I post a question to ChatGPT (in the ‘User’ message), and it replies with a response (in the ‘Assistant’ message). The total number of tokens used is then equal to # of tokens in User message + # of tokens in Assistant message. Is that correct?

Or does the ‘System’ message alter the number of tokens used during a single request?

@sps Thanks for the info about response.usage. But is there a way to access that object if I’m simply using the OpenAI Playground? When I make the request, I can see in the dev tools network tab that the fetch request returns an event stream (screenshot). I’ve never encountered one of these before. How can I get the response object from it? Or is response.usage obtained another way (maybe only programmatically)?

System, User, and Assistant/Response messages are all included in the token count and costs.
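If you want to estimate this yourself before making a call, here is a rough sketch along the lines of OpenAI’s cookbook guidance. The per-message overhead constants are approximations for gpt-3.5-turbo-style chat models, and the example messages are just the ones from this thread:

import tiktoken

def num_tokens_from_messages(messages, model="gpt-3.5-turbo"):
    """Approximate the prompt tokens for a list of chat messages."""
    enc = tiktoken.encoding_for_model(model)
    tokens_per_message = 3  # approximate overhead for each message's role wrapper
    num_tokens = 3          # approximate priming tokens for the assistant's reply
    for message in messages:
        num_tokens += tokens_per_message
        for value in message.values():
            num_tokens += len(enc.encode(value))
    return num_tokens

messages = [
    {"role": "system", "content": "You are an expert in American cuisine..."},
    {"role": "user", "content": "The item is: Boneless skinless chicken thighs"},
]
print(num_tokens_from_messages(messages))  # system + user both count toward the prompt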


This tokenizer was used by earlier GPT-3 models, and the Codex one was used by the Codex series of models.

gpt-3.5 and gpt-4 use tiktoken.
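Here’s a minimal sketch of how to use it (the example string is just the one from this thread):

# pip install tiktoken
import tiktoken

# gpt-3.5-turbo and gpt-4 both map to the cl100k_base encoding
enc = tiktoken.encoding_for_model("gpt-3.5-turbo")

token_ids = enc.encode("Boneless skinless chicken thighs")
print(len(token_ids))  # number of tokens
print(token_ids)       # the token ids themselves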

To my knowledge, the Playground UI doesn’t show the whole response object, so response.usage isn’t accessible via the Playground.

If you want to track usage via OpenAI’s own UI, you can go to the usage page. The data takes some time to update, though.

Prompt tokens = the token count for the data you send when making the API call.
Completion tokens = the tokens generated by the model.
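For completeness, a minimal sketch of reading those counts programmatically (this assumes the pre-1.0 openai Python package that was current at the time, with OPENAI_API_KEY set in the environment):

import openai  # pre-1.0 SDK; reads OPENAI_API_KEY from the environment

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "The item is: Boneless skinless chicken thighs"}],
)

usage = response["usage"]
print(usage["prompt_tokens"], usage["completion_tokens"], usage["total_tokens"])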


Great, thanks to you both. For now, I’ll try to use the usage page per the suggestion by @sps.

Hey community,
You can use this Python function to report the total cost:

def openai_api_calculate_cost(usage, model="gpt-3.5-turbo-16k"):
    # Prices in USD per 1,000 tokens at the time of this post; the keys are
    # labels by context window, not exact API model names.
    pricing = {
        'gpt-3.5-turbo-4k': {
            'prompt': 0.0015,
            'completion': 0.002,
        },
        'gpt-3.5-turbo-16k': {
            'prompt': 0.003,
            'completion': 0.004,
        },
        'gpt-4-8k': {
            'prompt': 0.03,
            'completion': 0.06,
        },
        'gpt-4-32k': {
            'prompt': 0.06,
            'completion': 0.12,
        },
        'text-embedding-ada-002-v2': {
            'prompt': 0.0001,
            'completion': 0.0001,
        }
    }

    try:
        model_pricing = pricing[model]
    except KeyError:
        raise ValueError("Invalid model specified")

    prompt_cost = usage['prompt_tokens'] * model_pricing['prompt'] / 1000
    completion_cost = usage['completion_tokens'] * model_pricing['completion'] / 1000

    total_cost = prompt_cost + completion_cost
    print(f"\nTokens used:  {usage['prompt_tokens']:,} prompt + {usage['completion_tokens']:,} completion = {usage['total_tokens']:,} tokens")
    print(f"Total cost for {model}: ${total_cost:.4f}\n")

    return total_cost
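For example, with a usage dict shaped like the API’s usage object (the numbers here are made up):

usage = {"prompt_tokens": 95, "completion_tokens": 6, "total_tokens": 101}
openai_api_calculate_cost(usage, model="gpt-3.5-turbo-16k")
# Tokens used:  95 prompt + 6 completion = 101 tokens
# Total cost for gpt-3.5-turbo-16k: $0.0003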

Here is an updated version of the cost function by curioustoknownow, adapted for version 1.3 of the OpenAI Python library, with updated prices (as of 2023-12-26) but only for the models I use myself. You can add the rest quite easily; if you do add them all, please reply here with an updated ‘pricing’ object for the rest of us! :slight_smile:

I also added a rounded cost to the output.


def openai_api_calculate_cost(usage, model="gpt-4-1106-preview"):
    # Prices in USD per 1,000 tokens (as of 2023-12-26).
    pricing = {
        'gpt-3.5-turbo-1106': {
            'prompt': 0.001,
            'completion': 0.002,
        },
        'gpt-4-1106-preview': {
            'prompt': 0.01,
            'completion': 0.03,
        },
        'gpt-4': {
            'prompt': 0.03,
            'completion': 0.06,
        }
    }

    try:
        model_pricing = pricing[model]
    except KeyError:
        raise ValueError("Invalid model specified")

    prompt_cost = usage.prompt_tokens * model_pricing['prompt'] / 1000
    completion_cost = usage.completion_tokens * model_pricing['completion'] / 1000

    total_cost = prompt_cost + completion_cost
    # round to 6 decimals
    total_cost = round(total_cost, 6)

    print(f"\nTokens used:  {usage.prompt_tokens:,} prompt + {usage.completion_tokens:,} completion = {usage.total_tokens:,} tokens")
    print(f"Total cost for {model}: ${total_cost:.4f}\n")

    return total_cost
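A sketch of calling it with the v1 Python SDK (the model and message here are just examples, and OPENAI_API_KEY is assumed to be set in the environment):

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
response = client.chat.completions.create(
    model="gpt-4-1106-preview",
    messages=[{"role": "user", "content": "Hello"}],
)

# usage is an object with attribute access, matching the function above
openai_api_calculate_cost(response.usage, model="gpt-4-1106-preview")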

Calculating cost for GPT-4o-mini.
The prompt and data go into the API call, and the response is the output from it. The total cost incurred comes out as follows:

# !pip install tiktoken
import tiktoken

# Initialize the tokenizer for the GPT model
tokenizer = tiktoken.encoding_for_model("gpt-4o-mini")

# Placeholders: substitute your own prompt, input data, and model output
prompt = "I will give you the name of a common product found in a US grocery store. ..."
data = "Boneless skinless chicken thighs"
out = "Barbecue Chicken Thighs"

# Request and response text
request = str(prompt) + str(data)
response = str(out)

# Tokenize
request_tokens = tokenizer.encode(request)
response_tokens = tokenizer.encode(response)

# Count the tokens for request and response separately
input_tokens = len(request_tokens)
output_tokens = len(response_tokens)

# Actual costs per 1 million tokens
cost_per_1M_input_tokens = 0.15   # $0.150 per 1M input tokens
cost_per_1M_output_tokens = 0.60  # $0.600 per 1M output tokens

# Calculate the costs
input_cost = (input_tokens / 10**6) * cost_per_1M_input_tokens
output_cost = (output_tokens / 10**6) * cost_per_1M_output_tokens
total_cost = input_cost + output_cost

print(f"Input tokens: {input_tokens}")
print(f"Output tokens: {output_tokens}")
print(f"Total tokens: {input_tokens + output_tokens}")
print(f"Cost: ${total_cost:.5f}")