How to calculate tokens when using function calling

I have been playing with function calling for a few months now, and I can say for sure that the output passed to arguments is not validated against the definition. Many times I have seen the arguments contain plain text that was supposed to be in content, or some simple text such as "The color is #ff0000, width is 2px" instead of actual JSON.

P.S. Function names are not protected either, so I often get random function names that the model decided exist.
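
Since neither the name nor the arguments are guaranteed, it may help to check a returned function_call before acting on it. Here is a minimal sketch, assuming the legacy function_call shape (a dict with "name" and "arguments" as a JSON string) and the same functions list you sent in the request. check_function_call is a hypothetical helper, not part of any library, and it only checks the name, JSON validity, and top-level keys, not the full JSON Schema.

```python
import json


def check_function_call(function_call: dict, functions: list[dict]) -> dict:
    """Return the parsed arguments, or raise ValueError if the call is unusable."""
    known = {f["name"]: f for f in functions}
    name = function_call.get("name")
    if name not in known:
        # The model invented a function that was never defined.
        raise ValueError(f"unknown function name: {name!r}")

    try:
        args = json.loads(function_call.get("arguments") or "")
    except json.JSONDecodeError as exc:
        # Plain text such as "The color is #ff0000" ends up here.
        raise ValueError(f"arguments are not JSON: {exc}") from exc
    if not isinstance(args, dict):
        raise ValueError("arguments must be a JSON object")

    params = known[name].get("parameters", {})
    missing = [k for k in params.get("required", []) if k not in args]
    if missing:
        raise ValueError(f"missing required arguments: {missing}")
    unexpected = [k for k in args if k not in params.get("properties", {})]
    if unexpected:
        raise ValueError(f"unexpected arguments: {unexpected}")
    return args
```

On a ValueError you could retry the request or feed the error text back to the model; which is better probably depends on your use case.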

OK, so this means we can make the descriptions in the function JSON as long as we want, and they will not count against the API’s token limit? That’s really great news.

The descriptions are text inserted into the language the AI receives, which lets it understand what the function is for. Likewise, the AI also receives the text of function names and property names.

The actual text received by the AI is what costs you the tokens. So no, you are still charged for descriptions, which consume context length.

You can also put descriptions at deeper nesting levels where they are not passed on, and then the AI never receives them.

This thread is about the exact and accurate method of calculating not just the sent language, but the overhead of using functions.

OpenAI charges for the function parameter descriptions. They need to be clear, but as short as you can make them. :grinning:

I made a Python version of hmarr’s TypeScript program. I was looking for one myself, so I thought it might come in handy for some!

Ultimately, the only way to correctly count tokens is to render the full system message as the AI receives it. Only then will you have a container that is not affected by what surrounds it (because text never merges with the special non-dictionary tokens that enclose a message).

あなたは数学が好きな役立つアシスタントです

# Tools

## functions

namespace functions {

// Description of example function
type function_name = (_: {
// description of function property 1: string (required)
property1: string,
// description of function property 2: string w enum
property2?: "enum_yes" | "enum_no",
}) => any;

} // namespace functions

Take that Japanese system prompt. The last token of the system prompt is です. The next token that must be produced is then two line feeds (\n\n).

If instead we put a Japanese period (。) at the end of the line, the tokenizer will use the single token for (。\n\n). That system prompt doesn’t need a separate (\n\n) token to advance to the tools description.

So the token at the end of the system prompt can differ from the one we would get without a function. When using functions, most system prompts get the two line feeds “for free”, but others won’t.
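
To make that concrete, here is a small sketch of rendering the whole system message first and only then counting. The layout ("# Tools", "## functions", blank lines between them) is copied from the sample above and is an assumption about what the model receives, not something OpenAI documents. render_system_message and count_system_tokens are hypothetical names, and encode is whatever tokenizer function you pass in, for example tiktoken.get_encoding("cl100k_base").encode.

```python
from typing import Callable, Sequence


def render_system_message(system_prompt: str, namespace_block: str) -> str:
    """Join the system prompt and the rendered functions namespace into one string.

    Layout assumed from the sample in this thread: prompt, blank line,
    "# Tools", blank line, "## functions", blank line, namespace block.
    """
    return f"{system_prompt}\n\n# Tools\n\n## functions\n\n{namespace_block}"


def count_system_tokens(
    system_prompt: str,
    namespace_block: str,
    encode: Callable[[str], Sequence],
) -> int:
    """Count tokens on the joined text so boundary merges are included.

    Encoding the two parts separately and adding the counts can be off,
    because the end of the prompt and the following "\n\n" may share a token.
    """
    return len(encode(render_system_message(system_prompt, namespace_block)))
```

The per-message overhead tokens still have to be added on top; this only covers the text inside the system message.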

Also note that the encoder must honor the “required” list, which removes the question mark from those properties.

The best and most fail-proof way we have implemented so far is to make a fake request to the model with “empty” messages but including the functions and other parameters (which still count as tokens). The returned value <response>['usage']['prompt_tokens'] is the number you’re looking for. It’s definitely not best practice, but it gets the best results.

Yeah, it’s the most accurate, but a complete waste of tokens. Not scalable at all.

Yes, definitely. It only works when it’s not you who’s paying the bill. But now we’re accurate to within about ±5 tokens in the total token calculation.
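
One way to soften the cost problem might be to pay for the probe only once per set of functions and cache the difference. A rough sketch, with several assumptions: create is whatever callable sends a chat completion request (with the openai Python package v1+ that would be client.chat.completions.create), the response exposes usage.prompt_tokens as an attribute, and the probe uses a one-word user message because a truly empty messages list may be rejected. function_overhead is a hypothetical helper name.

```python
import hashlib
import json
from typing import Any, Callable

# Measured overhead per (model, functions) pair, so each pair is probed once.
_overhead_cache: dict[str, int] = {}


def function_overhead(create: Callable[..., Any], model: str, functions: list[dict]) -> int:
    """Return the extra prompt tokens caused by sending `functions`.

    Makes two tiny requests the first time (with and without functions)
    and reuses the measured difference afterwards.
    """
    key = hashlib.sha256(
        json.dumps([model, functions], sort_keys=True).encode("utf-8")
    ).hexdigest()
    if key not in _overhead_cache:
        probe = [{"role": "user", "content": "hi"}]
        # max_tokens=1 keeps the completion side of the bill minimal.
        base = create(model=model, messages=probe, max_tokens=1)
        with_functions = create(
            model=model, messages=probe, functions=functions, max_tokens=1
        )
        _overhead_cache[key] = (
            with_functions.usage.prompt_tokens - base.usage.prompt_tokens
        )
    return _overhead_cache[key]
```

Because the end of a system prompt can merge with the line feeds before the tools section, an overhead measured without a system message may be off by a few tokens when one is present.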

Dude, if you’re using Python, try this out; it has been 100% accurate in all my testing. Use the estimate_tokens() function (a Python version of hmarr’s TypeScript code).

import tiktoken


ENCODING = tiktoken.get_encoding("cl100k_base")


def _format_function_definitions(functions: list[dict]) -> str:
    """
    Generates TypeScript function type definitions.

    Args:
    - functions (list[dict]): List of dictionaries representing function definitions.

    Returns:
    - str: TypeScript function type definitions.
    """
    lines = ["namespace functions {"]

    for func in functions:
        if func.get("description"):
            lines.append(f"// {func['description']}")

        if func["parameters"].get("properties"):
            lines.append(f"type {func['name']} = (_: {{")
            lines.append(_format_object_properties(func["parameters"], 0))
            lines.append("}) => any;")
        else:
            lines.append(f"type {func['name']} = () => any;")

        lines.append("")

    lines.append("} // namespace functions")
    return "\n".join(lines)


def _format_object_properties(parameters: dict, indent: int) -> str:
    """
    Formats object properties for TypeScript type definitions.

    Args:
    - parameters (dict): Dictionary representing object parameters.
    - indent (int): Number of spaces for indentation.

    Returns:
    - str: Formatted object properties.
    """
    lines = []
    for name, param in parameters["properties"].items():
        if param.get("description") and indent < 2:
            lines.append(f"// {param['description']}")

        is_required = parameters.get("required") and name in parameters["required"]
        lines.append(
            f"{name}{'?:' if not is_required else ':'} {_format_type(param, indent)},"
        )

    return "\n".join([" " * indent + line for line in lines])


def _format_type(param: dict, indent: int) -> str:
    """
    Formats a single property type for TypeScript type definitions.

    Args:
    - param (dict): Dictionary representing a parameter.
    - indent (int): Number of spaces for indentation.

    Returns:
    - str: Formatted type for the given parameter.
    """
    type_ = param["type"]
    if type_ == "string":
        return (
            " | ".join([f'"{v}"' for v in param["enum"]])
            if param.get("enum")
            else "string"
        )
    elif type_ == "number":
        return (
            " | ".join([str(v) for v in param["enum"]])
            if param.get("enum")
            else "number"
        )
    elif type_ == "integer":
        return (
            " | ".join([str(v) for v in param["enum"]])
            if param.get("enum")
            else "integer"
        )
    elif type_ == "array":
        return (
            f"{_format_type(param['items'], indent)}[]"
            if param.get("items")
            else "any[]"
        )
    elif type_ == "boolean":
        return "boolean"
    elif type_ == "null":
        return "null"
    elif type_ == "object":
        return "{\n" + _format_object_properties(param, indent + 2) + "\n}"
    else:
        raise ValueError(f"Unsupported type: {type_}")


def _estimate_function_tokens(functions: list[dict]) -> int:
    """
    Estimates token count for a given list of functions.

    Args:
    - functions (list[dict]): List of dictionaries representing function definitions.

    Returns:
    - int: Estimated token count.
    """
    prompt_definitions = _format_function_definitions(functions)
    tokens = _string_tokens(prompt_definitions)
    tokens += 9  # Add nine per completion
    return tokens


def _string_tokens(string: str) -> int:
    """
    Estimates token count for a given string using 'cl100k_base' encoding.

    Args:
    - string (str): Input string.

    Returns:
    - int: Estimated token count.
    """
    return len(ENCODING.encode(string))


def _estimate_message_tokens(message: dict) -> int:
    """
    Estimates token count for a given message.

    Args:
    - message (dict): Dictionary representing a message.

    Returns:
    - int: Estimated token count.
    """
    components = [
        message.get("role"),
        message.get("content"),
        message.get("name"),
        message.get("function_call", {}).get("name"),
        message.get("function_call", {}).get("arguments"),
    ]
    components = [
        component for component in components if component
    ]  # Filter out missing and empty values
    tokens = sum([_string_tokens(component) for component in components])

    tokens += 3  # Add three per message
    if message.get("name"):
        tokens += 1
    if message.get("role") == "function":
        tokens -= 2
    if message.get("function_call"):
        tokens += 3

    return tokens


def estimate_tokens(
    messages: list[dict], functions: list[dict] = None, function_call=None
) -> int:
    """
    Estimates token count for a given prompt with messages and functions.

    Args:
    - messages (list[dict]): List of dictionaries representing messages.
    - functions (list[dict], optional): List of dictionaries representing function definitions. Default is None.
    - function_call (str or dict, optional): Function call specification. Default is None.

    Returns:
    - int: Estimated token count.
    """
    padded_system = False
    tokens = 0

    for msg in messages:
        if msg["role"] == "system" and functions and not padded_system:
            modified_message = {"role": msg["role"], "content": msg["content"] + "\n"}
            tokens += _estimate_message_tokens(modified_message)
            padded_system = True  # Mark system as padded
        else:
            tokens += _estimate_message_tokens(msg)

    tokens += 3  # Each completion has a 3-token overhead
    if functions:
        tokens += _estimate_function_tokens(functions)

    if functions and any(m["role"] == "system" for m in messages):
        tokens -= 4  # Adjust for function definitions

    if function_call and function_call != "auto":
        tokens += (
            1 if function_call == "none" else _string_tokens(function_call["name"]) + 4
        )

    return tokens

Hey guys,
I created a Python package that counts tokens accurately with function support, based on hmarr’s solution. You can see it here:

Thanks for this package. I was able to confirm that it matches what OpenAI gave me as prompt_tokens.
For a quick test, visit the RunKit link on the package’s npm page and run the code below. You’ll get an output of 89, which will match the prompt_tokens from OpenAI if you use the same values in your chat completion input.

var {promptTokensEstimate} = require("openai-chat-tokens")


const estimate = promptTokensEstimate({
  messages: [
    {"role": "user", "content": "What is the weather like in Boston?"}
  ],
  function_call: { name: "get_current_weather" },
  functions: [
    {
      "name": "get_current_weather",
      "description": "Get the current weather in a given location",
      "parameters": {
        "type": "object",
        "properties": {
          "location": {
            "type": "string",
            "description": "The city and state, e.g. San Francisco, CA"
          },
          "unit": {
            "type": "string",
            "enum": ["celsius", "fahrenheit"]
          }
        },
        "required": ["location"]
      }
    }
  ]
});

console.log(estimate)

Has anyone figured out how to count the prompt tokens using the new tools / tool_choice fields?

I’ve tried it and found that it is the same as function_call.
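
If the counts really are the same, an existing estimator written for functions / function_call can be reused by unwrapping the newer fields first. A minimal sketch under that assumption (I have not verified the counts myself); tools_to_functions and tool_choice_to_function_call are hypothetical helper names.

```python
from typing import Optional, Union


def tools_to_functions(tools: list[dict]) -> list[dict]:
    """Unwrap [{"type": "function", "function": {...}}, ...] into legacy function dicts."""
    return [tool["function"] for tool in tools if tool.get("type") == "function"]


def tool_choice_to_function_call(
    tool_choice: Optional[Union[str, dict]],
) -> Optional[Union[str, dict]]:
    """Map tool_choice onto the legacy function_call value.

    Strings such as "auto" and "none" pass through unchanged; a forced
    choice {"type": "function", "function": {"name": ...}} becomes {"name": ...}.
    """
    if tool_choice is None or isinstance(tool_choice, str):
        return tool_choice
    return {"name": tool_choice["function"]["name"]}
```

The results can then be passed to an estimator that takes functions and function_call arguments.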

I also changed the token calculation method because the new model gpt-4-1106-vision-preview supports images as a parameter. You can find the calculation method in the Java file on GitHub.

Let me know if there is any issue in my code.

I’ve updated the calculation method after the 1106 update (function → tool).

You can get the method below.

FunctionFormat.java

I’m also happy that they’ve made a lot of improvements to the parser since the last update. It now supports descriptions for nested properties and uses an up-front “Array<Array<” type definition for arrays instead of the trailing “[][]”, which makes it much easier for the model to understand the structure it’s dealing with, especially with nested arrays or objects in arrays.

Thankfully, I’ve been able to retire my own parser for every purpose except counting tokens; I no longer need it to work around these issues by injecting my own generated descriptions.
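
For anyone updating their own token counter, the difference between the two array styles can be sketched like this. It only illustrates the change described above, based on that observation; it is not the actual rendering code, and format_array_type is a hypothetical name.

```python
def format_array_type(schema: dict, generic: bool = True) -> str:
    """Render a (possibly nested) JSON Schema array as a TypeScript-like type.

    generic=True gives the up-front style (Array<Array<string>>),
    generic=False gives the older trailing style (string[][]).
    Non-array schemas are rendered as their bare type name.
    """
    if schema.get("type") != "array":
        return schema.get("type", "any")
    inner = format_array_type(schema.get("items", {}), generic)
    return f"Array<{inner}>" if generic else f"{inner}[]"


nested = {"type": "array", "items": {"type": "array", "items": {"type": "string"}}}
print(format_array_type(nested))                 # Array<Array<string>>
print(format_array_type(nested, generic=False))  # string[][]
```

The two strings tokenize differently, so a counter that still emits the trailing style will likely drift on schemas with arrays.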

I don’t know if I’m wrong.

I just tested tool calls through my SDK, and it seems that the prompt token usage reported by the OpenAI server doesn’t include the tools in the request.

Does anyone else see the same thing?

Thanks to abhatia-07 for pointing out the issue in my tools token calculation. See Token Count Difference #4.

I’ve already fixed it, and the Java calculation now matches the OpenAI API more accurately.
