The question is whether any solution will count the tokens consumed by the JSON of a functions list: a list of functions that is sent to the API, reinterpreted, and then passed to the AI in an unseen form documented only by the experimentation and revelation seen earlier in this thread.

There are links to some prior code approximation attempts earlier in the thread, but tiktoken alone does not capture that.

Looking at the API code, the JSON text in the function header is included as a string to the model, just as any normal bit of text would be. So I think if you were to run tiktoken on that string and add a few tokens for possible "start of function" and "end of function" boundary markers, it should be possible to get an accurate count of the tokens sent. I've not gotten round to it, but I did something similar some time ago for standard API calls, to calculate the additional tokens used by markers and such.
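For what it's worth, here is a minimal sketch of that naive approach, assuming the function JSON really were passed verbatim (the next reply explains that it is not); boundary_overhead is a made-up placeholder, not a measured constant:

import json
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

def naive_function_tokens(functions: list[dict], boundary_overhead: int = 4) -> int:
    # Tokenize the raw functions JSON and add a guessed allowance for
    # hypothetical "start of function" / "end of function" markers.
    return len(enc.encode(json.dumps(functions))) + boundary_overhead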

This is an incorrect assumption.

I'll scroll back for you to the last code block in my post. The language the AI receives is not the input JSON, and not what one might predict: How to calculate the tokens when using function call - #24 by _j

With the secret sauce (jailbreaks I don't want patched, and an AI crippled by disclosure into necessarily never following a single instruction) one can replay arbitrary example functions again and again and work out the pattern. I just don't have any reason to sit down and code this, as I'm not trying to eke out the last of the input context length with dynamic functions.

Omit “max_tokens” and all context is yours.

1 Like

Ok, but the API right now is passing "something" to the model via a connection, and it's that "something" we are interested in, correct? It should be trivial to look at the open-source API library to see what is being done with a prompt that contains a function element; it should be deterministic in nature and reproducible with tiktoken.

The text is transformed by OpenAI's internal "function" endpoint, after the model selector, load distributor, accounting server paths, and other undocumented routing internals. It is not done by any public code.

Here I extract and show the byte string transmitted by the python API module, including a function: Typo in OpenAI API Documentation - #2 by _j

The input is validated against a standard JSON schema, and rejected for malformed requests ("bool" or "float" instead of "boolean" or "number"), but no warning is given when the programmer's intentions are damaged by the rewriting (omitting keywords like "example" and "maximum", or discarding any nested descriptions).
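To illustrate, here is a hypothetical function spec annotated with what, per the observations above, would survive that rewriting:

functions = [
    {
        "name": "set_line",
        "description": "Set the line style",
        "parameters": {
            "type": "object",
            "properties": {
                "width": {
                    "type": "number",   # "float" here would be rejected outright
                    "maximum": 10,      # silently dropped before the model sees it
                    "description": "Line width in px",
                },
                "style": {
                    "type": "object",
                    "properties": {
                        "color": {
                            "type": "string",
                            # a nested description: discarded, never reaches the model
                            "description": "Hex color, e.g. #ff0000",
                        },
                    },
                },
            },
            "required": ["width"],
        },
    },
]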

Passing a function even selects a differently-trained model than without.

Interesting, I’ll have a wander through the repo when I get time.

Cheers.

FWIW, I've validated that hmarr's method is the most accurate, and it validates itself against OpenAI's API, which is "the right way" to do all this.

Big kudos, since none of this is really documented, even within OpenAI's cookbook code.

2 Likes

I have played with function calling for a few months now, and I can tell for sure that the output passed to arguments is not validated against the definition. I have seen, so many times, the arguments being just plain text that was supposed to be in content, or sometimes simple text like "The color is #ff0000, width is 2px" instead of actual JSON.

P.S. Function names are also not protected either, so many times I get random function names the model decided exist.
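A defensive sketch along those lines, assuming the legacy response shape where the assistant message carries a function_call with name and arguments; registry is a hypothetical dict mapping the function names you actually defined to Python callables:

import json

def safe_dispatch(message: dict, registry: dict):
    # Validate a model-produced function_call before executing anything.
    call = message.get("function_call")
    if call is None:
        return None
    name = call.get("name")
    if name not in registry:
        # The model invented a function name that was never defined.
        raise ValueError(f"Unknown function requested: {name}")
    try:
        arguments = json.loads(call.get("arguments") or "{}")
    except json.JSONDecodeError:
        # Arguments came back as plain prose instead of JSON.
        raise ValueError("Arguments were not valid JSON")
    return registry[name](**arguments)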

Ok, so this means we can provide descriptions as long as we want in the function JSON, and it will not waste the token limit of the API? That's really great news.

The descriptions are text that is inserted into the AI language, allowing it to understand what the function is for. Likewise, the AI will also receive the text of the function names and property names.

The actual text received by the AI is what costs you the tokens. So no, you are still charged for descriptions, which consume context length.

You can also waste descriptions at deeper nesting levels, where the AI never receives them.

This thread is about the exact and accurate method of calculating not just the sent language, but the overhead of using functions.

The description for the function parameter is charged by OpenAI. It needs to be clear, but as short as you can make it. :grinning:

Made a Python version of hmarr's TypeScript program. Was looking for one myself, so thought it might come in handy for some!

2 Likes

Ultimately, the only way to correctly count tokens is to render the full system message as the AI receives it. Only then will you have a container that is not affected by what surrounds it (because there is no joining with the special non-dictionary tokens enclosing a message).

あなたは数学が好きな役立つアシスタントです

# Tools

## functions

namespace functions {

// Description of example function
type function_name = (_: {
// description of function property 1: string (required)
property1: string,
// description of function property 2: string w enum
property2?: "enum_yes" | "enum_no",
}) => any;

} // namespace functions

Take that Japanese system prompt (it says "You are a helpful assistant who likes math"). The last token of the system prompt is です. Then the next token that must be produced is two line feeds (\n\n).

If instead we put a Japanese period (。) at the end of the line, then the tokenizer will use the token for (。\n\n), so that system prompt doesn't need a separate (\n\n) token to advance down to the next line of the tools description.

We can get a different token at the end of the system prompt than if we hadn’t included a function. When using functions, most system prompts will get the two line feeds “for free”, but others won’t.

Also note that the encoder must honor the "required" list, which removes the question mark from those properties.
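A quick way to poke at that boundary behavior with tiktoken; compare the two encodings, and per the description above, the 。\n\n ending should come out in fewer tokens than です followed by a separate \n\n:

import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

# ending without the Japanese period: the \n\n needs its own token
print(enc.encode("アシスタントです\n\n"))
# ending with the Japanese period: 。 and \n\n can merge into one token
print(enc.encode("アシスタントです。\n\n"))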

The best and most fail-proof way we've implemented so far is to make a dummy request to the model with "empty" messages but including the functions and other parameters (which still have to be counted as tokens). The returned response['usage']['prompt_tokens'] is the number you're looking for. It's probably (it definitely is) not best practice, but it gets the best results.
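A sketch of that dummy-request trick, assuming the legacy openai Python library (v0.x) interface from this era; the model name is just an example, and max_tokens=1 keeps the throwaway completion as cheap as possible:

import openai

def measured_prompt_tokens(messages: list[dict], functions: list[dict],
                           model: str = "gpt-3.5-turbo-0613") -> int:
    # Let the server do the counting: send the real messages and functions,
    # ask for (almost) nothing back, and read usage.prompt_tokens.
    response = openai.ChatCompletion.create(
        model=model,
        messages=messages,
        functions=functions,
        max_tokens=1,
    )
    return response["usage"]["prompt_tokens"]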

2 Likes

Yeah, it's the most accurate, but a complete waste of tokens. Not scalable at all.

Yes, definitely. It only works when it's not you who's paying the bill. But for now we're accurate to about ±5 tokens in the total token calculation.

Dude, if you're using Python, try this out; it has been 100% accurate during all my testing. Use the estimate_tokens() function (a Python version of hmarr's TypeScript code).

import tiktoken


ENCODING = tiktoken.get_encoding("cl100k_base")


def _format_function_definitions(functions: list[dict]) -> str:
    """
    Generates TypeScript function type definitions.

    Args:
    - functions (list[dict]): List of dictionaries representing function definitions.

    Returns:
    - str: TypeScript function type definitions.
    """
    lines = ["namespace functions {"]

    for func in functions:
        if func.get("description"):
            lines.append(f"// {func['description']}")

        if func["parameters"].get("properties"):
            lines.append(f"type {func['name']} = (_: {{")
            lines.append(_format_object_properties(func["parameters"], 0))
            lines.append("}) => any;")
        else:
            lines.append(f"type {func['name']} = () => any;")

        lines.append("")

    lines.append("} // namespace functions")
    return "\n".join(lines)


def _format_object_properties(parameters: dict, indent: int) -> str:
    """
    Formats object properties for TypeScript type definitions.

    Args:
    - parameters (dict): Dictionary representing object parameters.
    - indent (int): Number of spaces for indentation.

    Returns:
    - str: Formatted object properties.
    """
    lines = []
    for name, param in parameters["properties"].items():
        if param.get("description") and indent < 2:
            lines.append(f"// {param['description']}")

        is_required = parameters.get("required") and name in parameters["required"]
        lines.append(
            f"{name}{'?:' if not is_required else ':'} {_format_type(param, indent)},"
        )

    return "\n".join([" " * indent + line for line in lines])


def _format_type(param: dict, indent: int) -> str:
    """
    Formats a single property type for TypeScript type definitions.

    Args:
    - param (dict): Dictionary representing a parameter.
    - indent (int): Number of spaces for indentation.

    Returns:
    - str: Formatted type for the given parameter.
    """
    type_ = param["type"]
    if type_ == "string":
        return (
            " | ".join([f'"{v}"' for v in param["enum"]])
            if param.get("enum")
            else "string"
        )
    elif type_ == "number":
        return (
            " | ".join([str(v) for v in param["enum"]])
            if param.get("enum")
            else "number"
        )
    elif type_ == "integer":
        return (
            " | ".join([str(v) for v in param["enum"]])
            if param.get("enum")
            else "integer"
        )
    elif type_ == "array":
        return (
            f"{_format_type(param['items'], indent)}[]"
            if param.get("items")
            else "any[]"
        )
    elif type_ == "boolean":
        return "boolean"
    elif type_ == "null":
        return "null"
    elif type_ == "object":
        return "{\n" + _format_object_properties(param, indent + 2) + "\n}"
    else:
        raise ValueError(f"Unsupported type: {type_}")


def _estimate_function_tokens(functions: list[dict]) -> int:
    """
    Estimates token count for a given list of functions.

    Args:
    - functions (list[dict]): List of dictionaries representing function definitions.

    Returns:
    - int: Estimated token count.
    """
    prompt_definitions = _format_function_definitions(functions)
    tokens = _string_tokens(prompt_definitions)
    tokens += 9  # Add nine per completion
    return tokens


def _string_tokens(string: str) -> int:
    """
    Estimates token count for a given string using 'cl100k_base' encoding.

    Args:
    - string (str): Input string.

    Returns:
    - int: Estimated token count.
    """
    global ENCODING
    return len(ENCODING.encode(string))


def _estimate_message_tokens(message: dict) -> int:
    """
    Estimates token count for a given message.

    Args:
    - message (dict): Dictionary representing a message.

    Returns:
    - int: Estimated token count.
    """
    components = [
        message.get("role"),
        message.get("content"),
        message.get("name"),
        message.get("function_call", {}).get("name"),
        message.get("function_call", {}).get("arguments"),
    ]
    components = [
        component for component in components if component
    ]  # Filter out None values
    tokens = sum([_string_tokens(component) for component in components])

    tokens += 3  # Add three per message
    if message.get("name"):
        tokens += 1
    if message.get("role") == "function":
        tokens -= 2
    if message.get("function_call"):
        tokens += 3

    return tokens


def estimate_tokens(
    messages: list[dict], functions: list[dict] = None, function_call=None
) -> int:
    """
    Estimates token count for a given prompt with messages and functions.

    Args:
    - messages (list[dict]): List of dictionaries representing messages.
    - functions (list[dict], optional): List of dictionaries representing function definitions. Default is None.
    - function_call (str or dict, optional): Function call specification. Default is None.

    Returns:
    - int: Estimated token count.
    """
    padded_system = False
    tokens = 0

    for msg in messages:
        if msg["role"] == "system" and functions and not padded_system:
            modified_message = {"role": msg["role"], "content": msg["content"] + "\n"}
            tokens += _estimate_message_tokens(modified_message)
            padded_system = True  # Mark system as padded
        else:
            tokens += _estimate_message_tokens(msg)

    tokens += 3  # Each completion has a 3-token overhead
    if functions:
        tokens += _estimate_function_tokens(functions)

    if functions and any(m["role"] == "system" for m in messages):
        tokens -= 4  # Adjust for function definitions

    if function_call and function_call != "auto":
        tokens += (
            1 if function_call == "none" else _string_tokens(function_call["name"]) + 4
        )

    return tokens
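For example, a quick check with the same weather function used later in this thread; the printed estimate should line up with the prompt_tokens the API reports for an identical request:

messages = [{"role": "user", "content": "What is the weather like in Boston?"}]
functions = [
    {
        "name": "get_current_weather",
        "description": "Get the current weather in a given location",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "The city and state, e.g. San Francisco, CA",
                },
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
            },
            "required": ["location"],
        },
    },
]

print(estimate_tokens(messages, functions, function_call={"name": "get_current_weather"}))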

Hey guys,
I created a Python package that will count tokens accurately with function support, based on hmarr's solution. You can see it here:

3 Likes

Thanks for this package. I was able to confirm that it matches what OpenAI gave me as prompt_tokens.
A quick test can be made by visiting the RunKit link on the package's npm page and running the code below; you'll get an output of 89, which will match your prompt_tokens from OpenAI if you use the same values in your chat completion input.

var {promptTokensEstimate} = require("openai-chat-tokens")


const estimate = promptTokensEstimate({
  messages: [
    {"role": "user", "content": "What is the weather like in Boston?"}
  ],
  function_call: { name: "get_current_weather" },
  functions: [
    {
      "name": "get_current_weather",
      "description": "Get the current weather in a given location",
      "parameters": {
        "type": "object",
        "properties": {
          "location": {
            "type": "string",
            "description": "The city and state, e.g. San Francisco, CA"
          },
          "unit": {
            "type": "string",
            "enum": ["celsius", "fahrenheit"]
          }
        },
        "required": ["location"]
      }
    }
  ]
});

console.log(estimate)

Has anyone figured out how to count the prompt tokens using the new tools / tool_choice fields?

1 Like