FWIW, I’ve validated that hmarr’s method is the most accurate, and it validates itself against OpenAI’s API, which is “the right way” to do all this.
Big kudos, since none of this is really documented, even within OpenAI’s cookbook code.
I have played with function calling for a few months now, and I can say for sure that the output passed to arguments is not validated against the definition.
I have seen it many times: the arguments are just plain text that was supposed to go in content, or a simple sentence like “The color is #ff0000, width is 2px” where the arguments should be, instead of actual JSON.
P.S. Function names are not protected either; many times I get back random function names the model decided exist.
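Since neither the function name nor the arguments are validated on OpenAI’s side, it is worth checking them yourself before dispatching the call. A minimal sketch of such a guard, assuming the third-party jsonschema package and a hypothetical set_border function (neither is part of the posts above):
import json
import jsonschema

# Hypothetical registry of the functions you actually exposed to the model,
# keyed by name, with each value being the JSON Schema of its parameters.
known_functions = {
    "set_border": {
        "type": "object",
        "properties": {
            "color": {"type": "string"},
            "width": {"type": "string"},
        },
        "required": ["color", "width"],
    },
}


def validate_function_call(name: str, arguments: str) -> dict:
    # Reject names the model invented out of thin air
    if name not in known_functions:
        raise ValueError(f"Unknown function requested by the model: {name}")
    # Reject arguments that are prose rather than JSON
    try:
        args = json.loads(arguments)
    except json.JSONDecodeError as err:
        raise ValueError(f"Arguments were not valid JSON: {arguments!r}") from err
    # Reject arguments that do not match the declared schema
    jsonschema.validate(instance=args, schema=known_functions[name])
    return args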
OK, so this means we can provide descriptions as long as we want in the function JSON, and they will not count against the API’s token limit? That’s really great news.
The descriptions are text inserted into the context the AI receives, so that it can understand what the function is for. Likewise, the AI also receives the text of the function names and property names.
The actual text received by the AI is what costs you tokens. So no, you are still charged for descriptions, and they consume context length.
(You can also misplace descriptions at deeper nesting levels, in which case the AI never receives them at all.)
This thread is about the exact and accurate method of calculating not just the text that is sent, but the overhead of using functions.
The description for a function parameter is charged by OpenAI. It needs to be clear, but keep it as short as you can.
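To see what a given description costs, you can run it through tiktoken yourself, using the same cl100k_base encoding as the code later in this thread; every token of the rendered definition is billed as part of the prompt. A small sketch (the description strings are only illustrations):
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

long_desc = ("Get the current weather in a given location, returning temperature, "
             "humidity, wind speed and a short human-readable summary.")
short_desc = "Get the current weather in a given location."

# Each of these tokens ends up in the rendered function definition and is billed
print(len(enc.encode(long_desc)))
print(len(enc.encode(short_desc)))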
Made a Python version of hmarr’s TypeScript program. Was looking for one myself, so I thought it might come in handy for someone!
Ultimately, the only way to correctly count tokens is to render the full system message exactly as the AI receives it. Only then do you have a self-contained unit that is not affected by what surrounds it (because there is no joining with the special non-dictionary tokens that enclose a message).
あなたは数学が好きな役立つアシスタントです
# Tools
## functions
namespace functions {
// Description of example function
type function_name = (_: {
// description of function property 1: string (required)
property1: string,
// description of function property 2: string w enum
property2?: "enum_yes" | "enum_no",
}) => any;
} // namespace functions
Take that Japanese system prompt (it says “You are a helpful assistant who likes math”). The last token of the system prompt is です. The next token that must be produced is the two line feeds (\n\n).
If we instead put a Japanese period (。) at the end of the line, the tokenizer will use the token for (。\n\n), so such a system prompt doesn’t need a separate (\n\n) token to advance down to the next line, where the tools description begins.
In other words, we can get a different token at the end of the system prompt than we would have gotten without a function. When using functions, most system prompts get the two line feeds “for free”, but others won’t.
Also note that the encoder must honor the “required” property, which removes the question mark from the affected properties.
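You can inspect this boundary behaviour directly with tiktoken: encode the tail of the system prompt together with the separator that precedes the tools section and look at which tokens the encoder actually picks. A small sketch (the “# Tools” continuation is only there to mimic the rendered layout shown above):
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

for tail in ["です\n\n# Tools", "です。\n\n# Tools"]:
    tokens = enc.encode(tail)
    # Decoding each token separately shows whether \n\n merged with the period
    print([enc.decode([t]) for t in tokens])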
The best and most fail-proof way we have implemented so far is to make a fake request to the model with “empty” messages but including the functions and the other parameters that still count toward tokens. The returned value response['usage']['prompt_tokens'] is the number you’re looking for. It’s probably (it definitely is) not best practice, but it gets the best results.
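For reference, a minimal sketch of that probing request with the openai Python SDK (v1-style client; the model name and the placeholder message are assumptions, and my_functions stands in for your own function definitions):
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": ""}],  # the "empty" message
    functions=my_functions,                      # your real function definitions
    max_tokens=1,                                # keep the throwaway completion tiny
)

# The exact prompt size as billed, including the function definition overhead
print(response.usage.prompt_tokens)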
Yeah, it’s the most accurate, but a complete waste of tokens. Not scalable at all.
Yes, definitely. It only works when it’s not you who’s paying the bill. But by now our local calculation is accurate to within about ±5 tokens of the total anyway.
Dude, if you’re using Python, try this out; it has been 100% accurate in all my testing. Use the estimate_tokens() function (a Python version of hmarr’s TS code).
import tiktoken

ENCODING = tiktoken.get_encoding("cl100k_base")


def _format_function_definitions(functions: list[dict]) -> str:
    """
    Generates TypeScript function type definitions.
    Args:
    - functions (list[dict]): List of dictionaries representing function definitions.
    Returns:
    - str: TypeScript function type definitions.
    """
    lines = ["namespace functions {"]
    for func in functions:
        if func.get("description"):
            lines.append(f"// {func['description']}")
        if func["parameters"].get("properties"):
            lines.append(f"type {func['name']} = (_: {{")
            lines.append(_format_object_properties(func["parameters"], 0))
            lines.append("}) => any;")
        else:
            lines.append(f"type {func['name']} = () => any;")
        lines.append("")
    lines.append("} // namespace functions")
    return "\n".join(lines)


def _format_object_properties(parameters: dict, indent: int) -> str:
    """
    Formats object properties for TypeScript type definitions.
    Args:
    - parameters (dict): Dictionary representing object parameters.
    - indent (int): Number of spaces for indentation.
    Returns:
    - str: Formatted object properties.
    """
    lines = []
    for name, param in parameters["properties"].items():
        if param.get("description") and indent < 2:
            lines.append(f"// {param['description']}")
        is_required = parameters.get("required") and name in parameters["required"]
        lines.append(
            f"{name}{'?:' if not is_required else ':'} {_format_type(param, indent)},"
        )
    return "\n".join([" " * indent + line for line in lines])


def _format_type(param: dict, indent: int) -> str:
    """
    Formats a single property type for TypeScript type definitions.
    Args:
    - param (dict): Dictionary representing a parameter.
    - indent (int): Number of spaces for indentation.
    Returns:
    - str: Formatted type for the given parameter.
    """
    type_ = param["type"]
    if type_ == "string":
        return (
            " | ".join([f'"{v}"' for v in param["enum"]])
            if param.get("enum")
            else "string"
        )
    elif type_ == "number":
        return (
            " | ".join([str(v) for v in param["enum"]])
            if param.get("enum")
            else "number"
        )
    elif type_ == "integer":
        return (
            " | ".join([str(v) for v in param["enum"]])
            if param.get("enum")
            else "integer"
        )
    elif type_ == "array":
        return (
            f"{_format_type(param['items'], indent)}[]"
            if param.get("items")
            else "any[]"
        )
    elif type_ == "boolean":
        return "boolean"
    elif type_ == "null":
        return "null"
    elif type_ == "object":
        return "{\n" + _format_object_properties(param, indent + 2) + "\n}"
    else:
        raise ValueError(f"Unsupported type: {type_}")


def _estimate_function_tokens(functions: list[dict]) -> int:
    """
    Estimates token count for a given list of functions.
    Args:
    - functions (list[dict]): List of dictionaries representing function definitions.
    Returns:
    - int: Estimated token count.
    """
    prompt_definitions = _format_function_definitions(functions)
    tokens = _string_tokens(prompt_definitions)
    tokens += 9  # Add nine per completion
    return tokens


def _string_tokens(string: str) -> int:
    """
    Estimates token count for a given string using 'cl100k_base' encoding.
    Args:
    - string (str): Input string.
    Returns:
    - int: Estimated token count.
    """
    global ENCODING
    return len(ENCODING.encode(string))


def _estimate_message_tokens(message: dict) -> int:
    """
    Estimates token count for a given message.
    Args:
    - message (dict): Dictionary representing a message.
    Returns:
    - int: Estimated token count.
    """
    components = [
        message.get("role"),
        message.get("content"),
        message.get("name"),
        message.get("function_call", {}).get("name"),
        message.get("function_call", {}).get("arguments"),
    ]
    components = [
        component for component in components if component
    ]  # Filter out None values
    tokens = sum([_string_tokens(component) for component in components])
    tokens += 3  # Add three per message
    if message.get("name"):
        tokens += 1
    if message.get("role") == "function":
        tokens -= 2
    if message.get("function_call"):
        tokens += 3
    return tokens


def estimate_tokens(
    messages: list[dict], functions: list[dict] = None, function_call=None
) -> int:
    """
    Estimates token count for a given prompt with messages and functions.
    Args:
    - messages (list[dict]): List of dictionaries representing messages.
    - functions (list[dict], optional): List of dictionaries representing function definitions. Default is None.
    - function_call (str or dict, optional): Function call specification. Default is None.
    Returns:
    - int: Estimated token count.
    """
    padded_system = False
    tokens = 0
    for msg in messages:
        if msg["role"] == "system" and functions and not padded_system:
            modified_message = {"role": msg["role"], "content": msg["content"] + "\n"}
            tokens += _estimate_message_tokens(modified_message)
            padded_system = True  # Mark system as padded
        else:
            tokens += _estimate_message_tokens(msg)
    tokens += 3  # Each completion has a 3-token overhead
    if functions:
        tokens += _estimate_function_tokens(functions)
    if functions and any(m["role"] == "system" for m in messages):
        tokens -= 4  # Adjust for function definitions
    if function_call and function_call != "auto":
        tokens += (
            1 if function_call == "none" else _string_tokens(function_call["name"]) + 4
        )
    return tokens
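A quick usage sketch: the message and function definition below mirror the openai-chat-tokens example further down this thread, so the printed number should line up with the prompt_tokens the API reports for the same request.
weather_function = {
    "name": "get_current_weather",
    "description": "Get the current weather in a given location",
    "parameters": {
        "type": "object",
        "properties": {
            "location": {
                "type": "string",
                "description": "The city and state, e.g. San Francisco, CA",
            },
            "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
        },
        "required": ["location"],
    },
}

print(estimate_tokens(
    messages=[{"role": "user", "content": "What is the weather like in Boston?"}],
    functions=[weather_function],
    function_call={"name": "get_current_weather"},
))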
Hey guys
I created a Python package that counts tokens accurately, with function support, based on hmarr’s solution. You can see it here:
Thanks for this package. I was able to confirm that it matches what OpenAI gave me as prompt_tokens.
A quick test can be made by visiting the RunKit link on the package’s npm page and running the code below; you’ll get an output of 89, which will match the prompt_tokens from OpenAI if you use the same values in your chat completion request.
var {promptTokensEstimate} = require("openai-chat-tokens")

const estimate = promptTokensEstimate({
  messages: [
    {"role": "user", "content": "What is the weather like in Boston?"}
  ],
  function_call: { name: "get_current_weather" },
  functions: [
    {
      "name": "get_current_weather",
      "description": "Get the current weather in a given location",
      "parameters": {
        "type": "object",
        "properties": {
          "location": {
            "type": "string",
            "description": "The city and state, e.g. San Francisco, CA"
          },
          "unit": {
            "type": "string",
            "enum": ["celsius", "fahrenheit"]
          }
        },
        "required": ["location"]
      }
    }
  ]
});

console.log(estimate)