I’d like to accurately tokenize requests with function calls. I’m running into errors about context limits, and I’d like to automatically switch between models based on context limits. How can I do this in Python?
To accurately tokenize, count the tokens of every component of the request:
- each conversation turn
- any data augmentation
- the system prompt
- the user input
- the API function definitions

Also consider the tokens you need to reserve for the response.
With that knowledge, and by managing what you send, you can probably avoid switching to a model that costs twice as much.
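If you do still want the automatic model switch from the original question, it reduces to comparing the counted tokens against each model's context window. A sketch, where the model names and window sizes are illustrative assumptions (check the current model documentation for real limits):

```python
# Cheapest-first list of (model, context window) pairs — example values only.
MODELS = [
    ("gpt-3.5-turbo", 4096),
    ("gpt-3.5-turbo-16k", 16384),
]

def choose_model(prompt_tokens: int, response_budget: int = 500) -> str:
    """Pick the cheapest model whose context window fits prompt + response."""
    needed = prompt_tokens + response_budget
    for name, window in MODELS:
        if needed <= window:
            return name
    raise ValueError(f"Request of {needed} tokens exceeds all context windows")

print(choose_model(3000))  # fits the 4k model
print(choose_model(6000))  # falls through to the 16k model
```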
tiktoken is a fast open-source tokenizer by OpenAI.
Given a text string (e.g., "tiktoken is great!") and an encoding (e.g., "cl100k_base"), a tokenizer can split the text string into a list of tokens (e.g., ["t", "ik", "token", " is", " great", "!"]).
Splitting text strings into tokens is useful because GPT models see text in the form of tokens. Knowing how many tokens are in a text string can tell you (a) whether the string is too long for a text model to process and (b) how much an OpenAI API call costs (as usage is priced by token).
I have already been using tiktoken to count tokens, but it hasn’t been working for me. How can I do this with API functions?
Are you using it’s ability to count tokens, seen in OpenAI cookbook examples and on github documentation. Does it count tokens of text correctly in your implementation?
If so, you just need to manage what you send: keep metadata (such as token counts) for each type of information you send, and base your sending decisions on those counts.
For example, if you store the token count of every user prompt and every AI response alongside each entry in the conversation history, you can use that information to decide how many past turns will fit into the space you've allocated for chat history.
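A sketch of that trimming step, assuming a hypothetical history structure where each turn carries its stored token count:

```python
# Example history — the "tokens" field is the count stored when the
# turn was first added, so no re-tokenization is needed here.
history = [
    {"role": "user", "content": "...", "tokens": 120},
    {"role": "assistant", "content": "...", "tokens": 340},
    {"role": "user", "content": "...", "tokens": 80},
    {"role": "assistant", "content": "...", "tokens": 210},
]

def trim_history(history, budget):
    """Keep the newest turns whose stored token counts fit within budget."""
    kept, used = [], 0
    for turn in reversed(history):          # walk from newest to oldest
        if used + turn["tokens"] > budget:
            break
        kept.append(turn)
        used += turn["tokens"]
    kept.reverse()                          # restore chronological order
    return kept, used

kept, used = trim_history(history, budget=400)
# keeps the last two turns (80 + 210 = 290 tokens)
```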
I already do this, and it worked perfectly without functions.
Then you can do two things:
- Measure the size of the functions: call the API twice with the same input, once with and once without the function definitions included, and compare the prompt token counts reported in the API responses.
- Count the tokens that go into the "function"-role message, where the function's return value is inserted into the subsequent conversation input.
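The first bullet can be sketched like this. The numbers below are hypothetical stand-ins for `response.usage.prompt_tokens` from two real chat-completion calls (the calls themselves are shown only in comments, since they need an API key):

```python
def function_overhead(prompt_tokens_with: int, prompt_tokens_without: int) -> int:
    """Prompt tokens attributable to the function definitions alone."""
    return prompt_tokens_with - prompt_tokens_without

# In a real run you would take the two counts from the API responses, e.g.:
#   with_fn = client.chat.completions.create(model=..., messages=msgs, functions=fns)
#   plain   = client.chat.completions.create(model=..., messages=msgs)
# and pass with_fn.usage.prompt_tokens and plain.usage.prompt_tokens here.
overhead = function_overhead(prompt_tokens_with=157, prompt_tokens_without=57)
print(overhead)  # tokens consumed by the function definitions
```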
The actual solution is quite a bit more complex. It turns out OpenAI converts the function definitions to TypeScript-style declarations on the backend, and those are what make up the token usage. This was figured out by a user named hmarr.
Anyway, if you are using the Python API, I have made a Python version of hmarr's original TypeScript code: GitHub - Reversehobo/openai-function-tokens: Predict the exact OpenAI token usage of functions
It has returned the exact correct token count for messages and functions every single time during my testing.