Note: This post describes a solution, not just an issue.
Hi everyone,
I ran into a critical problem: exceeding the token limit in long conversations across different languages (Chinese, English, Russian, Korean, etc.). I searched for a fix and tried many approaches, but none of them was quite right. As you know, the previous messages are needed to give the chat its context, as in the example below.
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello"},
]
The OpenAI API generates a new response, which we can append to the list like this:
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello"},
    {"role": "assistant", "content": "Hello, how can I help you?"},
]
and so on. The problem is that once the conversation grows long enough, you get a token-limit error because the messages list contains too many messages. Are you thinking of switching to a larger model (like gpt-4-32k-0613 or gpt-3.5-turbo-16k-0613)? That only postpones the problem: those models also have fixed context limits, so at some point they too will return a token-limit error.
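For reference, the context windows (prompt plus completion, in tokens) of the June 2023 model snapshots mentioned above are roughly as follows. This is just an illustrative sketch; check the official model documentation for current values:

```python
# Approximate context-window sizes for the model snapshots discussed here.
CONTEXT_LIMITS = {
    "gpt-3.5-turbo-0613": 4096,
    "gpt-3.5-turbo-16k-0613": 16384,
    "gpt-4-0613": 8192,
    "gpt-4-32k-0613": 32768,
}
```

Even the largest of these runs out eventually in a long-lived chat.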
So what can we do?
We can count the tokens in the messages list before passing it to the OpenAI API; if the count is greater than the current model's token limit, we simply remove the earliest messages from the list.
The code below returns the token count of a messages list, so you can easily check it against the limit and trim the list accordingly. It requires the tiktoken package:

pip install tiktoken
import tiktoken


def num_tokens_from_messages(messages, model="gpt-3.5-turbo-0613"):
    """Return the number of tokens used by a list of messages."""
    try:
        encoding = tiktoken.encoding_for_model(model)
    except KeyError:
        print("Warning: model not found. Using cl100k_base encoding.")
        encoding = tiktoken.get_encoding("cl100k_base")
    if model in {
        "gpt-3.5-turbo-0613",
        "gpt-3.5-turbo-16k-0613",
        "gpt-4-0314",
        "gpt-4-32k-0314",
        "gpt-4-0613",
        "gpt-4-32k-0613",
    }:
        tokens_per_message = 3
        tokens_per_name = 1
    elif model == "gpt-3.5-turbo-0301":
        tokens_per_message = 4  # every message follows <|start|>{role/name}\n{content}<|end|>\n
        tokens_per_name = -1  # if there's a name, the role is omitted
    elif "gpt-3.5-turbo" in model:
        print("Warning: gpt-3.5-turbo may update over time. Returning num tokens assuming gpt-3.5-turbo-0613.")
        return num_tokens_from_messages(messages, model="gpt-3.5-turbo-0613")
    elif "gpt-4" in model:
        print("Warning: gpt-4 may update over time. Returning num tokens assuming gpt-4-0613.")
        return num_tokens_from_messages(messages, model="gpt-4-0613")
    else:
        raise NotImplementedError(
            f"""num_tokens_from_messages() is not implemented for model {model}. See https://github.com/openai/openai-python/blob/main/chatml.md for information on how messages are converted to tokens."""
        )
    num_tokens = 0
    for message in messages:
        num_tokens += tokens_per_message
        for key, value in message.items():
            num_tokens += len(encoding.encode(value))
            if key == "name":
                num_tokens += tokens_per_name
    num_tokens += 3  # every reply is primed with <|start|>assistant<|message|>
    return num_tokens
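Putting it together, the trimming step might look like the sketch below. `trim_messages` and `rough_count` are illustrative names I made up: `rough_count` is only a stand-in counter so the example runs on its own; in practice you would pass `num_tokens_from_messages` from above as the counter and your model's real context limit.

```python
def trim_messages(messages, max_tokens, count_tokens):
    """Drop the oldest non-system messages until the conversation fits."""
    trimmed = list(messages)
    # Keep the system prompt (index 0) and delete the oldest user/assistant turns.
    while count_tokens(trimmed) > max_tokens and len(trimmed) > 1:
        del trimmed[1]
    return trimmed


def rough_count(messages):
    # Stand-in counter for illustration: a few tokens of overhead per message
    # plus roughly one token per word. Replace with num_tokens_from_messages.
    return sum(4 + len(m["content"].split()) for m in messages)


history = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello"},
    {"role": "assistant", "content": "Hello, how can I help you?"},
    {"role": "user", "content": "Tell me a joke"},
]
short = trim_messages(history, max_tokens=20, count_tokens=rough_count)
# The system prompt and the most recent user message survive.
```

Always keeping the system message and cutting from the oldest turn first preserves the assistant's instructions while sacrificing only the stalest context.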
If you still have an issue, feel free to ask me.
My LinkedIn username: aliahmadjakhar