Fixed: prompt token limit exceeded error for long conversations with gpt-3 and gpt-4

:exclamation::exclamation::exclamation: Note: This is a solution, not only an issue. :exclamation::exclamation::exclamation:
Hi everyone,
I ran into a critical issue: exceeding the token limit in long conversations in different languages (Chinese, English, Russian, Korean, etc.). I searched and found many suggested solutions, but none of them gave an appropriate answer. As you know, we need the previous messages to provide the context of the chat, as in the example below.

        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello"}

The OpenAI API generates a new response, and we can append it to the list like this:

        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello"},
        {"role": "assistant", "content": "Hello, how can I help you?"},

and so on. The problem is that once the messages list grows large, you get a token limit error because the list contains too many tokens. Are you thinking about switching to a model with a larger context (like gpt-4-32k-0613 or gpt-3.5-turbo-16k-0613)? That does not solve it, because these models are also limited, so at some point they will also return an error saying you have exceeded the token limit.
So what can we do now?
We can calculate the number of tokens in the messages list before passing it to the OpenAI API. If the count is greater than the current model's token limit, we can simply remove the oldest messages from the messages list.

This code returns the number of tokens in the messages list, so you can easily check it against the token limit and trim your messages list accordingly.

pip install tiktoken

import tiktoken

def num_tokens_from_messages(messages, model="gpt-3.5-turbo-0613"):
    """Return the number of tokens used by a list of messages."""
    try:
        encoding = tiktoken.encoding_for_model(model)
    except KeyError:
        print("Warning: model not found. Using cl100k_base encoding.")
        encoding = tiktoken.get_encoding("cl100k_base")
    # Pinned model snapshots that share the same per-message overhead
    # (list as given in the OpenAI cookbook version of this function).
    if model in {
        "gpt-3.5-turbo-0613",
        "gpt-3.5-turbo-16k-0613",
        "gpt-4-0314",
        "gpt-4-32k-0314",
        "gpt-4-0613",
        "gpt-4-32k-0613",
    }:
        tokens_per_message = 3
        tokens_per_name = 1
    elif model == "gpt-3.5-turbo-0301":
        tokens_per_message = 4  # every message follows <|start|>{role/name}\n{content}<|end|>\n
        tokens_per_name = -1  # if there's a name, the role is omitted
    elif "gpt-3.5-turbo" in model:
        print("Warning: gpt-3.5-turbo may update over time. Returning num tokens assuming gpt-3.5-turbo-0613.")
        return num_tokens_from_messages(messages, model="gpt-3.5-turbo-0613")
    elif "gpt-4" in model:
        print("Warning: gpt-4 may update over time. Returning num tokens assuming gpt-4-0613.")
        return num_tokens_from_messages(messages, model="gpt-4-0613")
    else:
        raise NotImplementedError(
            f"""num_tokens_from_messages() is not implemented for model {model}. See the OpenAI cookbook for information on how messages are converted to tokens."""
        )
    num_tokens = 0
    for message in messages:
        num_tokens += tokens_per_message
        for key, value in message.items():
            num_tokens += len(encoding.encode(value))
            if key == "name":
                num_tokens += tokens_per_name
    num_tokens += 3  # every reply is primed with <|start|>assistant<|message|>
    return num_tokens
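
The trimming step described above (drop the oldest messages until the list fits) could be sketched roughly as follows. This is only an illustration, not the original poster's code: `trim_to_limit` and `rough_count` are hypothetical names, and `rough_count` is a crude whitespace-based stand-in so the example runs without `tiktoken`. In practice you would pass `num_tokens_from_messages` as the counter. Keeping the system message while trimming is also an assumption.

```python
def rough_count(messages):
    """Crude stand-in counter (assumption): one token per whitespace-separated word.

    In real use, pass num_tokens_from_messages instead.
    """
    return sum(len(m["content"].split()) for m in messages)


def trim_to_limit(messages, token_limit, count_tokens=rough_count):
    """Return a copy of messages with the oldest non-system messages removed
    until count_tokens(result) <= token_limit, or nothing removable is left."""
    trimmed = list(messages)
    while count_tokens(trimmed) > token_limit:
        # Find the oldest message that is not a system message.
        index = next(
            (i for i, m in enumerate(trimmed) if m["role"] != "system"), None
        )
        if index is None:
            break  # only system messages remain; nothing more to drop
        del trimmed[index]
    return trimmed


history = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello"},
    {"role": "assistant", "content": "Hello, how can I help you?"},
    {"role": "user", "content": "Tell me about tokens"},
]
short_history = trim_to_limit(history, token_limit=12)
```

The original `history` list is left untouched, so the full conversation can still be stored elsewhere while only `short_history` is sent to the API.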

If you still have an issue, feel free to ask me.
My LinkedIn user name: aliahmadjakhar

Note, this post just shows us boilerplate code from the OpenAI cookbook:

It totals all tokens in a list.

It does not provide the backend solution you’d need, such as recording the number of tokens for each question and reply, determining the context length of the present model, considering the amount already used by max_tokens for the response, the user input, overhead of role message formatting, and the elusive amount consumed by functions.

One then actually needs the function that assembles chat history that fits in that remaining space.
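
One possible sketch of such an assembly function: reserve `max_tokens` for the reply, subtract the tokens for the system message and the new user input, then walk the history from newest to oldest and keep whatever still fits. All names here (`assemble_history`, `rough_count`, and the example `context_length` and `max_tokens` values) are hypothetical. The whitespace-based counter with a fixed per-message overhead of 3 is a stand-in for a real tokenizer, and the tokens consumed by functions are not modeled.

```python
def rough_count(message):
    """Stand-in per-message counter (assumption): words plus a fixed overhead of 3."""
    return len(message["content"].split()) + 3


def assemble_history(system, history, user_input, context_length, max_tokens,
                     count_tokens=rough_count):
    """Build a message list that leaves max_tokens free for the reply."""
    # Space left for past turns after the fixed parts are accounted for.
    budget = context_length - max_tokens
    budget -= count_tokens(system) + count_tokens(user_input)

    kept = []
    # Walk from newest to oldest so the most recent turns survive.
    for message in reversed(history):
        cost = count_tokens(message)
        if cost > budget:
            break
        kept.append(message)
        budget -= cost
    kept.reverse()  # restore chronological order
    return [system] + kept + [user_input]


system = {"role": "system", "content": "You are a helpful assistant."}
history = [
    {"role": "user", "content": "Hello"},
    {"role": "assistant", "content": "Hello, how can I help you?"},
    {"role": "user", "content": "Count my tokens please"},
    {"role": "assistant", "content": "Sure, send the text"},
]
user_input = {"role": "user", "content": "What is a token?"}
prompt = assemble_history(system, history, user_input,
                          context_length=50, max_tokens=20)
```

With these toy numbers only the two most recent turns fit between the system message and the new question.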

Yes, the user needs to modify the code a bit, and then they can fix the token limit issue for the messages list.

Here is the modification the user needs to make:

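# message_list holds (token_count, message) pairs, oldest first.
MODEL_TOKEN_LIMIT = 4096  # placeholder: replace with your model's token limit

while num_tokens > MODEL_TOKEN_LIMIT and len(message_list) > 0:
    removed_message_token, _ = message_list.pop(0)  # remove the oldest message
    num_tokens -= removed_message_token
remaining_messages = [message for _, message in message_list]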
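
The loop above assumes that the list stores `(token_count, message)` pairs, so each message is counted once when it is added. A minimal sketch of that bookkeeping might look like this. `ChatHistory` and the whitespace-based `rough_count` are hypothetical and not part of the original post, and a tokenizer-based counter would replace `rough_count` in real use.

```python
from collections import deque


def rough_count(message):
    """Stand-in counter (assumption): one token per whitespace-separated word."""
    return len(message["content"].split())


class ChatHistory:
    """Keeps (token_count, message) pairs and a running token total."""

    def __init__(self, token_limit, count_tokens=rough_count):
        self.token_limit = token_limit
        self.count_tokens = count_tokens
        self.entries = deque()  # (token_count, message) pairs, oldest first
        self.num_tokens = 0

    def add(self, message):
        tokens = self.count_tokens(message)
        self.entries.append((tokens, message))
        self.num_tokens += tokens
        # Drop the oldest messages until the running total fits again.
        while self.num_tokens > self.token_limit and self.entries:
            removed_tokens, _ = self.entries.popleft()
            self.num_tokens -= removed_tokens

    def messages(self):
        return [message for _, message in self.entries]


chat = ChatHistory(token_limit=8)
chat.add({"role": "user", "content": "Hello"})
chat.add({"role": "assistant", "content": "Hello, how can I help you?"})
chat.add({"role": "user", "content": "Tell me about tokens"})
remaining_messages = chat.messages()
```

A `deque` is used here because removing from the front is cheap, whereas `list.pop(0)` shifts every remaining element. Note that, like the loop above, this drops a message that is larger than the limit on its own.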