How can I adjust the length of the prompt so that it does not exceed the max tokens?

When a prompt exceeds the maximum number of tokens, I receive an error. This happens with both the “chat” and “completion” endpoints. How can I make sure the prompt will not exceed the max tokens?


Typically you use the tiktoken tokenizer to count how many tokens your prompt string contains, then add a fixed amount for the internal workings, say 50 tokens. That gives you a reasonably accurate measure of your prompt size.

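from flask import Flask, jsonify, request
from tiktoken import get_encoding

app = Flask(__name__)
tokenizer = get_encoding("cl100k_base")

# The "/tokenize" route path is an assumption; the original snippet omitted the Flask setup.
@app.route("/tokenize", methods=["POST"])
def tokenize():
    # Expects a JSON body such as {"text": "..."}
    text = request.json["text"]
    token_ids = tokenizer.encode(text)
    # Pair each token ID with the text it decodes to
    tokenized_text = [{"token": t, "text": tokenizer.decode([t])} for t in token_ids]
    return jsonify(tokenized_text=tokenized_text)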
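
If you only need the count rather than the individual tokens, the check described above can be sketched roughly like this. The names `PROMPT_OVERHEAD`, `make_tiktoken_counter`, `prompt_size` and `fits` are made up for illustration, the 50-token margin is the figure mentioned above, and the sketch assumes a model whose context window is shared between the prompt and the completion.

```python
from typing import Callable

PROMPT_OVERHEAD = 50  # fixed safety margin for internal formatting (assumed value)


def make_tiktoken_counter(encoding_name: str = "cl100k_base") -> Callable[[str], int]:
    """Return a function that counts tokens with tiktoken (imported lazily)."""
    import tiktoken  # requires `pip install tiktoken`

    encoding = tiktoken.get_encoding(encoding_name)
    return lambda text: len(encoding.encode(text))


def prompt_size(text: str, count_tokens: Callable[[str], int]) -> int:
    """Estimated prompt size: counted tokens plus the fixed overhead."""
    return count_tokens(text) + PROMPT_OVERHEAD


def fits(text: str, count_tokens: Callable[[str], int],
         context_window: int, max_tokens: int) -> bool:
    """True if the prompt plus the reserved completion fits in the context window."""
    return prompt_size(text, count_tokens) + max_tokens <= context_window
```

In practice you would call `make_tiktoken_counter()` once and pass the result as `count_tokens`. Passing the counter in as a parameter also makes it easy to swap in a different encoding for a different model.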

Another “hack” is to embed the data using ada-002 (text-embedding-ada-002), which uses the cl100k_base tokenizer. The response reports the number of tokens used along with the embedding vector. Useful if you already plan on embedding.

If you are not using embeddings, it's still a cheap and lightweight way to go: it can be done with plain HTTP requests to the API endpoint, without any additional OpenAI libraries.
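
A minimal sketch of that approach, using only the Python standard library. The helper names (`build_request`, `prompt_tokens`, `count_via_embeddings`) are invented for this example, and you need to supply your own API key. The count is read from the `usage.prompt_tokens` field of the embeddings response, so it covers the raw text only, not any chat formatting overhead.

```python
import json
import urllib.request

EMBEDDINGS_URL = "https://api.openai.com/v1/embeddings"


def build_request(text: str, api_key: str) -> urllib.request.Request:
    """Build the POST request for the embeddings endpoint."""
    body = json.dumps({"model": "text-embedding-ada-002", "input": text}).encode("utf-8")
    return urllib.request.Request(
        EMBEDDINGS_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def prompt_tokens(response_json: dict) -> int:
    """Extract the token count reported in the response's usage block."""
    return response_json["usage"]["prompt_tokens"]


def count_via_embeddings(text: str, api_key: str) -> int:
    """Send the text to the API and return how many tokens it was billed as."""
    with urllib.request.urlopen(build_request(text, api_key)) as resp:
        return prompt_tokens(json.load(resp))
```

Keep in mind that each call costs a network round trip and a small embedding charge, so counting locally with the tokenizer is usually faster when the library is available.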


A typical scenario is that a long chat user input reduces the number of past conversation turns you can pass.

Track each turn’s token use; then you know when adding the new input plus the most recent turns, in reverse order, will exceed the input budget. At that point you can tell the user, for example:

“Your input exceeded the maximum of 6400 tokens”
“Your huge prompt made us discard all but the last two questions”
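
The turn tracking described above might look something like this. It is only a sketch: `select_turns` is a made-up name, and it assumes you have already stored a token count for each past turn, oldest first.

```python
from typing import List, Tuple


def select_turns(turn_tokens: List[int], input_tokens: int,
                 input_budget: int) -> Tuple[int, int]:
    """Return (index of the first kept turn, total tokens used).

    turn_tokens holds the token count of each past turn, oldest first.
    Raises ValueError if the new input alone exceeds the budget.
    """
    if input_tokens > input_budget:
        raise ValueError(f"Your input exceeded the maximum of {input_budget} tokens")

    used = input_tokens
    first_kept = len(turn_tokens)  # keep nothing until a turn is shown to fit
    # Walk backwards from the most recent turn, stopping at the first one that does not fit.
    for i in range(len(turn_tokens) - 1, -1, -1):
        if used + turn_tokens[i] > input_budget:
            break
        used += turn_tokens[i]
        first_kept = i
    return first_kept, used
```

You would then send `history[first_kept:]` followed by the new input. If `first_kept` is greater than zero, some older turns were dropped and you can warn the user about it.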

Auto-adapting max_tokens is riskier, because you don’t want the answer to question 20 to be less complete than the answer to question 1 by being accidentally cut off.
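
One way to limit that risk is to put a floor under the reply size and refuse to shrink below it. This is just a sketch: `reply_max_tokens` and its default values are assumptions for illustration, not recommended settings.

```python
from typing import Optional


def reply_max_tokens(context_window: int, prompt_tokens: int,
                     desired: int = 1000, floor: int = 500) -> Optional[int]:
    """Pick a max_tokens value, or None if the reply would be squeezed too far."""
    available = context_window - prompt_tokens
    if available < floor:
        # Not enough room for a useful answer: trim the history instead.
        return None
    return min(desired, available)
```

When this returns `None`, the caller should drop more past turns and try again rather than send a request that is likely to be truncated.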