The max_tokens parameter is a bit of a pain: to avoid requesting more than the engine's 2049-token limit, you need to know how many tokens are in your prompt.
Is there any solution that lets the API just stop when it reaches 2049 tokens, without specifying max_tokens? Loading the GPT-2 tokenizer just to count the tokens in the text seems like overkill for this. Since the response includes a 'stop reason', I'd expect there's some workaround.
Thank you for the answer and references!
I don't actually want to increase the number of tokens. I'm just asking whether there's an API-side way to gracefully return from a request that accidentally asks for more tokens than the engine can handle, instead of exiting with an exception.
The API currently "makes you" specify max_tokens, and since my prompt lengths vary, I'd rather not compute it on the fly every time.
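For context, what I'm computing on the fly is basically the arithmetic below (a sketch; the helper name is made up and the 2049 window is the engine limit mentioned above, not something the API exposes):

```python
def completion_budget(prompt_tokens: int, context_window: int = 2049) -> int:
    """Hypothetical helper: how many completion tokens still fit
    alongside a prompt of `prompt_tokens` tokens."""
    return max(context_window - prompt_tokens, 0)

# e.g. a 1800-token prompt leaves room for 249 completion tokens
print(completion_budget(1800))  # → 249
```

The subtraction is trivial; the annoying part is getting `prompt_tokens` in the first place, which is exactly what I'd like to avoid.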
Hello, Everyone
I am also facing the same problem as @alex_g.
My max_tokens is 2048, but the response text is only 240~300 tokens.
When I check the response, the finish reason is "stop".
Is there any solution to increase the tokens? I want to get the full max_tokens (2049) in one API request.
If you know any solution, please help me.
Thanks in advance.
If you want to use a bigger context window, one option is to split the context into chunks, make multiple API calls, and then combine all the answers into one. You can write a function to do this manually, or use a library like langchain to handle the process for you.
def chunker(seq, size):
    # yield successive size-length slices of seq
    return (seq[pos:pos + size] for pos in range(0, len(seq), size))
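Putting it together, the loop might look like this (a sketch that repeats the chunker above so it runs on its own; in practice each chunk would go into its own completion request, here the "answer" just echoes the chunk):

```python
def chunker(seq, size):
    # yield successive size-length slices of seq
    return (seq[pos:pos + size] for pos in range(0, len(seq), size))

document = "one two three four five six seven".split()
answers = []
for chunk in chunker(document, 3):
    # stand-in for an API call on this chunk
    answers.append(" ".join(chunk))

# combine the per-chunk answers into one result
combined = " | ".join(answers)
print(combined)  # one two three | four five six | seven
```

In a real pipeline you would chunk by token count rather than word count, since that is what the engine limit is measured in.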