Getting around "max_tokens"

The max_tokens parameter is a bit of a pain, in the sense that you need to know the number of tokens in your prompt, so as not to ask for more than 2049 tokens.

Is there a way to have the API simply stop when it reaches 2049 tokens, without specifying max_tokens? Loading the GPT-2 tokenizer just to count the tokens in the text seems like overkill. Since the response has a ‘stop reason’, I’d expect there to be some workaround.

Thank you,


There is no way to increase max tokens, but here are some posts about creating longer completions.

Also, based on what you're saying, it seems like you don't need 2048 tokens, so maybe just lower max_tokens enough to leave room for your prompt?


Thank you for the answer and references!
I don’t actually want to increase the number of tokens. I’m just asking whether there’s an API solution that returns gracefully from a request where you accidentally ask for more tokens than the engine can handle, instead of exiting with an exception.
The API currently “makes you” select a maximum number of tokens, and since my prompt lengths vary, I’d rather not compute it on the fly every time.

Hi @alex_g

Programmatically counting the number of tokens and then setting max_tokens seems like the only way to go for now.
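For what it's worth, here is a minimal Python sketch of that approach. It assumes the 2049 figure mentioned in this thread covers prompt and completion together. The names `completion_budget` and `placeholder_count` are made up for illustration, and the whitespace counter is only a stand-in: a real tokenizer will give different (usually higher) counts, so swap one in before relying on the result.

```python
from typing import Callable, Optional

# Limit quoted in this thread; assumed to cover prompt + completion together.
CONTEXT_LIMIT = 2049


def placeholder_count(text: str) -> int:
    """Stand-in counter (whitespace split). NOT a real tokenizer.

    With the `transformers` package installed, a real count could come from
    len(GPT2TokenizerFast.from_pretrained("gpt2").encode(text)).
    """
    return len(text.split())


def completion_budget(
    prompt: str,
    count_tokens: Callable[[str], int],
    context_limit: int = CONTEXT_LIMIT,
    wanted: Optional[int] = None,
) -> int:
    """Return a max_tokens value that keeps prompt + completion within the limit."""
    remaining = context_limit - count_tokens(prompt)
    if remaining <= 0:
        raise ValueError("prompt already fills the context limit")
    # Cap the caller's wish at what is actually left over.
    return remaining if wanted is None else min(wanted, remaining)


if __name__ == "__main__":
    prompt = "Translate this sentence into French:"
    print(completion_budget(prompt, placeholder_count, wanted=256))
```

Passing the counter in as an argument keeps the arithmetic separate from whichever tokenizer you end up using.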

Also, when you say ‘gracefully’, it sounds like this is more of an error handling problem than an API one.
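If you do treat it as error handling, one possible shape is a wrapper that retries with a smaller max_tokens when the request is rejected. This is only a sketch under assumptions: `TokenLimitError` is a hypothetical stand-in for whatever exception your client actually raises, and `send` is a hypothetical callable that wraps your real API call.

```python
from typing import Callable


class TokenLimitError(Exception):
    """Hypothetical stand-in for the error raised when too many tokens are requested."""


def complete_with_fallback(
    send: Callable[[str, int], str],
    prompt: str,
    max_tokens: int,
    min_tokens: int = 16,
) -> str:
    """Call send(prompt, max_tokens), halving max_tokens after each rejection."""
    budget = max_tokens
    while budget >= min_tokens:
        try:
            return send(prompt, budget)
        except TokenLimitError:
            budget //= 2  # ask for less and try again
    raise TokenLimitError("request rejected even at the minimum completion size")


if __name__ == "__main__":
    # Fake backend for demonstration: rejects any request above 500 tokens.
    def fake_send(prompt: str, budget: int) -> str:
        if budget > 500:
            raise TokenLimitError("too many tokens requested")
        return f"ok:{budget}"

    print(complete_with_fallback(fake_send, "some prompt", 2048))
```

Each retry costs an extra request, so counting tokens up front is still the cheaper option when you can do it.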


Hello, everyone.
I am facing the same problem as @alex_g.
My max_tokens is 2048, but the response text is only 240~300 tokens.
When I check the response, the finish reason is “stop”.
Is there any way to increase the number of tokens? I want to get the full max_tokens (2049) in a single API request.
If you know of a solution, please help me.
Thanks in advance.