The max_tokens parameter is a bit of a pain: to avoid requesting more than the engine's 2049-token limit, you need to know how many tokens are in your prompt.
Is there any solution that lets the API just stop when it reaches 2049 tokens, without specifying max_tokens? Loading the GPT-2 tokenizer just to count the tokens in the text seems like overkill for this. Since the response includes a 'stop reason', I'd expect there's some workaround.
Thank you for the answer and references!
I don't actually want to increase the number of tokens. I'm just asking whether there's an API-side way to gracefully return from a request that accidentally asks for more tokens than the engine can handle, instead of exiting with an exception.
The API currently "makes you" specify max_tokens, and since my prompt lengths vary, I'd rather not compute it on the fly every time.
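For context, what I'm computing on the fly is basically the arithmetic below (a sketch; the helper name is made up and the 2049 window is the engine limit mentioned above, not something the API exposes):

```python
def completion_budget(prompt_tokens: int, context_window: int = 2049) -> int:
    """Hypothetical helper: how many completion tokens still fit
    alongside a prompt of `prompt_tokens` tokens."""
    return max(context_window - prompt_tokens, 0)

# e.g. a 1800-token prompt leaves room for 249 completion tokens
print(completion_budget(1800))  # → 249
```

The subtraction is trivial; the annoying part is getting `prompt_tokens` in the first place, which is exactly what I'd like to avoid.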
Hello, Everyone
I am also facing the same problem as @alex_g.
My max_tokens is 2048, but the response text is only 240~300 tokens.
When I check the response, the finish reason is "stop".
Is there any solution to increase the tokens? I want to get the full max_tokens (2049) in one API request.
If you know any solution, please help me.
Thanks in advance.
If you want to use a bigger context window, one option is to split the context into chunks, make multiple API calls, and then combine all the answers into one. You can write a function to do this manually, or use a library like langchain to handle the process for you.
def chunker(seq, size):
    # yield successive size-length slices of seq
    return (seq[pos:pos + size] for pos in range(0, len(seq), size))
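Putting it together, the loop might look like this (a sketch that repeats the chunker above so it runs on its own; in practice each chunk would go into its own completion request, here the "answer" just echoes the chunk):

```python
def chunker(seq, size):
    # yield successive size-length slices of seq
    return (seq[pos:pos + size] for pos in range(0, len(seq), size))

document = "one two three four five six seven".split()
answers = []
for chunk in chunker(document, 3):
    # stand-in for an API call on this chunk
    answers.append(" ".join(chunk))

# combine the per-chunk answers into one result
combined = " | ".join(answers)
print(combined)  # one two three | four five six | seven
```

In a real pipeline you would chunk by token count rather than word count, since that is what the engine limit is measured in.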