The max_tokens parameter is a bit of a pain: you need to know the number of tokens in your prompt ahead of time, so that you don't ask for more than the 2049-token limit allows.
Is there any way to let the API just stop when it reaches the 2049-token limit, without specifying max_tokens? Loading the GPT-2 tokenizer just to count the tokens in the text seems like overkill. Since the response already has a 'stop reason', I'd expect there's some workaround.
Thank you for the answer and references!
I don't actually want to increase the number of tokens. I'm just asking whether there's an API-side way to gracefully return from a request where you accidentally ask for more tokens than the engine can handle, instead of it exiting with an exception.
The API currently "makes you" pick a max_tokens value, and since my prompt lengths vary, it's something I'd rather not have to compute on the fly for every request.
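In the meantime, the client-side workaround amounts to clamping the requested completion length to whatever room is left in the context window. A minimal sketch, assuming a 2049-token limit and that `prompt_tokens` comes from some tokenizer (e.g. the GPT-2 tokenizer mentioned above); `clamp_max_tokens` is a hypothetical helper, not part of the API:

```python
# Clamp max_tokens so prompt + completion never exceed the engine's
# context window (2049 tokens, per the discussion above).
CONTEXT_LIMIT = 2049

def clamp_max_tokens(prompt_tokens: int, desired: int,
                     limit: int = CONTEXT_LIMIT) -> int:
    """Return the largest completion length the engine can actually serve.

    prompt_tokens -- token count of the prompt (from a tokenizer)
    desired       -- the max_tokens value you would like to request
    """
    available = max(limit - prompt_tokens, 0)  # room left after the prompt
    return min(desired, available)

# Example: a 2000-token prompt leaves only 49 tokens of room.
print(clamp_max_tokens(2000, 500))  # -> 49
print(clamp_max_tokens(100, 500))   # -> 500 (plenty of room)
```

The counting step is still needed, but it only runs once per request and avoids the exception entirely.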