If I lower the max_tokens value in my request, does GPT-3 generate shorter but complete texts, or does it just cut off the text when max_tokens is reached?
It completely depends on the prompt.
Here’s the definition of max_tokens in the API Reference:

"The maximum number of tokens to generate in the completion. The token count of your prompt plus max_tokens cannot exceed the model’s context length. Most models have a context length of 2048 tokens (except for the newest models, which support 4096)."
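For illustration, here is roughly what that looks like in a request (a minimal sketch using the pre-1.0 openai Python library; the engine name and prompt are placeholders):

```python
# Minimal sketch (pre-1.0 openai Python library, GPT-3 era).
# The prompt's token count plus max_tokens must fit in the model's context length.
import openai

openai.api_key = "sk-..."  # placeholder

response = openai.Completion.create(
    engine="text-davinci-002",  # example engine
    prompt="Write a short story about a lighthouse keeper.",
    max_tokens=100,             # hard cap on generated tokens
)
print(response["choices"][0]["text"])
```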
Thank you for your answer. I read the documentation about this, but unfortunately it doesn’t answer the question.

I tried it out with very small max_tokens values (like 20) and the text is indeed cut off. But I don’t know whether that’s also the case for a larger value like 100. Do you have an idea where I could get this question answered? It’s quite important for the functionality of my app.
I asked support, and they clarified that GPT-3 will not attempt to create shorter texts with a smaller max_tokens value. The text will indeed just be cut off. So in my case, I guess it makes sense to use a higher value to have more “wiggle room”.
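One way to check whether a given response was cut off is the finish_reason field on each choice (a sketch with the pre-1.0 openai Python library; the engine name is just an example):

```python
# finish_reason is "length" when max_tokens was hit, and "stop" when the
# model ended the completion on its own.
import openai

resp = openai.Completion.create(
    engine="text-davinci-002",
    prompt="Explain how tides work.",
    max_tokens=100,
)
choice = resp["choices"][0]
if choice["finish_reason"] == "length":
    print("Truncated at max_tokens:")
else:
    print("Completed naturally:")
print(choice["text"])
```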
Some things to note:
- A higher max_tokens increases response time.
- max_tokens plus the size of the prompt in tokens shouldn’t exceed the engine’s context length.
- When using small max_tokens values, the completion can be appended to the existing prompt and sent as a new request to get the full completion (see the sketch after this list).
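A rough sketch of that last point, assuming the pre-1.0 openai Python library (the helper name, chunk size, and round cap are mine):

```python
# Keep requesting small chunks and appending them to the prompt until the
# model stops on its own (finish_reason != "length"). Note that the growing
# prompt plus max_tokens must still fit in the engine's context length.
import openai

def complete_fully(prompt, engine="text-davinci-002", chunk_tokens=100, max_rounds=10):
    text = ""
    for _ in range(max_rounds):  # safety cap on round trips
        resp = openai.Completion.create(
            engine=engine,
            prompt=prompt + text,
            max_tokens=chunk_tokens,
        )
        choice = resp["choices"][0]
        text += choice["text"]
        if choice["finish_reason"] != "length":
            break  # the model finished; nothing was cut off
    return text
```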
Thank you for pointing that out!
In the documentation there is a note that 1000 tokens is about 750 words. With that in mind, I bumped max_tokens up to just under 4000. However, when I prompt text-davinci-002 with “write 1500 words about XYZ”, it only produces about 200 words at most. Can I adjust the prompt to improve my result? I’m trying to test out the Davinci engine to see if it can indeed produce meaningful content from open-ended prompts. So far the results look promising, but the output is limited to only a few sentences.

Any suggestions are appreciated.
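For reference, here is the back-of-the-envelope arithmetic behind my max_tokens choice (the 4000-token context length is my assumption for the engine):

```python
# Rough budget check using the documented ~750 words per 1000 tokens rule.
target_words = 1500
est_completion_tokens = round(target_words * 1000 / 750)  # ~2000 tokens
context_length = 4000                                     # assumed engine limit
prompt_budget = context_length - est_completion_tokens    # ~2000 tokens left for the prompt
print(est_completion_tokens, prompt_budget)
```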
Having the same issue; did you figure out how to fix it? Thanks
DaVinci instruct (the default choice) is fine-tuned to stop printing at the completion of an idea. You will need to either do multiple completions or use the original version of DaVinci. Keep in mind that the original DaVinci and DaVinci instruct are quite different.
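In code the switch is just the engine name (a sketch; the prompt and token values are placeholders):

```python
# "davinci" is the original base model; "text-davinci-002" is the
# instruction-tuned variant that tends to stop once an idea is complete.
import openai

resp = openai.Completion.create(
    engine="davinci",  # base model instead of "text-davinci-002"
    prompt="Write 1500 words about XYZ.\n\n",
    max_tokens=2000,
)
print(resp["choices"][0]["text"])
```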
Same issue here: I am getting only about 200 words, and the response is cut off mid-sentence. The problem is that I cannot figure out how to increase max_tokens with LlamaIndex.
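For anyone else hitting this, here is a sketch of one way it could be configured, assuming the older (~0.6-era) LlamaIndex ServiceContext/LLMPredictor API together with LangChain’s OpenAI wrapper; newer releases configure the LLM differently:

```python
# Raise the completion cap by passing max_tokens through the LLM wrapper
# that LlamaIndex uses for queries. API names reflect an older release.
from langchain.llms import OpenAI
from llama_index import GPTVectorStoreIndex, LLMPredictor, ServiceContext, SimpleDirectoryReader

llm_predictor = LLMPredictor(
    llm=OpenAI(model_name="text-davinci-002", max_tokens=512)  # raise the cap here
)
service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor)

documents = SimpleDirectoryReader("data").load_data()
index = GPTVectorStoreIndex.from_documents(documents, service_context=service_context)
print(index.as_query_engine().query("What does the document say about XYZ?"))
```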
What would be the behavior if I don’t pass the max_tokens value?
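For the completions endpoint, the API reference lists a default of 16 for max_tokens, so omitting it gives very short outputs; a minimal sketch (pre-1.0 openai Python library):

```python
# With max_tokens omitted, the completions endpoint falls back to its
# documented default of 16 tokens.
import openai

resp = openai.Completion.create(
    engine="text-davinci-002",
    prompt="Write a paragraph about the ocean.",
    # no max_tokens -> defaults to 16 per the API reference
)
print(resp["choices"][0]["text"])
```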