Question regarding max_tokens

If I lower the max_tokens value in my request, does GPT-3 generate shorter but complete texts, or does it just cut off the text when max_tokens is reached?

Hi @florianwalther

It completely depends on the prompt.

Here’s the definition of max_tokens in the API Reference:

The maximum number of tokens to generate in the completion.

The token count of your prompt plus max_tokens cannot exceed the model’s context length. Most models have a context length of 2048 tokens (except for the newest models, which support 4096).
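To make the constraint quoted above concrete, here is a minimal sketch of the arithmetic. The function name and the default context length of 2048 are my own illustrative choices, and the prompt token count is assumed to come from whatever tokenizer you already use.

```python
def max_completion_tokens(prompt_tokens: int, context_length: int = 2048) -> int:
    """Largest max_tokens value that still fits next to the prompt.

    Hypothetical helper: context_length defaults to 2048 only because that
    is the figure quoted from the API Reference; pass 4096 for newer models.
    """
    if prompt_tokens < 0 or context_length <= 0:
        raise ValueError("token counts must be non-negative and the context positive")
    # The prompt and the completion share the same context window.
    return max(context_length - prompt_tokens, 0)


# A 1500-token prompt leaves 548 tokens on a 2048-token model.
print(max_completion_tokens(1500))
# The same prompt leaves 2596 tokens on a 4096-token model.
print(max_completion_tokens(1500, context_length=4096))
```

If the result is 0, the prompt alone already fills the context and needs to be shortened before any completion can be generated.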


Thank you for your answer. I read the documentation about this but unfortunately, it doesn’t answer the question.
I tried it out with very small max_tokens values (like 20) and the text is indeed cut off. But I don’t know whether that’s also the case for a larger value like 100. Do you have an idea where I could get this question answered? It’s quite important for the functionality of my app.


I asked support and they clarified that GPT-3 will not attempt to create shorter texts with a smaller max_tokens value. The text will indeed just be cut off. So in my case, I guess it makes sense to use a higher value to have more “wiggle room”.
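Since the text is simply cut off, it can help to detect when that happened. A minimal sketch, assuming the completion response has already been parsed into a dict with a choices list whose entries carry a finish_reason field ("length" when max_tokens was hit, "stop" when the model ended on its own). The helper name is made up for illustration.

```python
def was_truncated(response: dict) -> bool:
    """Return True if the first choice stopped because max_tokens was reached.

    Assumes the parsed JSON shape {"choices": [{"text": ..., "finish_reason": ...}]}.
    """
    return response["choices"][0]["finish_reason"] == "length"


# Hand-written sample responses, not real API output.
cut_off = {"choices": [{"text": "The quick brown", "finish_reason": "length"}]}
finished = {"choices": [{"text": "The quick brown fox.", "finish_reason": "stop"}]}

print(was_truncated(cut_off))   # True
print(was_truncated(finished))  # False
```

An app could use a check like this to decide whether to show the text as is, request more, or trim back to the last complete sentence.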


Some things to note:

  1. Higher max_tokens increases response time.
  2. max_tokens plus the size of the prompt in tokens must not exceed the engine’s context length.
  3. When using a small max_tokens value, the completion can be appended to the existing prompt and sent as a new request to get the rest of the completion.
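Point 3 above can be sketched roughly like this. The complete argument is a hypothetical stand-in for whatever function sends your request; here it is assumed to take a prompt and return the generated text together with a finish reason ("length" when the output hit max_tokens). The fake used for the demo returns canned chunks and does not call any API.

```python
from typing import Callable, List, Tuple

# Hypothetical signature: prompt in, (generated text, finish reason) out.
CompleteFn = Callable[[str], Tuple[str, str]]


def complete_in_chunks(prompt: str, complete: CompleteFn, max_rounds: int = 5) -> str:
    """Re-send prompt + text so far until the model stops on its own."""
    pieces: List[str] = []
    for _ in range(max_rounds):
        # Append everything generated so far to the original prompt.
        text, finish_reason = complete(prompt + "".join(pieces))
        pieces.append(text)
        if finish_reason != "length":
            break  # the model finished on its own, nothing left to fetch
    return "".join(pieces)


def make_fake_complete(chunks: List[str]) -> CompleteFn:
    """Demo stand-in that hands out canned chunks, one per call."""
    remaining = list(chunks)

    def fake(prompt: str) -> Tuple[str, str]:
        text = remaining.pop(0)
        return text, ("length" if remaining else "stop")

    return fake


fake = make_fake_complete(["Once upon", " a time", "."])
print(complete_in_chunks("Tell a story: ", fake))
```

The max_rounds cap is there so a model that never stops cannot loop forever. Keep in mind that the growing prompt plus max_tokens still has to fit in the context length.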

Thank you for pointing that out!

In the documentation there is a note that 1000 tokens is about 750 words. With that in mind, I bumped up max_tokens to just under 4000. However, when I prompt davinci 002 to “write 1500 words about ‘XYZ’”, it produces only about 200 words at most. Can I provide a prompt to help improve my result? I’m trying to test out the Davinci engine to see if it can indeed produce meaningful content from open-ended prompts. So far, the results look promising, but the output is limited to only a few sentences.

Any suggestions are appreciated.
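For reference, the rule of thumb mentioned above (1000 tokens is about 750 words) works out as follows. This is only a rough estimate with a made-up helper name; real token counts depend on the tokenizer and the text.

```python
def estimate_tokens(words: int) -> int:
    """Rough token estimate from the '1000 tokens is about 750 words' rule."""
    # Integer ceiling division, so the estimate is rounded up.
    return -(-words * 1000 // 750)


print(estimate_tokens(1500))  # 2000
print(estimate_tokens(200))   # 267
```

So 1500 words would need roughly 2000 tokens, which fits under a max_tokens of just under 4000, while a 200-word answer is only about 267 tokens. By this estimate the limit was not what cut the output short.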

I’m having the same issue. Did you figure out how to fix it? Thanks

DaVinci instruct (the default choice) is fine-tuned to stop generating once an idea is complete. You will need to either do multiple completions or use the original version of DaVinci. Keep in mind that the original DaVinci and DaVinci instruct are quite different.

Same issue here: I am getting only about 200 words, and the response is cut off mid-sentence. The problem is that I cannot figure out how to increase max tokens with LlamaIndex.

What would the behavior be if I don’t pass a max_tokens value?