If I lower the max_tokens value in my request, does GPT-3 generate shorter but complete texts, or does it just cut off the text when max_tokens is reached?
It completely depends on the prompt.
Here’s the definition of max_tokens in the API Reference:

"The maximum number of tokens to generate in the completion. The token count of your prompt plus max_tokens cannot exceed the model’s context length. Most models have a context length of 2048 tokens (except for the newest models, which support 4096)."
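For illustration, here is roughly what that looks like in a request (a minimal sketch using the pre-1.0 openai Python library; the engine name and prompt are placeholders):

```python
# Minimal sketch (pre-1.0 openai Python library, GPT-3 era).
# The prompt's token count plus max_tokens must fit in the model's context length.
import openai

openai.api_key = "sk-..."  # placeholder

response = openai.Completion.create(
    engine="text-davinci-002",  # example engine
    prompt="Write a short story about a lighthouse keeper.",
    max_tokens=100,             # hard cap on generated tokens
)
print(response["choices"][0]["text"])
```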
Thank you for your answer. I read the documentation about this, but unfortunately it doesn’t answer the question.

I tried it out with very small max_tokens values (like 20) and the text is indeed cut off. But I don’t know whether that’s also the case for a larger value like 100. Do you have an idea where I could get this question answered? It’s quite important for the functionality of my app.
I asked support, and they clarified that GPT-3 will not attempt to create shorter texts with a smaller max_tokens value. The text will indeed just be cut off. So in my case, I guess it makes sense to use a higher value to have more “wiggle room”.
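One way to check whether a given response was cut off is the finish_reason field on each choice (a sketch with the pre-1.0 openai Python library; the engine name is just an example):

```python
# finish_reason is "length" when max_tokens was hit, and "stop" when the
# model ended the completion on its own.
import openai

resp = openai.Completion.create(
    engine="text-davinci-002",
    prompt="Explain how tides work.",
    max_tokens=100,
)
choice = resp["choices"][0]
if choice["finish_reason"] == "length":
    print("Truncated at max_tokens:")
else:
    print("Completed naturally:")
print(choice["text"])
```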
Some things to note:
- A higher max_tokens increases response time.
- max_tokens plus the size of the prompt in tokens shouldn’t exceed the engine’s context length.
- When using small max_tokens values, the completion can be appended to the existing prompt and sent as a new request to get the full completion (see the sketch after this list).
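A rough sketch of that last point, assuming the pre-1.0 openai Python library (the helper name, chunk size, and round cap are mine):

```python
# Keep requesting small chunks and appending them to the prompt until the
# model stops on its own (finish_reason != "length"). Note that the growing
# prompt plus max_tokens must still fit in the engine's context length.
import openai

def complete_fully(prompt, engine="text-davinci-002", chunk_tokens=100, max_rounds=10):
    text = ""
    for _ in range(max_rounds):  # safety cap on round trips
        resp = openai.Completion.create(
            engine=engine,
            prompt=prompt + text,
            max_tokens=chunk_tokens,
        )
        choice = resp["choices"][0]
        text += choice["text"]
        if choice["finish_reason"] != "length":
            break  # the model finished; nothing was cut off
    return text
```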
Thank you for pointing that out!
In the documentation there is a note that 1000 tokens is about 750 words. With that in mind, I bumped max_tokens up to just under 4000. However, when I prompt text-davinci-002 with “write 1500 words about XYZ”, it only produces about 200 words at most. Can I adjust the prompt to improve my result? I’m trying to test out the Davinci engine to see if it can indeed produce meaningful content from open-ended prompts. So far the results look promising, but the output is limited to only a few sentences.

Any suggestions are appreciated.
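For reference, here is the back-of-the-envelope arithmetic behind my max_tokens choice (the 4000-token context length is my assumption for the engine):

```python
# Rough budget check using the documented ~750 words per 1000 tokens rule.
target_words = 1500
est_completion_tokens = round(target_words * 1000 / 750)  # ~2000 tokens
context_length = 4000                                     # assumed engine limit
prompt_budget = context_length - est_completion_tokens    # ~2000 tokens left for the prompt
print(est_completion_tokens, prompt_budget)
```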
Having the same issue; did you figure out how to fix it? Thanks
DaVinci instruct (the default choice) is fine-tuned to stop printing at the completion of an idea. You will need to either do multiple completions or use the original version of DaVinci. Keep in mind that the original DaVinci and DaVinci instruct are quite different.
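In code the switch is just the engine name (a sketch; the prompt and token values are placeholders):

```python
# "davinci" is the original base model; "text-davinci-002" is the
# instruction-tuned variant that tends to stop once an idea is complete.
import openai

resp = openai.Completion.create(
    engine="davinci",  # base model instead of "text-davinci-002"
    prompt="Write 1500 words about XYZ.\n\n",
    max_tokens=2000,
)
print(resp["choices"][0]["text"])
```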
Same issue here: I am getting only about 200 words, and the response is cut off mid-sentence. The problem is that I cannot figure out how to increase max_tokens with LlamaIndex.
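For anyone else hitting this, here is a sketch of one way it could be configured, assuming the older (~0.6-era) LlamaIndex ServiceContext/LLMPredictor API together with LangChain’s OpenAI wrapper; newer releases configure the LLM differently:

```python
# Raise the completion cap by passing max_tokens through the LLM wrapper
# that LlamaIndex uses for queries. API names reflect an older release.
from langchain.llms import OpenAI
from llama_index import GPTVectorStoreIndex, LLMPredictor, ServiceContext, SimpleDirectoryReader

llm_predictor = LLMPredictor(
    llm=OpenAI(model_name="text-davinci-002", max_tokens=512)  # raise the cap here
)
service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor)

documents = SimpleDirectoryReader("data").load_data()
index = GPTVectorStoreIndex.from_documents(documents, service_context=service_context)
print(index.as_query_engine().query("What does the document say about XYZ?"))
```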
What would be the behavior if I don’t pass the max_tokens value?
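For the completions endpoint, the API reference lists a default of 16 for max_tokens, so omitting it gives very short outputs; a minimal sketch (pre-1.0 openai Python library):

```python
# With max_tokens omitted, the completions endpoint falls back to its
# documented default of 16 tokens.
import openai

resp = openai.Completion.create(
    engine="text-davinci-002",
    prompt="Write a paragraph about the ocean.",
    # no max_tokens -> defaults to 16 per the API reference
)
print(resp["choices"][0]["text"])
```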