max_tokens seems to do nothing for me with 3.5 Turbo

Max_tokens seems to do nothing for me whatsoever. I want to limit the response length I’m getting to about 150 tokens.

If I set max_tokens to the prompt length + 150, it doesn’t keep to it at all. If I test something extreme, like setting max_tokens to 400 when my prompt alone is 850 tokens, the API just continues as usual: it accepts the 850-token prompt and outputs a 250-token response.

It’s like max_tokens has no effect on anything, really.

Here is my prompt:

completion = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    max_tokens=1000,
    messages=[{
        "role": "system",
        "content": my_prompt
    }, {
        "role": "user",
        "content": val
    }])

Are you using a stop sequence? It can likely help you…

https://help.openai.com/en/articles/5072263-how-do-i-use-stop-sequences

I’m not quite sure how to implement this in the context of the API. For example, if it’s responding with 250 tokens in 5 paragraphs, and I wanted 100 tokens in 2-3 paragraphs, what would my stop sequence look like?

Depends on your entire prompt. What are you trying to get it to output? Can you show it an example with a stop sequence added?

I could show it an example, say one with only 100 tokens. However, how would the stop sequence be integrated so that it does the same thing when the API response comes through?

Prompt:

Give me some output about XYZ…

Sure! Here’s your information about XYZ. ###

Give me some output about XYZ…

Then set your stop sequence to "###" and it should follow the one-shot example.
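In code, that looks something like the sketch below (assuming the same my_prompt and val variables from your snippet; the stop parameter accepts a string or a list of up to four sequences):

# Minimal sketch: the one-shot example in the prompt ends with "###",
# and the same string is passed as a stop sequence so generation halts there.
completion = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    max_tokens=1000,
    stop="###",  # generation stops before this sequence would be emitted
    messages=[{
        "role": "system",
        "content": my_prompt  # contains the one-shot example terminated by "###"
    }, {
        "role": "user",
        "content": val
    }])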

It might be easier to help if you could share the prompt or what you’re trying to achieve.

Counting words/tokens is hard for the LLM…

You seem a bit confused. The max_tokens parameter is only a reservation of context length for the response. If you set it to 250, the response will be cut off at 250 tokens. If you set it to 850, then 850 tokens is simply the amount that can be used for the reply.

The setting doesn’t inform the AI what type of response it should craft at all; the AI never sees it. You’ll have to instruct the AI with language like "two brief paragraphs" to shape the length of the output.
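For example, something along these lines (a sketch with hypothetical instruction wording; max_tokens here is only a safety cap set a bit above the target so normal responses aren’t cut off):

# Sketch: ask for the length in the prompt; max_tokens only truncates, it doesn't guide.
completion = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    max_tokens=200,  # safety cap above the ~150-token target
    messages=[{
        "role": "system",
        "content": "You are a helpful assistant. Answer in two brief paragraphs, "
                   "no more than about 100 words total."
    }, {
        "role": "user",
        "content": val
    }])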


OK, thanks, this worked. I thought I had tried it like that, but evidently I had not. There’s a lot of incorrect info out there about this parameter.

With regard to putting a descriptor of the output structure directly in the prompt, that seems to be ineffectual; I tried it quite a few times and it didn’t make a difference.

So it would look something like this:

completion = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    max_tokens=1000,
    messages=[{
        "role": "system",
        # x is knowledge pulled from a DB, variable depending on which group is accessing it
        "content": "You are a bot that knows x. Tell me about x."
    }, {
        "role": "user",
        "content": val
    }])

I simply want the output here to be 150 tokens or less, and preferably without it being truncated awkwardly by forcing max_tokens to 150.

Again, no: you set max_tokens only to the length of the response you want.

And then you use prompt language that curtails even a physics textbook topic to the length desired (here 150 tokens would result in truncation):
[screenshot: example of a prompt and its length-limited response]


Yes, this. It’s not always easy to get it to stop at X words or X tokens, though, which is why I suggested a stop sequence if one could be added.

This is where prompt engineering becomes more art than science. I would recommend providing two clearly delineated examples of the desired length when asking the question, as in the sketch below. That should help you get the right amount of text back.
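A minimal sketch of that idea, using hypothetical example questions and short example answers passed as prior conversation turns:

# Sketch: two short example exchanges (few-shot) demonstrate the desired answer length.
completion = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    max_tokens=200,
    messages=[
        {"role": "system", "content": "You are a bot that knows x. Keep answers to roughly 100 words."},
        {"role": "user", "content": "Tell me about topic A."},
        {"role": "assistant", "content": "Topic A is ... (a short, ~100-word example answer)"},
        {"role": "user", "content": "Tell me about topic B."},
        {"role": "assistant", "content": "Topic B is ... (another short, ~100-word example answer)"},
        {"role": "user", "content": val},
    ])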

Additionally, your max_tokens setting will not affect the generation process itself. As mentioned by _j, the output is simply truncated.

It’s not recommended to set this parameter, since it doesn’t impact the generation or even the billing. It only determines how much of the output you receive, through truncation.
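If you do set it, you can at least detect when a reply was cut off: the API reports finish_reason as "length" when max_tokens truncated the output (a small sketch, reusing the completion object from the earlier call):

# Sketch: detect truncation caused by max_tokens.
choice = completion.choices[0]
if choice.finish_reason == "length":
    # The reply was cut off by max_tokens rather than ending naturally ("stop").
    print("Response was truncated; consider raising max_tokens or shortening the prompt.")
print(choice.message["content"])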