Can I set max_tokens for chatgpt turbo?

jayfrdm · March 2, 2023, 8:31am

Hi All,

So excited about this. But I’m getting massive reponses, and I cannot handle them. Is there a way to limit the response to a max tokens/words?

Zima · March 2, 2023, 8:36am

as per OpenAI API
max_tokens

integer

Optional

Defaults to inf

The maximum number of tokens allowed for the generated answer. By default, the number of tokens the model can return will be (4096 - prompt tokens).

jayfrdm · March 2, 2023, 8:37am

Thanks for the quick reply.
This is my promp:

    generated_response = openai.ChatCompletion.create(
        model=model_engine,
        messages=prompt_messages,
        max_tokens=1024
    )

The interaction still costed me over 1300 tokens.

why?

Zima · March 2, 2023, 8:39am

How many input tokens did you use in prompt_messages?

jayfrdm · March 2, 2023, 8:40am

“completion_tokens”: 589,
“prompt_tokens”: 1011,
“total_tokens”: 1600

Zima · March 2, 2023, 8:42am

There you have it… prompt_tokens are also billed and max_tokens is for generated answer only - completion_tokens

jayfrdm · March 2, 2023, 8:51am

But i don’t wasnt the reponse trucated. I just want openai to konw to work within those limits. Is that possible?

Zima · March 2, 2023, 9:03am

You want the total_tokens to be under 1024?
I guess you can count the prompt tokens and subtract from 1024 to get the available tokens for the completion.

jayfrdm · March 2, 2023, 3:09pm

No, I want the total reponse to be under 1600 characters due to my bot limitations. How do you suggest I work around this?

raymonddavey · March 2, 2023, 5:29pm

set max_tokens to 1600

But you will also be billed for the prompt as well. This is on top of the tokens used for the output.

jayfrdm · March 2, 2023, 8:04pm

I don’t mind the billing, I just candle handle a response that has more than 1600 characters (letters/spaces). Is there a way to tell openai to consider that and not generate responses longer then that?

georgei · March 2, 2023, 8:56pm

I doubt that it can count characters and also give you a good response.
If your response is in English, then you can limit somewhat.

Given that the average number of characters in an English word is 4, then you need to have max 400 words to fit in 1600 characters.
OpenAI documentation says that the ratio is 3:4 tokens to words.
It means that the response can be at most 300 tokens.
For safety, I would target a response of 250 tokens, because you might encounter longer words.

Note that the max_tokens parameter includes the prompt as well. So you have to find out how many tokens are in the prompt and add 250 tokens to determine the max_tokens parameter.

You can find the GPT-2 library to count the tokens on GitHub.

PS: the average word length is not 4, but you can find it.

jayfrdm · March 2, 2023, 8:57pm

So if I put in max tokens, openai will try to generate a response that will fit with those tokens?

georgei · March 2, 2023, 8:59pm

Yes, but is not a guarantee.
Sometimes there aren’t many tokens left.

anon10827405 · March 2, 2023, 9:10pm

I may be incorrect but I believe that the language model itself does not take in consideration the max tokens allowance.

It will write to whatever it desires (along with any instructions). If it hits the length, it will just cut suddenly (which you can counter by catching the stop reason and sending the prompt again). If the length is set to max, it doesn’t try and fill it to the max.

Although, if it’s set to max, you run a higher chance of getting it filled with garbage/noise

jayfrdm · March 2, 2023, 9:22pm

This is what I’ve found. It just truncates. I wish there was a way to accomplish this. I’m sure I’m not the only one with this requirement. Anyone using an sms service might hit that limit of 1600 characters.

Perhaps I should put it in the prompt in plain lang?

ruby_coder · March 2, 2023, 11:25pm

That is my experience as well.

max_tokens causes a STOP based on reason “length”.

jayfrdm · March 3, 2023, 2:47pm

Is there a way to request a feature in the openai community?

mcheich · March 13, 2023, 3:22pm

Running into the same issue.

I am developing for a very small display (size of wrist watch), and I want the responses to be terse. I thought max_tokens would inform the model to formulate a response given so many tokens, but like people have said here - it just truncates the response.

I am exploring some prompt engineering, like, “Please respond briefly”, but I’d much rather have a way to limit the response…

Maybe what I’ll do is if the response exceeds the limit I want, I send back the response with the prompt, “say that more tersely”… and try that a couple times before delivering a truncated response, which is a an undesired user experience.

Levatron · March 14, 2023, 11:08am

Perhaps you could limit the answer by injecting a system message, asking gpt to answer any question with no more than (your required character count).

Topic		Replies	Views
Max_tokens seems to do nothing for me 3.5 Turbo API	14	3393	December 18, 2023
Creating Concise AI Replies in Short Interactions without max_tokens Prompting prompt , prompt-engineering , api-output-length	10	2231	March 12, 2024
MAX TOKENS is 4,096 tokens for gpt-3.5-turbo should fit the the messages sent and the answer generated? API api	10	6170	December 18, 2023
Struggling with max_tokens and getting responses within a given limit, please help! API chatgpt	5	20539	October 28, 2023
Setting max tokens for output issues API gpt-4 , api	4	3951	January 26, 2024

Can I set max_tokens for chatgpt turbo?

Related topics