Can I set max_tokens for ChatGPT turbo?

Hi All,

So excited about this. But I’m getting massive responses, and I cannot handle them. Is there a way to limit the response to a maximum number of tokens/words?

As per the OpenAI API reference:

max_tokens (integer, optional, defaults to inf)

The maximum number of tokens allowed for the generated answer. By default, the number of tokens the model can return will be (4096 - prompt tokens).

Thanks for the quick reply.
This is my prompt:

    import openai

    generated_response = openai.ChatCompletion.create(
        model=model_engine,        # e.g. "gpt-3.5-turbo"
        messages=prompt_messages,  # list of {"role": ..., "content": ...} dicts
        max_tokens=1024,           # maximum tokens for the generated answer
    )

The interaction still cost me over 1300 tokens.

Why?

How many input tokens did you use in prompt_messages?

    "completion_tokens": 589,
    "prompt_tokens": 1011,
    "total_tokens": 1600

There you have it… prompt_tokens are billed as well, and max_tokens applies only to the generated answer (completion_tokens).
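For reference, you can read those numbers straight off the response object after the call. A minimal sketch, assuming the pre-1.0 openai Python library used above:

    usage = generated_response["usage"]
    print(usage["prompt_tokens"])      # tokens in the messages you sent (billed)
    print(usage["completion_tokens"])  # tokens generated, capped by max_tokens
    print(usage["total_tokens"])       # sum of the two, what you pay for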

But I don’t want the response truncated. I just want OpenAI to know to work within those limits. Is that possible?

You want the total_tokens to be under 1024?
I guess you can count the prompt tokens and subtract from 1024 to get the available tokens for the completion.
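Something like this, for example. This is only a sketch: it assumes the tiktoken package for counting, reuses prompt_messages from above, and the per-message overhead is an approximation:

    import tiktoken

    enc = tiktoken.encoding_for_model("gpt-3.5-turbo")

    def count_prompt_tokens(messages):
        # Rough count: each message adds a few tokens of chat formatting overhead.
        return sum(len(enc.encode(m["content"])) + 4 for m in messages) + 3

    budget = 1024  # total tokens you want to stay under
    available = budget - count_prompt_tokens(prompt_messages)

    generated_response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=prompt_messages,
        max_tokens=max(available, 1),  # never pass zero or a negative value
    )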

No, I want the total response to be under 1600 characters due to my bot limitations. How do you suggest I work around this?

Set max_tokens to 1600.

But you will be billed for the prompt as well; that is on top of the tokens used for the output.

I don’t mind the billing, I just can’t handle a response that has more than 1600 characters (letters/spaces). Is there a way to tell OpenAI to take that into account and not generate responses longer than that?

I doubt that it can count characters and also give you a good response.
If your response is in English, then you can limit it somewhat.

The average English word is about 5 characters including the trailing space, so 1600 characters is roughly 320 words. OpenAI’s documentation puts a token at about ¾ of a word, or roughly 4 characters of English text, so 1600 characters works out to about 400 tokens at most. For safety, I would target a response of around 250-300 tokens, because you might encounter longer words.

Note that max_tokens only caps the completion, not the prompt, but the prompt tokens plus max_tokens still have to fit within the model’s 4096-token context window. So it is worth counting how many tokens are in the prompt to make sure there is room left for the response.

You can count tokens with OpenAI’s tokenizer libraries on GitHub (the GPT-2 BPE tokenizer, or the newer tiktoken package).
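If you want to turn that rule of thumb into code, a minimal sketch (the 4-characters-per-token figure and the safety margin are just the estimates above, not exact values):

    CHARS_PER_TOKEN = 4    # rough average for English text
    SAFETY_MARGIN = 0.75   # headroom for long words, punctuation, etc.

    def max_tokens_for_char_limit(char_limit):
        return int(char_limit / CHARS_PER_TOKEN * SAFETY_MARGIN)

    print(max_tokens_for_char_limit(1600))  # -> 300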

So if I put in max_tokens, OpenAI will try to generate a response that fits within that many tokens?

Yes, but it is not a guarantee.
Sometimes there aren’t many tokens left.

I may be incorrect, but I believe the language model itself does not take the max_tokens allowance into consideration.

It will write whatever it decides to (subject to any instructions). If it hits the length limit, it will just cut off suddenly (which you can counter by catching the finish reason and sending a follow-up request; see the sketch below). If the limit is set to the maximum, it doesn’t try to fill it up.

Although, if it’s set to the maximum, you run a higher chance of getting it filled with garbage/noise.
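For example, a rough sketch of catching that (reusing the ChatCompletion call from earlier; asking the model to continue is just one way to handle it):

    resp = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=prompt_messages,
        max_tokens=300,
    )
    choice = resp["choices"][0]

    if choice["finish_reason"] == "length":
        # The answer was cut off by max_tokens: ask the model to carry on.
        prompt_messages.append({"role": "assistant",
                                "content": choice["message"]["content"]})
        prompt_messages.append({"role": "user", "content": "Please continue."})
        resp = openai.ChatCompletion.create(
            model="gpt-3.5-turbo",
            messages=prompt_messages,
            max_tokens=300,
        )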

This is what I’ve found. It just truncates. I wish there were a way to accomplish this; I’m sure I’m not the only one with this requirement. Anyone using an SMS service might hit that 1600-character limit.

Perhaps I should put it in the prompt in plain language?

That is my experience as well.

max_tokens causes a stop with finish reason “length”.

🙂

Is there a way to request a feature in the OpenAI community?

Running into the same issue.

I am developing for a very small display (the size of a wrist watch), and I want the responses to be terse. I thought max_tokens would inform the model to formulate a response within so many tokens, but as people have said here, it just truncates the response.

I am exploring some prompt engineering, like “Please respond briefly”, but I’d much rather have a way to limit the response…

Maybe what I’ll do is: if the response exceeds the limit I want, send the response back with the prompt “say that more tersely”… and try that a couple of times before delivering a truncated response, which is an undesired user experience.
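For what it’s worth, here is a rough sketch of that retry loop. The character limit, retry count, and wording are placeholders, and it assumes the same pre-1.0 openai library as the snippets above:

    CHAR_LIMIT = 1600   # hypothetical display / SMS limit
    MAX_RETRIES = 2

    def get_terse_reply(messages):
        resp = openai.ChatCompletion.create(
            model="gpt-3.5-turbo", messages=messages, max_tokens=500
        )
        reply = resp["choices"][0]["message"]["content"]

        for _ in range(MAX_RETRIES):
            if len(reply) <= CHAR_LIMIT:
                return reply
            # Too long: feed the answer back and ask for a shorter version.
            messages = messages + [
                {"role": "assistant", "content": reply},
                {"role": "user", "content": "Say that more tersely."},
            ]
            resp = openai.ChatCompletion.create(
                model="gpt-3.5-turbo", messages=messages, max_tokens=500
            )
            reply = resp["choices"][0]["message"]["content"]

        return reply[:CHAR_LIMIT]  # last resort: deliver a truncated response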

Perhaps you could limit the answer by injecting a system message asking GPT to answer any question in no more than (your required character count) characters.
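Something like this, for example; the exact instruction wording is just a guess at what works, and a max_tokens backstop is still a good idea in case the model ignores it:

    prompt_messages.insert(0, {
        "role": "system",
        "content": "Answer every question in no more than 1500 characters.",
    })

    generated_response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=prompt_messages,
        max_tokens=500,  # hard backstop if the instruction is ignored
    )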
