Can I set max_tokens for chatgpt turbo?

"completion_tokens": 589,
"prompt_tokens": 1011,
"total_tokens": 1600

There you have it… prompt_tokens are billed as well, and max_tokens applies only to the generated answer, i.e. the completion_tokens.
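
If it helps, here is a minimal sketch (assuming the pre-1.0 openai Python package, gpt-3.5-turbo, and an OPENAI_API_KEY environment variable) of where those numbers come back in the response:

```python
import openai  # pre-1.0 package; reads OPENAI_API_KEY from the environment

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
    max_tokens=50,  # caps only the completion, not the prompt
)

usage = response["usage"]
print(usage["prompt_tokens"], usage["completion_tokens"], usage["total_tokens"])
```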

1 Like

But I don’t want the response truncated. I just want OpenAI to know to work within those limits. Is that possible?

You want the total_tokens to be under 1024?
I guess you can count the prompt tokens and subtract from 1024 to get the available tokens for the completion.
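
Something like this, as a rough sketch (it assumes the tiktoken package and gpt-3.5-turbo; chat messages also carry a few tokens of per-message overhead, so treat the count as approximate):

```python
import tiktoken

enc = tiktoken.encoding_for_model("gpt-3.5-turbo")

def completion_budget(prompt: str, total_budget: int = 1024) -> int:
    # Tokens left for the completion after the prompt is accounted for.
    prompt_tokens = len(enc.encode(prompt))
    return max(total_budget - prompt_tokens, 0)

max_tokens = completion_budget("Summarize the plot of Hamlet in two sentences.")
```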

No, I want the total response to be under 1600 characters due to my bot’s limitations. How do you suggest I work around this?

set max_tokens to 1600

But you will be billed for the prompt as well, on top of the tokens used for the output.

I don’t mind the billing, I just can’t handle a response that has more than 1600 characters (letters/spaces). Is there a way to tell OpenAI to consider that and not generate responses longer than that?

I doubt that it can count characters and still give you a good response.
If your responses are in English, though, you can limit it somewhat.

OpenAI’s documentation gives a rule of thumb for English text: one token is roughly 4 characters, or about ¾ of a word.
By that rule, 1600 characters works out to at most about 400 tokens.
For safety, I would target a response of roughly 300 tokens, because you might encounter longer words and the averages are only approximate.

Note that the prompt and the completion together have to fit in the model’s context window, even though max_tokens itself only limits the completion. So you still have to find out how many tokens are in the prompt, and then set max_tokens to the roughly 300 tokens you want to allow for the response.

You can find the GPT-2 tokenizer library on GitHub to count the tokens.

PS: these per-token averages are only approximate, but you can look up more precise figures.
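
As a sketch, that back-of-the-envelope conversion is just the following (the 4-characters-per-token figure and the safety margin are rough assumptions, not guarantees):

```python
def char_limit_to_max_tokens(char_limit: int, chars_per_token: float = 4.0,
                             safety_margin: float = 0.75) -> int:
    # Convert a character limit into a conservative token budget.
    return int(char_limit / chars_per_token * safety_margin)

print(char_limit_to_max_tokens(1600))  # -> 300
```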

So if I set max_tokens, OpenAI will try to generate a response that fits within those tokens?

Yes, but it is not a guarantee.
Sometimes there aren’t many tokens left.

I may be incorrect, but I believe the language model itself does not take the max_tokens allowance into consideration.

It will write whatever it wants (subject to any instructions). If it hits the limit, it just cuts off abruptly (which you can counter by catching the stop reason and sending the prompt again). If the length is set to the maximum, it doesn’t try to fill it to the max.

Although, if it’s set to max, you run a higher chance of getting it filled with garbage/noise
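
For what it’s worth, a rough sketch of catching that stop reason (pre-1.0 openai package assumed) and asking the model to continue:

```python
import openai

messages = [{"role": "user", "content": "Explain how HTTP caching works."}]
response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo", messages=messages, max_tokens=300
)
choice = response["choices"][0]
text = choice["message"]["content"]

if choice["finish_reason"] == "length":
    # Cut off by max_tokens; feed the partial answer back and ask it to continue.
    messages += [
        {"role": "assistant", "content": text},
        {"role": "user", "content": "Please continue."},
    ]
    more = openai.ChatCompletion.create(
        model="gpt-3.5-turbo", messages=messages, max_tokens=300
    )
    text += more["choices"][0]["message"]["content"]
```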

1 Like

This is what I’ve found. It just truncates. I wish there were a way to accomplish this. I’m sure I’m not the only one with this requirement. Anyone using an SMS service might hit that 1600-character limit.

Perhaps I should put it in the prompt in plain language?

1 Like

That is my experience as well.

max_tokens causes a stop with finish_reason "length".

:slight_smile:

Is there a way to request a feature in the OpenAI community?

1 Like

Running into the same issue.

I am developing for a very small display (the size of a wristwatch), and I want the responses to be terse. I thought max_tokens would tell the model to formulate a response within that many tokens, but as people have said here, it just truncates the response.

I am exploring some prompt engineering, like, “Please respond briefly”, but I’d much rather have a way to limit the response…

Maybe what I’ll do is: if the response exceeds my limit, send it back with the prompt “say that more tersely”… and try that a couple of times before delivering a truncated response, which is an undesired user experience.
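
Something along those lines, sketched out (the 1600-character limit and two retries are just the numbers from this thread; untested):

```python
import openai

def bounded_reply(messages, char_limit=1600, retries=2):
    # Ask again for a terser answer until it fits, then truncate as a last resort.
    for _ in range(retries + 1):
        response = openai.ChatCompletion.create(
            model="gpt-3.5-turbo", messages=messages
        )
        text = response["choices"][0]["message"]["content"]
        if len(text) <= char_limit:
            return text
        messages = messages + [
            {"role": "assistant", "content": text},
            {"role": "user", "content": "Say that more tersely, please."},
        ]
    return text[:char_limit]  # last resort: hard truncation
```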

Perhaps you could limit the answer by injecting a system message asking GPT to answer any question in no more than (your required character count) characters.

1 Like

Hi @Levatron

I have tried that before and it has never worked because the models cannot count chars in their own completions.

So I tried it again just now, with the same results as the last time I checked. Here I inject a system message to limit the characters, but it does not work. I have tried counting characters, words, etc. before and it has never worked as required. The models just do not count. This has been discussed at length here in our community before, BTW.

system: Do not reply with more than 20 characters.

The reason the completion was cut off was that I set max_tokens to 100, FYI.

:slight_smile:

1 Like

Hi Ruby!

Yeah, generating code is like an impossible task with those limitations. I’ve only tried it for common questions like:

{"role": "system", "content": "Explain your answer within 150 characters."},
{"role": "user", "content": "How old does a monkey get?"},

temperature=0.9,
max_tokens=500,
top_p=0.1,
frequency_penalty=0.2,
presence_penalty=0.0,

Which seemed to work fine. :slightly_smiling_face:

3 Likes

That is good to know @Levatron , thanks!

Since I only work with code on a daily basis, that is cool to know.

:slight_smile: