You want the total_tokens to be under 1024?
I guess you can count the prompt tokens and subtract from 1024 to get the available tokens for the completion.
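Something like this rough sketch, for instance; it assumes OpenAI's tiktoken package for the token count and a 1024-token total budget, so adjust both to your actual model and limit:

```python
import tiktoken

TOTAL_BUDGET = 1024  # assumed overall token budget (prompt + completion)

def completion_budget(prompt: str, model: str = "gpt-3.5-turbo") -> int:
    """Count the prompt tokens and return how many tokens are left for the completion."""
    enc = tiktoken.encoding_for_model(model)
    prompt_tokens = len(enc.encode(prompt))
    return max(TOTAL_BUDGET - prompt_tokens, 0)

# hypothetical prompt, just to show the arithmetic
print(completion_budget("Summarize the plot of Hamlet in two sentences."))
```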
I don't mind the billing, I just can't handle a response that has more than 1600 characters (letters/spaces). Is there a way to tell OpenAI to take that into account and not generate responses longer than that?
I doubt that it can count characters and also give you a good response.
If your response is in English, then you can approximate a limit.
Given that the average English word is about 4 characters, 1600 characters comes to roughly 400 words at most.
OpenAI's documentation puts the ratio at roughly 3 words to 4 tokens, which works out to about 4 characters per token.
That means 1600 characters corresponds to roughly 400 tokens of output.
For safety, I would target a response of 250 tokens, because you might encounter longer words.
Note that max_tokens only limits the completion, but the prompt tokens plus max_tokens still have to fit within the model's context window. So count the tokens in your prompt, make sure there is room left for a 250-token completion, and set max_tokens to 250.
You can find GPT-2 tokenizer libraries on GitHub that will count the tokens for you.
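For example, something along these lines, using the GPT-2 tokenizer from the Hugging Face transformers package (just one of several tokenizer libraries; the prompt, limits, and context size below are placeholders, and the counts are only an approximation for newer models):

```python
from transformers import GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")

CHAR_LIMIT = 1600        # hard limit on response characters (e.g. SMS)
SAFETY_TARGET = 250      # conservative completion budget in tokens
CONTEXT_WINDOW = 4096    # assumed model context window; varies by model

prompt = "Explain how photosynthesis works."   # hypothetical prompt
prompt_tokens = len(tokenizer.encode(prompt))

# rough estimate: 1600 chars / ~4 chars per token ~= 400 tokens,
# so a 250-token completion should stay comfortably under the limit
max_tokens = SAFETY_TARGET
assert prompt_tokens + max_tokens <= CONTEXT_WINDOW, "prompt leaves no room"
print(f"prompt is {prompt_tokens} tokens; setting max_tokens={max_tokens}")
```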
PS: the average word length is not exactly 4 characters (it is closer to 5), but you can look up a better figure if you need a tighter estimate.
I may be incorrect, but I believe the language model itself does not take the max_tokens allowance into consideration.
It will write whatever it wants (following any instructions). If it hits the limit, the output is simply cut off mid-sentence (which you can counter by catching the finish reason and sending the prompt again). If max_tokens is set high, the model doesn't try to fill it to the max.
Although, if it's set very high, you do run a higher chance of the output being padded with garbage/noise.
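If it helps, this is roughly how you'd catch that case; a sketch assuming the current openai Python client (v1 style), where a cut-off completion reports finish_reason == "length" (model name and prompt are placeholders):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

resp = client.chat.completions.create(
    model="gpt-3.5-turbo",   # assumed model
    max_tokens=250,
    messages=[{"role": "user", "content": "Describe the water cycle."}],
)

choice = resp.choices[0]
if choice.finish_reason == "length":
    # the model ran out of tokens mid-answer; retry, shorten the ask,
    # or raise max_tokens rather than shipping a cut-off reply
    print("Completion was truncated:", choice.message.content)
else:
    print(choice.message.content)
```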
This is what I've found. It just truncates. I wish there were a way to accomplish this. I'm sure I'm not the only one with this requirement; anyone using an SMS service might hit that 1600-character limit.
Perhaps I should just put it in the prompt in plain language?
I am developing for a very small display (the size of a wristwatch), and I want the responses to be terse. I thought max_tokens would inform the model to formulate a response within so many tokens, but, as people have said here, it just truncates the response.
I am exploring some prompt engineering, like, “Please respond briefly”, but I’d much rather have a way to limit the response…
Maybe what I'll do is this: if the response exceeds my limit, I send the response back with the prompt "say that more tersely", and try that a couple of times before falling back to a truncated response, which is an undesirable user experience.
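Something like this sketch is what I have in mind (assuming the v1 openai Python client; the model name, character limit, and retry count are placeholders):

```python
from openai import OpenAI

client = OpenAI()
CHAR_LIMIT = 1600
MAX_RETRIES = 2

def terse_reply(user_prompt: str) -> str:
    messages = [{"role": "user", "content": user_prompt}]
    reply = ""
    for _ in range(MAX_RETRIES + 1):
        resp = client.chat.completions.create(
            model="gpt-3.5-turbo", messages=messages, max_tokens=400
        )
        reply = resp.choices[0].message.content
        if len(reply) <= CHAR_LIMIT:
            return reply
        # too long: feed the answer back and ask for a terser version
        messages += [
            {"role": "assistant", "content": reply},
            {"role": "user", "content": "Say that more tersely."},
        ]
    # last resort: deliver a truncated response
    return reply[:CHAR_LIMIT]

print(terse_reply("Explain how GPS works."))
```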
I have tried that before and it has never worked because the models cannot count chars in their own completions.
So I tried it again just now, with the same results as the last time I checked. Here I inject a system message to limit the characters, but it does not work. I have tried counting characters, words, etc. before, and it has never worked as required. The models just do not count. This has been discussed at length in our community before, BTW.
system: Do not reply with more than 20 characters.
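For reference, this is the kind of test I ran (a sketch with the v1 Python client; the user prompt is just an example), and in my tests the measured length routinely blows past the limit, which is the point:

```python
from openai import OpenAI

client = OpenAI()

resp = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "Do not reply with more than 20 characters."},
        {"role": "user", "content": "Explain quantum entanglement."},
    ],
)

reply = resp.choices[0].message.content
print(len(reply), repr(reply))  # compare the actual length against the limit
```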