How to make the gpt-35-turbo model follow my instructions?

Hi,

Is there a way to make sure the model follows my instructions 100%? I ask the gpt-35-turbo model to write a section with 250 to 300 words. Sometimes it does, but sometimes it only provides a section of about 20 words. How can I prevent this?

The first thing to note is that these models aren’t good at counting, so they have no idea how many words they’ve generated. One question, though: how many tokens are you asking for? The more tokens you ask for, the longer the output generally is.

One possible solution is to count the number of words and, if the result is too short, call the model back and ask for a longer story:
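
A minimal sketch of that idea, assuming the openai Python package (v1-style client); the model name, word limits, and prompt wording here are just examples:

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def write_section(topic, min_words=250, max_retries=3):
    # Ask for a section; if the reply is too short, put it in the
    # history and ask again, up to max_retries times.
    messages = [{"role": "user",
                 "content": f"Write a 250 to 300 word section about {topic}."}]
    for _ in range(max_retries + 1):
        reply = client.chat.completions.create(
            model="gpt-3.5-turbo", messages=messages, max_tokens=1000,
        ).choices[0].message.content
        if len(reply.split()) >= min_words:
            return reply
        messages.append({"role": "assistant", "content": reply})
        messages.append({"role": "user", "content":
                         "That is too short. Please rewrite it with 250 to 300 words."})
    return reply  # best effort after all retries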

I asked for 250 to 300 words, not tokens, since tokens are not easy to understand. But the model returned only 20 words, far fewer than I asked for.

The model doesn’t even know how many words it has produced.
It produces one token at a time.
Each time it produces the next token, it believes the previous token was “history.”
The model also doesn’t really “understand” words. It works in tokens. It can sometimes, kinda-sorta, separate words, but that’s not built in; it’s a capability subject to stochastic success.

You might get a better result if you explicitly establish a boundary. Something like:

“{question} An answer to the above question that extends for three paragraphs starts here:”

Thank you. By a boundary, do you mean a visual item, such as the paragraphs in your sample?

When you call the model, you have to pass it a max_tokens parameter. That’s what I’m referring to. The default is 256 tokens. You should set this to 1000 or more if you’re asking for 250 to 300 words.
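
For example, a minimal call might look like this (a sketch using the openai Python package; the prompt is only an illustration):

from openai import OpenAI

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content":
               "Write a section with 250 to 300 words about solar energy."}],
    max_tokens=1000,  # 300 words is roughly 400 tokens, so 1000 leaves room
)
print(response.choices[0].message.content)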

count the number of words and, if it’s too short (or too long), call the model back

This is the only answer that will guarantee the result you want.

You can try asking it to count how many words it has written:

Write a story using 250 to 300 words. Count how many words you've written. Output a JSON result matching the following:
{
  "story": "<your story>",
  "length": <word count>
}
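
You will still need to parse the JSON result, and you can double-check the count yourself rather than trusting the reported length. A sketch (this assumes the reply is pure JSON; real replies sometimes have extra text around it):

import json

def check_story(raw_reply, min_words=250, max_words=300):
    # Parse the model's JSON reply and verify the word count ourselves.
    data = json.loads(raw_reply)  # raises if extra text surrounds the JSON
    story = data["story"]
    actual = len(story.split())
    return story, actual, min_words <= actual <= max_words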

Hi, stevenic,

Since this is part of a conversation, I save all the conversation history and set max_tokens to a larger limit of 8192, but that easily causes a “too many tokens” exception. So in the end I had to remove the max_tokens setting from the request sent to the model.

Great idea. I will try it. Thank you very much.

Thank you, steven. Your solution is very nice. Also, the system does not allow me to input a short reply, so I have to write more.

I mean that, literally, the last sentence of the last user prompt in your call should read like that, so the model will then fill it in, expecting to write three paragraphs, since the model just “continues” whatever came before it.
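
Concretely, something like this (a sketch; the question text is a placeholder):

question = "What are the main benefits of solar energy?"
boundary = ("An answer to the above question that extends "
            "for three paragraphs starts here:")
messages = [{"role": "user", "content": f"{question}\n\n{boundary}"}]
# The model then "continues" from the boundary sentence
# and expects to be writing three paragraphs.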

I agree with the others: the only way to guarantee a word count is to count the words that come back and call the model again, asking it to expand or contract the result. These models are not high-order instruction followers like a human writer would be.

Asking the model to count the number of words it’s written isn’t going to be very useful, because the model cannot backtrack. Once it’s decided to output the “I’ve written X words” completion, if X is too small, there’s nothing the model can do to fix it.

Regarding the max_tokens setting, what you should do is aim for a total budget (say, 8000), calculate the number of tokens in your prompt, and set max_tokens to your budget minus the prompt tokens. That way you will never get “too many tokens requested” errors. It’s reasonable to set the budget below the absolute max, because OpenAI may adjust your API call slightly, which can change how many tokens are available; a little margin for error helps with robustness.
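
A sketch of that budgeting, using tiktoken to count the prompt (the 8000 budget is just the example figure above; chat messages also add a few tokens of per-message overhead, which this ignores):

import tiktoken

BUDGET = 8000  # deliberately below the model's absolute maximum

def completion_budget(prompt, model="gpt-3.5-turbo"):
    # Tokens left for the completion = budget minus what the prompt uses.
    enc = tiktoken.encoding_for_model(model)
    return BUDGET - len(enc.encode(prompt))

# Pass the result as max_tokens when calling the API.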

that easily causes a “too many tokens” exception. So in the end I had to remove the max_tokens setting from the request

If you send nothing for max_tokens, the API will fill in the default limit, which is probably far less than what you want.

To actually request the maximum possible completion tokens, you will need to use a cl100k_base encoder (I recommend the tiktoken library) to get an estimated count of the tokens you’re sending, and then subtract that from the model’s maximum. You can then either pad that number (because your count will not exactly match OpenAI’s) or handle the inevitable token-limit-exceeded responses, which will tell you how many tokens the API counted; subtract that from the model max and try again.
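
A sketch of the second option, handling the limit error and retrying (this assumes the v1 openai client, where the error surfaces as openai.BadRequestError; real code should inspect the error text rather than treating every BadRequestError as a token-limit problem):

import openai
from openai import OpenAI

client = OpenAI()

def create_with_shrinking(messages, max_tokens, model="gpt-3.5-turbo"):
    # On a token-limit error, shrink the request and try again. Shrinking
    # by a fixed 100 is a guess; parsing the server's exact count out of
    # the error message is possible but brittle.
    while max_tokens > 0:
        try:
            return client.chat.completions.create(
                model=model, messages=messages, max_tokens=max_tokens)
        except openai.BadRequestError:
            max_tokens -= 100
    raise RuntimeError("prompt alone exceeds the model's context window")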


Try it; you’ll be amazed at how well it works. The reason it works is attention, but that gets into a lot of theory around LLMs. Regardless of the theory, take the prompt I suggested above and drop it into gpt-3.5-turbo. It will consistently give a result much closer to the requested word count than it will if you don’t ask it to count the words.

the model cannot backtrack

While it is true that it can’t go back and change what it previously wrote, anticipating what it will need to write in the future does change its predictions in the present. That is one reason we often end up resorting to making multiple calls instead of performing multiple operations in a single call. Giving it a heads up about the future operation changes what it decides to do for the present operation, sometimes for the worse.

On the same prompt?

In my experience, if you tell it “write a section about flowers” with very little detail, it might pop out 20 or 200 words. So give it more details. I’ve noticed this a lot with fiction writing.

Do you have an example of system/user/assistant prompt(s) you’re using?

OpenAI models are able to reflect on their own output within a single prompt.
That is only logical, because every token produced is already part of the context used to create the next token.
If you provide measures and clear boundaries to help the model reflect on its own work, you can prompt it to create texts with exactly the length you have been looking for.

But you will have to perform some additional cleaning via scripts to get the text you want.

P.S. I say OpenAI models because I wasn’t able to replicate this behavior with Claude 2 or Gemini Advanced, but in general it should be possible.


I don’t doubt that asking it to count words will work better than not asking!
Attention has the problem that there are only so many attention heads, and once they run out, the model can’t pay attention to more things, and “paying attention to each word” seems like a waste of scarce resources.
Also, for any more complex request, the model already fails to pay attention to all the constraints, so using a chunk of those resources to count words could only make that worse.

But, yes! Prompt iteration until you get something that’s good enough is the way forward, and trying all kinds of hints is the only way to know what works in a particular case!

Hi, hemp,

What I do is as follows:

  1. Get the word count of my prompt.
  2. Based on https://help.openai.com/en/articles/4936856-what-are-tokens-and-how-to-count-them, estimate the token count as 127.
  3. Then calculate max_tokens as 8192 - 127 - 100; the final 100 is extra headroom, since 127 is only an estimate (see the sketch after this list).
  4. Then send the prompt to the gpt-4 model.
  5. It finally returns 607 words, about 809 tokens.
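
In code, steps 1 to 3 look roughly like this (the words-to-tokens ratio is the rule of thumb from the help article, about 1 token per 0.75 words; the 95-word prompt length is an assumed figure that reproduces my 127-token estimate):

MODEL_MAX = 8192   # gpt-4 context window
HEADROOM = 100     # extra space, since the token count is only an estimate

prompt_words = 95                                # step 1: count the words
est_tokens = round(prompt_words / 0.75)          # step 2: about 127 tokens
max_tokens = MODEL_MAX - est_tokens - HEADROOM   # step 3: 7965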

Is my process correct? And is the result expected?

Thank you very much

Hi,

I tried the following:

  1. I save all the conversation history.
  2. I send the following prompt to the gpt-4 model:

You are a professional writer specializing in case studies. Please write a case study with more than 600 words for our product ### in English, use xxx as the client. Please include an eye-catching title for it. Do NOT copy contents directly from the Internet. Format the text with HTML tags, without the <html>, <head> and <body> tags. Use <h1> for the main title, and <h2> to <h6> tags for all other subtitles. Make sure to include the section number in the subtitles. Do NOT include "Section" text in the section titles. Use digits in section numbers.

Then, if the response does not reach 600 words, I append the following prompt to the conversation:

The word count in the previous response is less than 600. Please add more words so that it has more than 600 words in total.
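
In code, my retry loop looks roughly like this (the names are illustrative; the word counter strips the HTML tags before counting):

import re
from openai import OpenAI

client = OpenAI()

def count_words(html):
    # Drop the HTML tags before counting, since the model replies with markup.
    return len(re.sub(r"<[^>]+>", " ", html).split())

def extend_until(messages, min_words=600, max_retries=6):
    for attempt in range(1, max_retries + 1):
        reply = client.chat.completions.create(
            model="gpt-4", messages=messages).choices[0].message.content
        words = count_words(reply)
        if words >= min_words:
            return reply
        print(f"Word count is only {words}. Not reach the min limit "
              f"{min_words}. Retry {attempt} times.")
        messages.append({"role": "assistant", "content": reply})
        messages.append({"role": "user", "content":
            "The word count in the previous response is less than 600. "
            "Please add more words so that it has more than 600 words in total."})
    return reply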

But the model still produces responses of fewer than 600 words. The following is the log:

Word count is only 552. Not reach the min limit 600. Retry 1 times.
Word count is only 579. Not reach the min limit 600. Retry 2 times.
Word count is only 551. Not reach the min limit 600. Retry 3 times.
Word count is only 549. Not reach the min limit 600. Retry 4 times.
Word count is only 551. Not reach the min limit 600. Retry 5 times.

Only after 6 retries did it finally reach the limit.

Am I doing something wrong?