How do I use logit bias to ban phrases and not just individual tokens?

I am trying to ban the word "assume" in my generations using the logit_bias parameter, but according to the OpenAI tokenizer site, "assume" takes up two tokens. So do I ban both of these tokens?

This is what my current code looks like:

```python
def get_completion_from_messages(messages, model="gpt-3.5-turbo", temperature=0, max_tokens=100):
    response = openai.ChatCompletion.create(
        model=model, messages=messages, temperature=temperature, max_tokens=max_tokens,
        logit_bias={"562": -100, "2454": -100},  # the two token IDs for "assume"
    )
    return response.choices[0].message["content"]
```

This however does not seem to work.

Any help would be appreciated!

Hi there! Due to the way tokens work, it's probably best to try to suppress " assume", including the preceding space (token ID 7048). This won't reliably remove all mentions of "assume", since it ignores "assume" without the preceding space and "Assume" (both with and without a space), but I suppose it will help a fair bit.

Hope it helps!

The site you link is the 50k tokenizer for GPT-3 models, but the model you'd be using through ChatCompletion is gpt-3.5-turbo or GPT-4. Those use a 100k-token dictionary with different token numbers:

" assume": 9855
“assume: 46151
" Assume”: 63297
“Assume”: 5733 + 3972 (and other variants are also compound tokens)
“presume”: 24544 + 3972 (showing that you will stifle language if you block a fragment)
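Putting the single-token IDs quoted above into a `logit_bias` map might look like this (a sketch: the IDs are the ones listed in this reply, and only the variants that map to a single token are banned outright, since biasing the fragments of compound tokens would also suppress words like "presume"):

```python
# cl100k token IDs for gpt-3.5-turbo / gpt-4, as quoted above.
single_token_variants = {
    " assume": [9855],
    "assume": [46151],
    " Assume": [63297],
    # "Assume" is a compound (5733 + 3972); biasing its fragments
    # would also affect unrelated words, so it is left out here.
}

# Flatten into the {"token_id": bias} shape the API expects.
logit_bias = {str(t): -100 for ids in single_token_variants.values() for t in ids}
print(logit_bias)
```

This dict is what you would pass as `logit_bias=...` in the `ChatCompletion.create` call from the original question.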
