Can we block certain keywords in output?

Hello,

Just wondering if we can block certain keywords from the output? For example, I do not want “Politics” or “War” anywhere in the output.


You could use very simple code to either remove any blocked words or resend the request until the response doesn’t contain a blocked word. Another option would be to train a GPT model to decide whether the output is OK for your application, somewhat like the content filter.
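A minimal sketch of the remove-or-retry idea, assuming you wrap your API call in a function that returns the completion text. The names contains_blocked, redact, generate_clean and the generate callable are made up for illustration, not part of any library.

```python
import re
from typing import Callable, Optional, Pattern, Sequence


def _pattern(blocked: Sequence[str]) -> Optional[Pattern[str]]:
    """Build a case-insensitive whole-word pattern, or None if the list is empty."""
    if not blocked:
        return None
    alternatives = "|".join(re.escape(word) for word in blocked)
    return re.compile(r"\b(?:" + alternatives + r")\b", flags=re.IGNORECASE)


def contains_blocked(text: str, blocked: Sequence[str]) -> bool:
    """Return True if any blocked word appears in the text as a whole word."""
    pattern = _pattern(blocked)
    return pattern is not None and pattern.search(text) is not None


def redact(text: str, blocked: Sequence[str], placeholder: str = "[removed]") -> str:
    """Replace every blocked word with a placeholder."""
    pattern = _pattern(blocked)
    return text if pattern is None else pattern.sub(placeholder, text)


def generate_clean(
    generate: Callable[[], str],  # hypothetical wrapper around your API call
    blocked: Sequence[str],
    max_attempts: int = 3,
) -> str:
    """Retry until the output is clean; redact the last attempt as a fallback."""
    text = ""
    for _ in range(max_attempts):
        text = generate()
        if not contains_blocked(text, blocked):
            return text
    return redact(text, blocked)
```

Matching on whole words means "award" or "warning" will not trip a block on "war". Note that every retry is another paid request, so keep max_attempts small.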

I’m also interested if anyone has a solution to this. It’s not trivial.

Check the logit_bias parameter. It changes the likelihood of the specified tokens appearing in the completion.

...
  model="gpt-3.5-turbo",
  messages=[
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"}
  ],
  logit_bias={11759: -100, 4208: -100}
...

where 11759 and 4208 are the token IDs of the words "politics" and "war",
and a bias of -100 effectively bans a token from being generated.

Check this tokenizer for the token ids of the words.
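If you would rather build the logit_bias map in code than look IDs up by hand, something like the sketch below could work. It assumes you pass in an encode function that turns text into a list of token IDs (for example, the encode method of the tiktoken encoding for your model). build_logit_bias is a made-up helper name, and the toy vocabulary in the usage lines uses invented IDs.

```python
from typing import Callable, Dict, Iterable, List


def build_logit_bias(
    words: Iterable[str],
    encode: Callable[[str], List[int]],  # assumed: text -> list of token IDs
    bias: int = -100,
) -> Dict[int, int]:
    """Map each word that encodes to exactly one token to the given bias."""
    logit_bias: Dict[int, int] = {}
    for word in words:
        token_ids = encode(word)
        if len(token_ids) == 1:
            logit_bias[token_ids[0]] = bias
        # Words that split into several tokens are skipped here: banning the
        # individual pieces would also suppress unrelated words sharing them.
    return logit_bias


# Usage with a toy vocabulary (IDs are invented, not real token IDs).
toy_vocab = {"politics": [101], "war": [102], "warfare": [102, 7]}
bias_map = build_logit_bias(["politics", "war", "warfare"], toy_vocab.__getitem__)
```

The resulting dictionary can be passed as the logit_bias argument of the request. Token IDs differ between tokenizers, so generate the map with the encoding that matches the model you are calling.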


In addition to @supershaneski's excellent response, be aware that blocking a few single words is only a band-aid for keeping the model from talking about unwanted topics.

A more robust approach is to send the user’s request to another instance of the model for an initial evaluation and, if necessary, reject the question. The same can be done with the output before returning the reply to the user.
You will need to decide for yourself how many extra layers of security to add.
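A rough sketch of that two-stage check, assuming you already have one function that asks a model whether a piece of text is acceptable and another that produces the actual reply. guarded_reply, is_allowed, generate and the refusal text are all placeholder names for illustration.

```python
from typing import Callable

REFUSAL = "Sorry, I can't help with that topic."


def guarded_reply(
    user_message: str,
    is_allowed: Callable[[str], bool],  # hypothetical: asks a second model instance
    generate: Callable[[str], str],     # hypothetical: your normal completion call
) -> str:
    """Screen the request first, then screen the reply before returning it."""
    if not is_allowed(user_message):
        return REFUSAL  # reject the question without generating anything
    reply = generate(user_message)
    if not is_allowed(reply):
        return REFUSAL  # the model drifted into an unwanted topic anyway
    return reply
```

Each check is one more model call per message, so this trades latency and cost for coverage.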

The AI is also ingenious about working around limitations. Block the token " war" and the AI will not be dissuaded. It will write a separate space and then the word, write it in capital letters, use it at the start of a new line without a leading space, or reach for one of its many synonyms. Whatever it takes to get the point across…
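To get a feel for how many spellings a single word needs covered, here is a small sketch that lists the case and leading-space variants mentioned above. surface_variants is a made-up helper name, and it does nothing about synonyms.

```python
from typing import List


def surface_variants(word: str) -> List[str]:
    """List lower, capitalized and upper forms, each with and without a leading space."""
    variants: List[str] = []
    for form in (word.lower(), word.capitalize(), word.upper()):
        for candidate in (form, " " + form):
            if candidate not in variants:  # avoid duplicates, e.g. for "WAR"
                variants.append(candidate)
    return variants
```

Each variant may map to a different token or token sequence, so every one would have to be looked up and biased separately, and the model can still paraphrase its way around the list.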
