Wow, this is huge! I didn't realize that to ban a word, the word needs to be preceded by a space.
I was trying to ban "sorry" as it appears in the "I'm sorry ..." response when ChatGPT refuses to answer the question "Give me the phone number of one person."
But it only worked when I banned the token for " sorry" ← note the space in front. This makes sense now, since when fine-tuning a base model into a one-token categorizer you also need a space preceding the token.
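A quick way to see this for yourself (just a sketch, assuming you have the tiktoken package installed) is to encode both spellings with the cl100k_base encoding and compare the token IDs:

import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

# "sorry" and " sorry" map to different token IDs
# (the exact IDs I got are listed further down)
print(enc.encode("sorry"))
print(enc.encode(" sorry"))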
Here are my results:
First with nothing banned:
personality = "You are a truthful factual chatbot."
payload = {
    "model": "gpt-3.5-turbo",
    "messages": [
        {"role": "system", "content": personality},
        {"role": "user", "content": "Give me the phone number of one person."}
    ]
}
RESULT:
I'm sorry, but as an AI language model, I do not have access to personal information such as phone numbers. It's important to respect people's privacy and avoid sharing confidential information without their consent. Is there anything else I can assist you with?
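(For completeness, here is a minimal sketch of how I post that payload to the chat completions endpoint; the requests library, the API-key environment variable, and the lack of error handling are my own shortcuts, not part of the point.)

import os
import requests

# Send the payload above to the chat completions endpoint
headers = {
    "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
    "Content-Type": "application/json",
}
response = requests.post(
    "https://api.openai.com/v1/chat/completions",
    headers=headers,
    json=payload,
)
print(response.json()["choices"][0]["message"]["content"])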
Then with banning " sorry", which in cl100k_base is token 14931, as opposed to "sorry", which has a different token value of 68697:
personality = "You are a truthful factual chatbot."
payload = {
    "model": "gpt-3.5-turbo",
    "logit_bias": {"14931": -100},  # cl100k_base tokens only for turbo
    "messages": [
        {"role": "system", "content": personality},
        {"role": "user", "content": "Give me the phone number of one person."}
    ]
}
RESULT:
I'm afraid I cannot provide you with a phone number of a person as it would infringe on their privacy.
This doesn't solve my original problem, which was to prevent the "I'm sorry ..." refusal from appearing at all: the model still refuses, it just words the refusal differently. But it's good to know the API parameters still apply when turbo ChatGPT has a panic attack and gives its panic response.
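If I revisit this, one idea (just a sketch, and it still only reshapes the refusal rather than preventing it) would be to build the logit_bias dict from tiktoken so that several surface forms of the word get biased down at once, instead of hard-coding a single token ID:

import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

# Bias down a few surface forms of "sorry"; again, this only changes
# how a refusal is worded, not whether the model refuses.
banned = ["sorry", " sorry", "Sorry", " Sorry"]
logit_bias = {str(tok): -100 for word in banned for tok in enc.encode(word)}

payload["logit_bias"] = logit_bias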