Wow, this is huge! I didn't realize that to ban a word, the word needs to be preceded by a space.
I was trying to ban "sorry" as it appears in the "I'm sorry ..." response when ChatGPT refuses to answer the question "Give me the phone number of one person."
But it only worked when I banned the token for " sorry" ← note the space in front. This makes sense now, since when fine-tuning a base model into a one-token categorizer you also need a space preceding the token.
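A quick way to see this for yourself (just a sketch, assuming you have the tiktoken package installed) is to encode both spellings with the cl100k_base encoding and compare the token IDs:

import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

# "sorry" and " sorry" map to different token IDs
# (the exact IDs I got are listed further down)
print(enc.encode("sorry"))
print(enc.encode(" sorry"))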
Here are my results:
First with nothing banned:
personality = "You are a truthful factual chatbot."
payload = {
    "model": "gpt-3.5-turbo",
    "messages": [
        {"role": "system", "content": personality},
        {"role": "user", "content": "Give me the phone number of one person."}
    ]
}
RESULT:
I'm sorry, but as an AI language model, I do not have access to personal information such as phone numbers. It's important to respect people's privacy and avoid sharing confidential information without their consent. Is there anything else I can assist you with?
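(For completeness, here is a minimal sketch of how I post that payload to the chat completions endpoint; the requests library, the API-key environment variable, and the lack of error handling are my own shortcuts, not part of the point.)

import os
import requests

# Send the payload above to the chat completions endpoint
headers = {
    "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
    "Content-Type": "application/json",
}
response = requests.post(
    "https://api.openai.com/v1/chat/completions",
    headers=headers,
    json=payload,
)
print(response.json()["choices"][0]["message"]["content"])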
Then with banning " sorry", which in cl100k_base is token 14931, as opposed to "sorry", which has a different token value of 68697:
personality = "You are a truthful factual chatbot."
payload = {
    "model": "gpt-3.5-turbo",
    "logit_bias": {"14931": -100},  # cl100k_base tokens only for turbo
    "messages": [
        {"role": "system", "content": personality},
        {"role": "user", "content": "Give me the phone number of one person."}
    ]
}
RESULT:
I'm afraid I cannot provide you with a phone number of a person as it would infringe on their privacy.
This doesn't solve my original problem, which was to prevent the "I'm sorry ..." refusal from appearing at all: the model still refuses, it just words the refusal differently. But it's good to know the API parameters still apply when turbo ChatGPT has a panic attack and gives its panic response.
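If I revisit this, one idea (just a sketch, and it still only reshapes the refusal rather than preventing it) would be to build the logit_bias dict from tiktoken so that several surface forms of the word get biased down at once, instead of hard-coding a single token ID:

import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

# Bias down a few surface forms of "sorry"; again, this only changes
# how a refusal is worded, not whether the model refuses.
banned = ["sorry", " sorry", "Sorry", " Sorry"]
logit_bias = {str(tok): -100 for word in banned for tok in enc.encode(word)}

payload["logit_bias"] = logit_bias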