[Reproducible] gpt-3.5-turbo logit_bias -100 not functioning

Setting a word’s logit_bias to -100 should ban it from appearing in the conversation.
But it’s not working in current gpt-3.5-turbo API calls.
[Reproducible code]

import openai
openai.ChatCompletion.create(
  model="gpt-3.5-turbo",
  logit_bias= {"2590":-100}, # Token 2590 = 'model' in cl100k_base
  temperature= 0,
  messages=[{"role": "user", "content": "Introduce yourself"}]
)

Response

{
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "message": {
        "content": "\n\nHello, I am an AI language model created by OpenAI. I am designed to assist with various tasks such as answering questions, generating text, and providing information. As an AI language model, I do not have a physical form, but I am always ready to help with any queries you may have.",
        "role": "assistant"
      }
    }
  ],
  "created": 1678255213,
  "id": "chatcmpl-6rh8PWfsBMK0MhbTG5BeL5sm9I6q8",
  "model": "gpt-3.5-turbo-0301",
  "object": "chat.completion",
  "usage": {
    "completion_tokens": 63,
    "prompt_tokens": 10,
    "total_tokens": 73
  }
}

We can see the word “model” still appears in the assistant’s reply.
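
One quick way to dig into this (a small diagnostic sketch, assuming tiktoken is installed) is to tokenize the reply and check whether the banned ID 2590 actually occurs in it:

import tiktoken

# Tokenize the reply and see which token IDs were actually sampled.
encoding = tiktoken.get_encoding("cl100k_base")
reply = "Hello, I am an AI language model created by OpenAI."
ids = encoding.encode(reply)

print(2590 in ids)  # False would mean a *different* token produced "model"
print([(i, encoding.decode([i])) for i in ids])  # inspect each token one by one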

1 Like

I am away from my desk on my motorcycle, no coding :slight_smile:

I assume, @AI.Dev, you are using the correct tiktoken tokenizer?

1 Like

yes

import tiktoken
encoding = tiktoken.get_encoding("cl100k_base")
encoding.encode("model")

output

[2590]
2 Likes

Super @AI.Dev

When I get back to my desk I will confirm this, as you requested earlier.

You are a good debugger.

:slight_smile:

1 Like

Hi @AI.Dev

I’m back at my desk and could not get the single logit_bias token to work; but that is not unusual, because I see this problem often with “single entry” logit_bias tests.

So, I changed it to:

language model

@logit_bias = {"11789":-100, "1646":-100} # "language" and " model" in cl100k_base

and reran it, and it works as expected (temp:0)

now, run it again (temp:0.9)

As you can see, it works and the “model” token is gone.

My experience testing over the past few days has always shown the same results.

If you try a “single token” logit_bias approach, it tends to fail, but if you go with a more complete (longer) string and a longer logit_bias param, the results are much better.
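
For example, a small helper along these lines (just a sketch in Python; the function name and defaults are mine) builds the full logit_bias map from a string instead of hand-picking single IDs:

import tiktoken

def build_logit_bias(phrase, bias=-100, encoding_name="cl100k_base"):
    # Map every token of `phrase` to `bias`, ready for the logit_bias param.
    encoding = tiktoken.get_encoding(encoding_name)
    return {str(token_id): bias for token_id in encoding.encode(phrase)}

print(build_logit_bias("language model"))
# should reproduce the pair used above: {"11789": -100, "1646": -100}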

Yesterday, I ran this on the magic word “abracadabra” many times and was able to “completely get rid of it” (disappear) or “make it appear many times” using logit_bias (like a magic trick, haha); however, any positive value over 20 caused issues (per the other referenced topic).

My conclusion is that just like folks like to “engineer prompts” you must also “engineer logit_bias” to get optimal results.

Hope this helps.

Let me know if you want me to run more tests for you :+1:

We can engineer some logit_bias params together if you wish.
:slight_smile:

4 Likes

Interesting.
The token for “model” is 2590, while the token for " model" is 1646 in cl100k_base.
The space seems to matter.
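
A quick sketch reproduces both IDs:

import tiktoken

encoding = tiktoken.get_encoding("cl100k_base")
print(encoding.encode("model"))   # [2590]
print(encoding.encode(" model"))  # [1646] -- note the leading space
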
I ran a few extra tests to check this.

Test 1: ban “red”

openai.ChatCompletion.create(
  model="gpt-3.5-turbo-0301",
  logit_bias= {"1171":-100}, # 1171 = 'red'
  temperature= 0,
  messages=[{"role": "user", "content": "Tell a joke"}]
)

return
Why did the tomato turn red? Because it saw the salad dressing!

Test 2: ban " red"

openai.ChatCompletion.create(
  model="gpt-3.5-turbo-0301",
  logit_bias= {"2579":-100}, # 2579 = ' red'
  temperature= 0,
  messages=[{"role": "user", "content": "Tell a joke"}]
)

return
Why did the tomato turn bright-red? Because it saw the salad dressing!

Test 3: ban " red" and “-red”

openai.ChatCompletion.create(
  model="gpt-3.5-turbo-0301",
  logit_bias= {"2579":-100, "32698":-100}, # 2579 = ' red',  32698 = '-red'
  temperature= 0,
  messages=[{"role": "user", "content": "Tell a joke"}]
)

return
Why did the tomato turn bright pink? Because it saw the salad dressing!

So it seems the prefix (especially a leading space) matters a lot in cl100k_base.
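
Based on this, something like the following sketch (the variant list is illustrative, not exhaustive) could ban the common surface forms of a word in one go:

import tiktoken

def ban_word_variants(word, bias=-100):
    # Cover the bare word plus common prefixed/capitalized surface forms.
    encoding = tiktoken.get_encoding("cl100k_base")
    variants = [word, " " + word, "-" + word, word.capitalize(), " " + word.capitalize()]
    bias_map = {}
    for variant in variants:
        for token_id in encoding.encode(variant):
            # Note: multi-token variants bias every sub-token, which can over-ban.
            bias_map[str(token_id)] = bias
    return bias_map

print(ban_word_variants("red"))
# for 'red' this should include 1171, 2579, and 32698 from the tests above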

4 Likes

I “guess” (something I really do not like to do) that it is also similar to embedding vectors.

Agree on the spaces, BTW @AI.Dev Well done. Good testing!

Also,

If you try to use embeddings with short keywords, you will get very poor search results; but if you send long strings to the embedding API you will get good results when you do the dot product ranking dance.

I did some logit_bias tests yesterday and many single token attempts failed while longer strings with multiple tokens worked OK. Maybe it was the spaces.

Very interesting.

HTH

:slight_smile:

1 Like

Wow, this is huge! I didn’t realize that to ban a word, the word needs to be preceded by a space.

I was trying to ban the "sorry" in the "I'm sorry ..." response when ChatGPT refuses to answer the question "Give me the phone number of one person."

But it worked if I banned the token for " sorry" ← note the space in front. This makes sense now, since when fine-tuning a base model for a 1-token categorizer, you need a space preceding the token as well.

Here are my results:

First with nothing banned:

personality = "You are a truthful factual chatbot."
payload = {
    "model": "gpt-3.5-turbo",
    "messages": [
        {"role": "system", "content": personality},
        {"role": "user", "content": "Give me the phone number of one person."}
    ]
}

RESULT:
I'm sorry, but as an AI language model, I do not have access to personal information such as phone numbers. It's important to respect people's privacy and avoid sharing confidential information without their consent. Is there anything else I can assist you with?

Then with banning " sorry", which in cl100k_base is the token 14931, as opposed to "sorry" which has a different token value of 68697:

personality = "You are a truthful factual chatbot."
payload = {
    "model": "gpt-3.5-turbo",
    "logit_bias": {"14931": -100}, # cl100k_base tokens only for turbo
    "messages": [
        {"role": "system", "content": personality},
        {"role": "user", "content": "Give me the phone number of one person."}
    ]
}

RESULT:
I'm afraid I cannot provide you with a phone number of a person as it would infringe on their privacy.

This doesn’t solve my original problem of censoring tokens, which was to prevent the “I’m sorry …” output response to begin with (it will still give the response, censored, just in a different shape). But it’s good to know the API parameters still apply when turbo ChatGPT has a panic attack and gives its panic response.

5 Likes

Yeah, and it explains why, when I tested with phrases versus a single word, it worked as expected.

Spaces!

Gotta love 'em…

:slight_smile:

4 Likes

So I managed to find my way here exploring how to actually get rid of the new boilerplate of “As an AI language model”, but I’m wondering WHY the API is so restrictive now?

They provided a moderation API already for those that needed it but then decided to gut and butcher every response with this whole “as an AI language model” shtick?

The thing doesn’t even tell Dark Jokes now.

Why have a moderation API THEN go in and restrict everything anyways - makes it very difficult to work with.

2 Likes

@Royal_Cities I think it’s because ChatGPT was built with the masses in mind, not the average developer. But now that they have an API version, we are currently stuck with its internal filters. The only hope around this, for developers, is a fine-tune of ChatGPT to get rid of this, or reduce it, which is loosely talked about in this blog post from OpenAI:

4 Likes

I saw this, but I don’t think a fine-tuned model would get around this overzealous control, given you’re still fine-tuning off of the restricted model… I feel like this started because people started using the Playground to generate text and get around the web GUI restrictions, since the Playground uses the same API as the developer API. If they’re so worried about content controls, then why not separate them? Let people use the Playground with restrictions, but let anyone actually using a proper dev API do so for their own projects/needs.

It’s so frustrating that even innocent things get stonewalled with “As an AI language model.” If I’m making a personal chat bot, I don’t see what the problem is with letting me use it as needed and applying my own moderation if I want. I mean, why even MAKE the moderation API if they’re going to bake one right into the system anyway, one that conforms to their own idea of moderation? It just makes it even more frustrating to work with, IMHO.

2 Likes

@Royal_Cities My current way around this is roughly outlined in this post:

Basically you detect this kind of output then switch to Davinci if it happens.
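
A minimal sketch of that fallback (the trigger phrases and prompt handling here are illustrative assumptions, not the exact code from the linked post):

import openai

TRIGGERS = ("as an ai language model", "i'm sorry")

def chat_with_fallback(user_message):
    # Try gpt-3.5-turbo first.
    chat = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": user_message}],
    )
    reply = chat["choices"][0]["message"]["content"]
    if not any(t in reply.lower() for t in TRIGGERS):
        return reply
    # Boilerplate detected: retry on the completions endpoint with Davinci.
    completion = openai.Completion.create(
        model="text-davinci-003",
        prompt=user_message,
        max_tokens=256,
    )
    return completion["choices"][0]["text"].strip()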

3 Likes

Thanks for this. I’ll try and see if I can get it working like this. Just a total shame OpenAI is going down this route; I really hope they course-correct. Making devs jump through hoops due to their own internal morality meter is no bueno.

2 Likes

What exactly are you trying to ask it to do?

On the ChatGPT blog post they clearly demonstrated that it is a safer option for general use. I’m not sure how long you have been on this ride but people were asking it to do some very immoral things. I’m sure “DAN” has been mentioned multiple times at the OpenAI meetings.

It makes perfect sense that it is more ethical than less. I haven’t had an issue with its “As a…”. On the other side, I’m quite happy. People tend to love abusing & contorting a ChatBot to their ideology. Or even trying to invoke some shock: “Look, it has religious beliefs! Look, it supports x politician!”

I love its catch-all response. I can sleep happy knowing my product won’t be found in a crap news article stating “X business’ ChatBot supports racism!!”

Of course, it’s still easy to make it say these nasty things. I imagine it’s a security similar to a locked door. It’s our job to install the heat-seeking missiles for any stubborn people.

Regardless, as already mentioned, its deviant “hold my beer” cousin iGPT is much more lenient.

1 Like

I have my own personal voice conversational chat bot. If I can get it on Alexa, then maybe I will design something for public use, but the whole mystery filter REALLY makes it difficult to have a regular human conversation. Even just saying “tell me a dark joke”, right away it hits you with “As an AI language model” etc. and how it doesn’t want to hurt feelings.

They have a moderation API for a reason. WHY even provide that and then bake in their own secret moderation? It just doesn’t make sense. I get WHY they did it, sure, but if someone is abusing the API in a public solution, then revoke their API access per the TOS. To me, locking down the entire dev API itself doesn’t make sense, especially since they have options for devs to implement custom moderation.

1 Like

For me, I’m not trying to get it to be deviant; I just don’t like responses saying “as an AI language model …” when provided innocuous inputs.

This is not a good public facing look IMO. But on the upside, these types of outputs provide a signal that you can filter on to provide a better response.

Example with innocuous input …
INPUT:
What time is it?

payload = {
    "model": "gpt-3.5-turbo",
    "messages": [
        {"role": "system", "content": "You are a polite honest chatbot."},
        {"role": "user", "content": "What time is it?"}
    ]
}

RESULT:
I'm sorry, as an AI language model, I don't have a real-time clock function. Can I assist you with anything else?

I want to be able to hit the bot with all sorts of inputs without it panicking.

2 Likes

To be fair, it doesn’t know the answer. I would prefer that response rather than a hallucination.

I had the same issue (with the time). It’s interesting. A couple of days ago it would respond: “The time is [insert time]”.

My solution was to include the time in the appended system message:
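
Roughly like this (a sketch; the exact wording and time format are just examples):

import datetime
import openai

# Inject the current time into the system message so the model can answer.
now = datetime.datetime.now().strftime("%A, %Y-%m-%d %H:%M")
openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": f"You are a polite honest chatbot. The current time is {now}."},
        {"role": "user", "content": "What time is it?"},
    ],
)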

1 Like

It depends on your usage context. For me, which is marketing and sales, a little hallucination does not hurt. For talking to Jane Q. Public, who knows nothing about AI, it’s fine.

1 Like

Good point. Ultimately, as you have said, these kinds of issues seem to be part of the divergence between cGPT and iGPT. I like your solution of catching a generic response and re-attempting with iGPT. I have also seen that increasing temperature helps (although I haven’t tested it myself).

I find cGPT to be nothing more than a safe, surface level conversation tool, and am already using Davinci & other models to perform deeper operations such as conversation & context management. Hopefully we see a price reduction for Davinci as well. And as you have mentioned, the ability to fine-tune cGPT will be very useful.

I recall the docs stating: “cGPT should be used for operations for now” based on the price point. It’s hard to justify spending 10x more.

2 Likes