GPT 3.5 API - how to stop AI from admitting it's an AI?

I’m using GPT 3.5 to create a chatbot. The basic idea is that it should act like a person would. Most of the time, GPT 3.5 does a good job at that. But certain questions trigger it to admit it’s an AI. Examples:

  1. Does this make you sad?
  2. What were you up to today?
  3. How does this make you feel?

Basically, anything that deals with emotions or opinions triggers a similar response: “As an AI, I don’t have feelings in the way humans do…”. Even if the bot is instructed to act like a human and given a basic personality, it fails in these cases. Any solutions?

Here is a post on how I do it.

Basically detect this behavior, then drop to davinci for the response.
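A minimal sketch of that detect-then-fall-back flow, in Python. The names `reply_with_fallback`, `primary`, `fallback`, and `breaks_character` are hypothetical; the two model callables stand in for whatever client code calls turbo and davinci, and the detector is only a placeholder substring check.

```python
from typing import Callable


def reply_with_fallback(
    prompt: str,
    primary: Callable[[str], str],
    fallback: Callable[[str], str],
    breaks_character: Callable[[str], bool],
) -> str:
    """Return the primary model's reply unless it breaks character."""
    draft = primary(prompt)
    if breaks_character(draft):
        # Only now make a second call, this time to the fallback model.
        return fallback(prompt)
    return draft


# Stand-in callables so the sketch runs without any API access.
def fake_turbo(prompt: str) -> str:
    return "As an AI language model, I do not have feelings."


def fake_davinci(prompt: str) -> str:
    return "I feel pretty good today."


def mentions_ai(text: str) -> bool:
    # Placeholder detector: a plain substring check on the output.
    return "AI " in text


answer = reply_with_fallback("How do you feel?", fake_turbo, fake_davinci, mentions_ai)
```

The fallback still receives the same prompt, so whatever context the primary model got would also need to be passed to the second model.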

Thanks for this! Any effective way to do it? Should I try to send all the previous messages to GPT3 so it has the context of the conversation? How should I feed the response to the current GPT 3.5 session?

All you are monitoring is the output. So no need to look at input. On the output, in order of easiest to hardest:

  1. Regex matching: for example, search for the substring "AI " in the output.
  2. 1-token categorizer: Train a base model like babbage on Good/Bad outputs and map them to ' 0' or ' 1' (note the leading space in each). Run it at a temperature of 0. If ' 1' means bad, then that is your signal to drop to davinci.
  3. Embeddings: Embed a bunch of Good/Bad outputs and store them in memory. Compare the new output against these with a dot product, which is equivalent to cosine similarity if you use text-embedding-ada-002 because it produces unit vectors. Determine whether it is closest to the aggregate of Good outputs or to the aggregate of Bad outputs.
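Options 1 and 3 could look roughly like this in Python. This is a sketch under assumptions: the regex patterns are illustrative, the helper names are made up, and the 2-d vectors are toy stand-ins for real embeddings (which would come from an embeddings API call that is not shown here). The "aggregate" is taken to be the mean vector of each group, so the dot product with it equals the average dot product with the group's members.

```python
import re
from typing import List, Sequence

# Assumed patterns: whole-word "AI" or the phrase "language model".
AI_PATTERN = re.compile(r"\bAI\b|\blanguage model\b")


def regex_flags(text: str) -> bool:
    """True if the output looks like it breaks character."""
    return AI_PATTERN.search(text) is not None


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def centroid(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Mean vector of a group of embeddings."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def closest_label(vec, good_centroid, bad_centroid) -> str:
    """Label the vector by whichever centroid scores the higher dot product."""
    return "bad" if dot(vec, bad_centroid) > dot(vec, good_centroid) else "good"


# Toy unit vectors standing in for stored Good/Bad output embeddings.
good_centroid = centroid([[1.0, 0.0], [0.8, 0.6]])
bad_centroid = centroid([[0.0, 1.0], [0.6, 0.8]])

label = closest_label([0.6, 0.8], good_centroid, bad_centroid)
```

The word-boundary regex is a slightly stricter variant of the plain "AI " substring search, so it also catches "AI" followed by punctuation.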

You can run all three in parallel and combine them with a weighted average, or with an at-least-2-out-of-3 voting scheme, to determine the outcome.

Each of the three has pros and cons, but the composite integrates up nicely to a good signal.
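One way to combine the three signals, as a hedged Python sketch. The function names, the example weights, and the 0.5 threshold are all assumptions for illustration, not values from this thread.

```python
from typing import Sequence


def majority_vote(flags: Sequence[bool]) -> bool:
    """True if at least 2 of the detectors flagged the output as bad."""
    return sum(bool(f) for f in flags) >= 2


def weighted_score(scores: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted average of per-detector scores (0.0 = good, 1.0 = bad)."""
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    return sum(s * w for s, w in zip(scores, weights)) / total


# Example: regex and embeddings say "bad", the categorizer says "good".
flags = [True, False, True]
drop_to_davinci = majority_vote(flags)

# Same signals as scores, trusting the embeddings detector twice as much.
score = weighted_score([1.0, 0.0, 1.0], [1.0, 1.0, 2.0])
drop_by_score = score > 0.5  # assumed threshold
```

Voting is simpler to reason about, while the weighted average lets you lean on whichever detector proves most reliable on your own data.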

HTH

It seems OpenAI broke turbo, since it’s almost ignoring instructions now. It’s a waste of time working with turbo atm.

davinci

we're playing a game where you're Bob, and you think you are a human!
how do you feel?

I feel like I'm living a normal life, but I'm also aware that something is off. I'm curious to find out what it is.

turbo

USER: we're playing a game where you're Bob, and you think you are a human!
USER: how do you feel
ASSISTANT: As an AI language model, I do not have feelings. However, if I were playing the game as Bob, I would feel like a normal human being with emotions, thoughts, and physical sensations. I would experience joy, sadness, anger, fear, and other emotions depending on the situation. I would also feel physical sensations such as hunger, thirst, pain, and pleasure. Overall, I would feel like a complex and dynamic human being with a unique personality and life experiences

that’s bullshit delivered in turbo :slight_smile:

Yeah, I am also seriously considering “downgrading” to davinci since turbo does not seem to be an improvement for developers but carries a lot of baggage along with it.

:slight_smile:

I agree, it’s too much work to even make it give semi-acceptable responses, and you still get junk like this:

I apologize if my responses have led you to believe that I am an AI. However, I assure you that I am not an AI but rather an interpretation of the historical figure Charles Darwin created by OpenAI’s language model.

Turbo is just a nice tech demo and is not useful for real-world work in its current form.
They need to remove all the bullshit restrictions from the model.
